WZ Exporting wz into json (and other explorations)

geospiza · Nov 7, 2020

I wrote a JSON serializer for WZ files because the classic XML format is tedious to program against. I'm going to use it to build a smaller client. The v62 client is 1GB in size, but it could be much smaller after removing unused content.

Code:

423mb   Map.wz
255mb   Sound.wz
196mb   Mob.wz
136mb   Character.wz
45mb    Skill.wz
29mb    Npc.wz
27mb    Reactor.wz
12mb    UI.wz
12mb    Item.wz
7mb     Effect.wz
3mb     Morph.wz
2mb     Quest.wz
2mb     String.wz
0mb     Etc.wz
0mb     List.wz
0mb     Base.wz
0mb     TamingMob.wz
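
Here is a rough way to produce a listing like this one in Python. It assumes all of the client's .wz files sit directly in one folder. It sorts by exact byte count and rounds down to whole MB, which is consistent with the 0mb entries above, though I'm not claiming that's how the original listing was made.

```python
from pathlib import Path

MIB = 1024 * 1024


def wz_sizes(client_dir):
    """Return (size_in_mb, filename) for every .wz file, largest first."""
    files = [(p.stat().st_size, p.name) for p in Path(client_dir).glob("*.wz")]
    # Sort on exact byte counts so files that round to the same MB keep a true order.
    files.sort(key=lambda f: (-f[0], f[1]))
    return [(size // MIB, name) for size, name in files]  # whole MB, rounded down


def format_sizes(entries):
    """Render entries in the same two-column layout as the listing above."""
    return "\n".join(f"{str(mb) + 'mb':<8}{name}" for mb, name in entries)
```

Usage would be something like `print(format_sizes(wz_sizes("path/to/client")))`, with the path being wherever your v62 install lives.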

I started off looking into Harepacker-resurrected. However, its UI is too limited for my needs, both for exploration and for bulk deletes. Fortunately, Harepacker provides an option to dump the files into a text format such as XML. I expected this to be straightforward to parse, but quickly became dismayed after exploring the schema.

Code:

root
 |-- _name: string (nullable = true)
 |-- imgdir: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _name: string (nullable = true)
 |    |    |-- _value_tag: string (nullable = true)
 |    |    |-- canvas: struct (nullable = true)
 |    |    |    |-- _basedata: string (nullable = true)
 |    |    |    |-- _height: long (nullable = true)
 |    |    |    |-- _name: string (nullable = true)
 |    |    |    |-- _value_tag: string (nullable = true)
 |    |    |    |-- _width: long (nullable = true)
 |    |    |-- float: struct (nullable = true)
 |    |    |    |-- _name: string (nullable = true)
 |    |    |    |-- _value: double (nullable = true)
 |    |    |    |-- _value_tag: string (nullable = true)
 |    |    |-- imgdir: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
...

Example schema of Map.wz/Map/*/* via PySpark

Document elements encode the type of a value, and their attributes hold the node's name and value. If you wanted to filter all maps that use a particular bgm, you would have to look in each map's XML file and query for a string named bgm, inside the imgdir named info, inside the imgdir of the map's ID.

Code:

<imgdir name="info">
<int name="version" value="10"/>
<int name="cloud" value="0"/>
<int name="town" value="1"/>
<float name="mobRate" value="1.0"/>
<string name="bgm" value="Bgm00/FloralLife"/>
<int name="returnMap" value="100000000"/>
<int name="forcedReturn" value="999999999"/>
<int name="hideMinimap" value="0"/>
<int name="moveLimit" value="0"/>
<string name="mapMark" value="Henesys"/>
</imgdir>
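
For reference, here's roughly what that query looks like against the XML dump using Python's built-in ElementTree. This is a sketch that assumes each map is dumped as one file whose outer imgdir is named after the map's .img; `bgm_of` and `maps_with_bgm` are hypothetical helpers, not part of Harepacker.

```python
import xml.etree.ElementTree as ET


def bgm_of(xml_text):
    """Return the bgm string from one map's XML dump, or None if it has none."""
    root = ET.fromstring(xml_text)  # assumed: outer <imgdir> named after the map's .img
    node = root.find("./imgdir[@name='info']/string[@name='bgm']")
    return None if node is None else node.get("value")


def maps_with_bgm(dumps, bgm):
    """Filter {map_id: xml_text} down to the sorted IDs whose info/bgm equals `bgm`."""
    return sorted(map_id for map_id, text in dumps.items() if bgm_of(text) == bgm)
```

Three levels of nesting and attribute predicates, just to read one string.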

This format is too flexible for its own good. While data is lossless in the classic XML format, it's [link]. The folks who wrote MCDB had the right idea, but I haven't seen the source they used to compile the WZ files, or a version of it for the v62 client. I forked MapleLib and wrote a JSON serializer that generates data that can be queried like this:

Code:

>>> import json
>>> from pathlib import Path
>>> d = json.loads(Path("Map.wz/Map/Map0/000000000.img.json").read_text())
>>> d["payload"]["info"]["bgm"]
'BgmJp/FirstStepMaster'

It's much easier to work with the relationships between IDs in the JSON when values are elements of the tree instead of attributes of a node. The downside is that this can't be transformed back into the original binary format, because the types are implicit (e.g. should 1 be a float, an int, a short, or a double?). I'd probably go with Avro if I wanted to do anything more complex outside of MapleLib that requires repacking.
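
To illustrate what "values as elements of the tree" means and where the types go missing, here's a minimal Python sketch that folds the XML dump into nested dicts. This is not the MapleLib serializer (that's C#). It assumes the dump uses the tag names int, short, long, float, double, and string, and it simply skips everything else, such as canvas.

```python
import xml.etree.ElementTree as ET

# Assumed XML tag names for numeric WZ properties.
INT_TAGS = {"int", "short", "long"}
FLOAT_TAGS = {"float", "double"}


def to_tree(elem):
    """Turn an <imgdir> element into nested dicts keyed by each child's name."""
    out = {}
    for child in elem:
        name = child.get("name")
        if child.tag == "imgdir":
            out[name] = to_tree(child)
        elif child.tag in INT_TAGS:
            out[name] = int(child.get("value"))  # int/short/long all collapse to int
        elif child.tag in FLOAT_TAGS:
            out[name] = float(child.get("value"))  # float/double both collapse to float
        elif child.tag == "string":
            out[name] = child.get("value")
        # other tags (canvas and friends) are skipped in this sketch
    return out
```

Both `<int name="a" value="1"/>` and `<short name="b" value="1"/>` come out as a plain 1, which is exactly why this can't be repacked without extra type information.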

One nice feature of the JSON data is that things that look like arrays are treated as arrays. Here's a look at the portals section:

Code:

 |    |-- portal: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- index: long (nullable = true)
 |    |    |    |-- item: struct (nullable = true)
 |    |    |    |    |-- delay: long (nullable = true)
 |    |    |    |    |-- hideTooltip: long (nullable = true)
 |    |    |    |    |-- image: string (nullable = true)
 |    |    |    |    |-- onlyOnce: long (nullable = true)
 |    |    |    |    |-- pn: string (nullable = true)
 |    |    |    |    |-- pt: long (nullable = true)
 |    |    |    |    |-- script: string (nullable = true)
 |    |    |    |    |-- tm: long (nullable = true)
 |    |    |    |    |-- tn: string (nullable = true)
 |    |    |    |    |-- x: long (nullable = true)
 |    |    |    |    |-- y: long (nullable = true)
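
One way to get that behaviour is to treat a directory as an array when all of its child names are integers. I'm not sure that's the exact rule the serializer uses; this Python sketch (`arrayify` is a hypothetical name) just reproduces the `index`/`item` shape from the schema above.

```python
def arrayify(node):
    """Recursively turn dicts whose keys are all integers into index/item lists.

    Assumed rule: a directory counts as an array when every child name
    parses as an integer, e.g. portal/0, portal/1, portal/10.
    """
    if not isinstance(node, dict):
        return node
    children = {k: arrayify(v) for k, v in node.items()}
    if children and all(k.isdecimal() for k in children):
        # Sort numerically so "10" comes after "2", and keep the original index.
        return [{"index": int(k), "item": children[k]} for k in sorted(children, key=int)]
    return children
```

Keeping `index` next to `item` preserves the original numbering even when it has gaps.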

This means you can do something along these lines in SQL:

Code:

SELECT
  name, portal.item.x, portal.item.y
FROM maps,
UNNEST(payload.portal) AS portal
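
Without a SQL engine, the same unnesting is a couple of loops in Python. This sketch assumes you've already loaded each map's JSON into a dict keyed by map name or ID (for example by globbing `*.img.json` under Map.wz/Map); `portal_positions` is a hypothetical helper.

```python
def portal_positions(maps):
    """Yield (map name, x, y) for every portal, like the UNNEST query.

    `maps` is assumed to be {map name or ID: parsed JSON document}.
    """
    for name, doc in maps.items():
        # Maps without portals simply contribute no rows.
        for portal in doc.get("payload", {}).get("portal", []):
            item = portal["item"]
            yield name, item.get("x"), item.get("y")
```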

I then moved my attention to the [link].

[Image: graph of map connections, plotted in Gephi]

This was built from every map ID, its return map ID, and its portals' target map IDs (filtered with pn <> 'sp' AND tm <> 999999999), then plotted in Gephi. I made a smaller one by filtering out all of the maps in Victoria.
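
A sketch of building that edge list in Python from the JSON documents, assuming the `payload.info.returnMap` and `payload.portal[].item` shapes from the XML and schema above. It applies the same portal filter, and also drops a returnMap of 999999999 on the assumption that it likewise means "no separate target". The CSV uses Source/Target headers, which Gephi's spreadsheet import recognizes as edge columns.

```python
import csv
import io

NO_MAP = 999999999  # "no target" marker, per the tm filter above


def map_edges(maps):
    """Collect sorted (source, target) map-ID pairs from return maps and portals."""
    edges = set()
    for map_id, doc in maps.items():
        payload = doc["payload"]
        ret = payload.get("info", {}).get("returnMap")
        if ret is not None and ret != NO_MAP:  # assumption: 999999999 is not a real map
            edges.add((int(map_id), ret))
        for portal in payload.get("portal", []):
            item = portal["item"]
            # skip spawn points and portals that lead nowhere
            if item.get("pn") == "sp" or item.get("tm", NO_MAP) == NO_MAP:
                continue
            edges.add((int(map_id), item["tm"]))
    return sorted(edges)


def edges_csv(edges):
    """Render edges as CSV with a Source,Target header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buf.getvalue()
```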

So with this, I have a list of all the map IDs I want to keep.
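
Given (source, target) map-ID pairs like the ones described above, picking the maps to keep is a graph traversal. Here is a minimal breadth-first search in Python. It assumes you seed it with a few map IDs from the region you care about and treat connections as undirected, so maps that only lead into the region are kept too.

```python
from collections import defaultdict, deque


def reachable_maps(edges, start_ids):
    """Breadth-first search over (source, target) edges, treated as undirected.

    Returns every map ID connected to any of `start_ids`.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen = set(start_ids)
    queue = deque(start_ids)
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```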

There are a few things on my TODO list. The first is to omit directories during repacking, given a list of IDs. The next is to list the maps, mobs, and sounds connected to a given set of maps (say, the maps in Ludibrium). Finally, I need to test this with a real client and server emulator, because I'm not familiar enough with them to know what will happen if IDs are missing.
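
For the first item, the selection half could look like this. It's a sketch that only works on the exported JSON tree and assumes the Map.wz/Map/MapN/<id>.img.json layout from the example path earlier. Actually omitting entries during repacking would still have to happen in MapleLib, which this doesn't touch; `img_files_to_drop` is a hypothetical name.

```python
from pathlib import Path


def img_files_to_drop(map_dir, keep_ids):
    """List exported map files whose ID is not in `keep_ids`.

    Assumes the Map.wz/Map/MapN/<id>.img.json layout.
    """
    keep = {f"{int(i):09d}" for i in keep_ids}  # file names use 9-digit, zero-padded IDs
    drop = []
    for path in sorted(Path(map_dir).glob("Map*/*.img.json")):
        map_id = path.name.split(".")[0]
        if map_id not in keep:
            drop.append(path)
    return drop
```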

BangFlade · Nov 7, 2020

Nice.

I think someone already came up with a JSON serialization of the wz files, and for v62 at that.


He uses them for his web browser implementation of the game. Kind of unfortunate he stopped pushing commits to the repo; I had high hopes of seeing his v62 game succeed.

geospiza · Nov 7, 2020

Thanks for the link! A web browser implementation is ambitious; bummer to hear that development stopped.

Taking a look at some of the data in that repo:

Code:

{"$imgdir":"000000000.img","$$":[{"$imgdir":"info","$$":[{"$int":"version","value":"10"}...

It looks like a direct analogue of the XML serialization, so the same query problems apply here. Extracting data goes along these lines:

Code:

#!/usr/bin/env python3
import json

# version field: find the "info" imgdir, then the int named "version" inside it
data = json.loads(...)
payload = data["$$"]
info = [x["$$"] for x in payload if x.get("$imgdir") == "info"][0]
version = [int(x["value"]) for x in info if x.get("$int") == "version"][0]

The format generated by the WzJsonSerializer class can be used like this:

Code:

#!/usr/bin/env python3
import json

# version field
data = json.loads(...)
version = data["payload"]["info"]["version"]

The ergonomics matter, imho.
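
Converting from that repo's format into the nested shape is also possible. This is a small Python sketch, assuming every node has exactly one `$type` key holding its name, a `$$` list for children, and a `value` string for scalars. Only `$imgdir` and `$int` appear in the sample; the other key names (`$short`, `$float`, etc.) are guesses.

```python
# Assumed type keys for numeric nodes; only "$int" is confirmed by the sample.
NUMERIC = {"$int": int, "$short": int, "$long": int, "$float": float, "$double": float}


def flatten(node):
    """Convert one {"$type": name, ...} node into a (name, value) pair."""
    # The type key is the one starting with "$" that isn't the "$$" children list.
    kind = next(k for k in node if k.startswith("$") and k != "$$")
    name = node[kind]
    if kind == "$imgdir":
        return name, dict(flatten(child) for child in node.get("$$", []))
    if kind in NUMERIC:
        return name, NUMERIC[kind](node["value"])
    return name, node.get("value")  # strings and anything else: keep the raw value
```

With that, `flatten(data)[1]["info"]["version"]` reads the same way as the serializer's output.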
