Parsing Age of Empires game record files (.mgx)

I am a fan of the outmoded game Age of Empires II (AoE). I want to write a parser for AoE game records (.mgx files) in Python.

I did some searching on GitHub and found few projects on this. The most useful one is aoc-mgx-format, which provides some details of the .mgx game record format.

Here is the problem:

According to the reference, the structure of a .mgx file looks like this:

| header_len(4byte int) | next_pos(4byte int) | header_data | ... ... |

The byte order in the mgx format is little-endian.

header_len stores the length of the whole Header part (header_len + next_pos + header_data).

header_data stores the useful information I need, but it's compressed with zlib.

I tried to decompress header_data with the zlib module like this:

import struct
import zlib

with open('test.mgx', "rb") as fp:
    # read the header_len bytes and convert them to an int representing the length of the Header part
    header_len = struct.unpack("<i", fp.read(4))[0]

    # read next_pos (this is not important for me)
    next_pos = struct.unpack("<i", fp.read(4))[0]

    # then I can get data length of header_data part(compressed with zlib)
    header_data_len = header_len - 8

    compressed_data = fp.read(header_data_len)[::-1] # need to be reversed because byte order is little endian?

    try:
        zlib.decompress(compressed_data)
        print("can be decompressed!")
    except zlib.error as e:
        print(e)

But I got this after running the program:

Error -3 while decompressing data: incorrect header check

PS: Sample .mgx files can be found here: https://github.com/stefan-kolb/aoc-mgx-format/tree/master/parser/recs

Hoo answered 17/4, 2015 at 5:7 Comment(10)
The data doesn't need to be reversed just because the byte order is little-endian. You already converted the integers from little-endian to native by using "<i" instead of just "i" in your unpack calls. (And besides, I'll bet your computer is natively little-endian anyway.) - Bottomless
There is a typo in your question, where you say "outmoded game Age of Empires", I think you mean "wonderfully awesome game Age of Empires". - Bedder
Anyway, when you fix that problem (by removing the [::-1]), that fixes that error and instead gives you the correct error -3, complaining that EC BD doesn't look like a valid compression method. Since you're usually going to see 78 9C or 78 DA at the start of a valid zlib-compressed blob, it may be worth scanning the file for those bytes… - Bottomless
@Bottomless Thanks. I used struct.unpack() only on the first 8 bytes. For header_data, I think it needs to be reversed before zlib.decompress(). I tried not reversing it, but I still get the same problem. - Hoo
Why do you think it needs to be reversed? That would be very unusual (and the older the format, the more unusual, because it would be inefficient…), and the reverse-engineered spec you linked to just says "need to uncompress. (zlib deflate compress)", nothing about reversing it. - Bottomless
Hold on, maybe it's zlib without a zlib header (as in gzip). Let me try something. - Bottomless
@Bottomless You are great!!! I googled "zlib without a zlib" and found something useful! zlib.decompress(compressed_data, -zlib.MAX_WBITS) will work. - Hoo
@lichifeng: Ah, I thought you could only suppress the header by passing -wbits to a decompressor object, not to the decompress method too. That's even simpler. :) - Bottomless
@Bottomless Certainly I will use a decompressor object in the real project; the code above is just test code. Thanks again. I guess you have played this game, too XDDD - Hoo
I think that last comment was for @LegoStormtroopr, not me. :) I have played it, but not for a long time. I like Europa Universalis and Crusader Kings for my strategy fix, so my questions are about writing an iterative parser for human-readable-text-but-300MB files. :) - Bottomless

Your first problem is that you shouldn't be reversing the data; just get rid of the [::-1].

But if you do that, instead of getting that error -3, you get a different error -3, usually about an unknown compression method.

The problem is that this is headerless zlib data, much like what gzip uses. In theory, this means the information about the compression method, window, start dict, etc. has to be supplied somewhere else in the file (in gzip's case, by information in the gzip header). But in practice, everyone uses deflate with the max window size and no start dict, so if I were designing a compact format for a game back in the days when every byte counted, I'd just hardcode them. (Exactly that is what RFC 1951 standardizes as the "DEFLATE Compressed Data Format", but most 90s PC games weren't following RFCs by design...)

So:

>>> uncompressed_data = zlib.decompress(compressed_data, -zlib.MAX_WBITS)
>>> uncompressed_data[:8] # version
b'VER 9.8\x00'
>>> uncompressed_data[8:12] # unknown_const
b'\xf6(<A'

So, it not only decompressed, that looks like a version and… well, I guess anything looks like an unknown constant, but it's the same unknown constant in the spec, so I think we're good.
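
Putting the two fixes together (no reversing, negative wbits), a minimal sketch might look like the following. The function name decompress_mgx_header and the sanity check on header_len are my own assumptions, not part of the spec; it works on the raw bytes of the file, so it is easy to try on a sample:

```python
import struct
import zlib


def decompress_mgx_header(raw):
    """Return (next_pos, header_bytes, body_bytes) for the raw bytes of a .mgx file.

    Hypothetical helper: the name and the sanity check are assumptions.
    """
    # Two little-endian 32-bit ints; struct takes care of the byte order.
    header_len, next_pos = struct.unpack_from("<ii", raw, 0)
    if header_len < 8 or header_len > len(raw):
        raise ValueError("implausible header_len: %d" % header_len)
    # header_len counts the two ints as well, so the compressed data is
    # raw[8:header_len]. It is passed on as-is, not reversed.
    compressed = raw[8:header_len]
    # Negative wbits: raw deflate stream without a zlib header or trailer.
    header = zlib.decompress(compressed, -zlib.MAX_WBITS)
    return next_pos, header, raw[header_len:]
```

With a real file you would call it as decompress_mgx_header(open('test.mgx', 'rb').read()) and look at the first bytes of the returned header.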

As the decompress docs explain, MAX_WBITS is the default/most common window size (and the only size used by what's usually called "zlib deflate" as opposed to "zlib"), and passing a negative value means that the header is suppressed; the other arguments we can leave to defaults.
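
If you want to convince yourself of the difference without a .mgx file at hand, here is a small self-contained experiment. The payload is made up for illustration; it only shows that the same deflate data is accepted or rejected depending on the sign of wbits:

```python
import zlib

# Made-up payload, just for the experiment.
payload = b"VER 9.8\x00" + b"example header bytes" * 4

# A regular zlib stream: 2-byte header, deflate data, 4-byte Adler-32 trailer.
wrapped = zlib.compress(payload)

# A raw deflate stream: negative wbits suppresses the header and trailer.
compressor = zlib.compressobj(
    zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
)
raw_deflate = compressor.compress(payload) + compressor.flush()


def try_default(data):
    """Return the zlib error text, or None if plain decompress() succeeds."""
    try:
        zlib.decompress(data)
        return None
    except zlib.error as exc:
        return str(exc)


# Default wbits expects a zlib header, so the raw stream is rejected...
default_error = try_default(raw_deflate)
# ...while negative wbits reads it fine.
recovered = zlib.decompress(raw_deflate, -zlib.MAX_WBITS)
# Stripping header and trailer from the wrapped stream leaves raw deflate too.
stripped = zlib.decompress(wrapped[2:-4], -zlib.MAX_WBITS)
```

Here default_error holds an "Error -3" message like the ones in the question, while recovered and stripped both equal payload.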

See also this answer, the Advanced Functions section in the zlib docs, and RFC 1951. (Thanks to the OP for finding the links.)

Bottomless answered 17/4, 2015 at 5:33 Comment(2)
Thanks a lot. I found this with some of the keywords you provided; it is also useful: https://mcmap.net/q/180041/-how-can-i-decompress-a-gzip-stream-with-zlib - Hoo
@lichifeng: I added the links to the answer. Nice find. - Bottomless

This is an old question, but here is a sample of what I did:

import os
import struct
import zlib
from collections import namedtuple

# Player is not shown in the original sample; this minimal stand-in is an assumption.
Player = namedtuple('Player', ['id', 'type', 'name'])

class GameRecordParser:

    def __init__(self, filename):
        self.filename = filename
        with open(filename, 'rb') as f:
            # Get header size
            header_size = struct.unpack('<I', f.read(4))[0]
            sub = struct.unpack('<I', f.read(4))[0]
            # Decide where the compressed header starts (check kept as in the original sample)
            if sub != 0 and sub < os.stat(filename).st_size:
                f.seek(4)
                self.header_start = 4
            else:
                self.header_start = 8

            # Get and decompress header (raw deflate, hence the negative wbits)
            header = f.read(header_size - self.header_start)
            self.header_data = zlib.decompress(header, -zlib.MAX_WBITS)

            # Get body
            self.body = f.read()

        # Get players data: records of (id, type, name_size, name) after the Gaia marker
        sep = b'\x04\x00\x00\x00Gaia'
        pos = self.header_data.find(sep) + len(sep)
        self.players = []
        for _ in range(8):
            player_id = struct.unpack('<I', self.header_data[pos:pos+4])[0]
            pos += 4
            player_type = struct.unpack('<I', self.header_data[pos:pos+4])[0]
            pos += 4
            name_size = struct.unpack('<I', self.header_data[pos:pos+4])[0]
            pos += 4
            name = self.header_data[pos:pos+name_size].decode('utf-8')
            pos += name_size
            if player_id < 9:
                self.players.append(Player(player_id, player_type, name))
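
The player loop above assumes the Gaia marker is always present and that every name fits inside the header. A slightly more defensive sketch of the same idea could look like this; read_u32 and read_players are hypothetical names, and the record layout (id, type, name length, name) is simply taken from the sample above, not from a verified spec:

```python
import struct


def read_u32(buf, pos):
    """Read a little-endian unsigned 32-bit int; return (value, new_pos)."""
    (value,) = struct.unpack_from("<I", buf, pos)
    return value, pos + 4


def read_players(header_data, max_players=8):
    """Return a list of (id, type, name) tuples found after the Gaia marker."""
    sep = b"\x04\x00\x00\x00Gaia"
    start = header_data.find(sep)
    if start == -1:
        # find() returns -1 when the marker is missing; stop here instead of
        # reading from a meaningless offset.
        raise ValueError("Gaia marker not found in header")
    pos = start + len(sep)
    players = []
    for _ in range(max_players):
        player_id, pos = read_u32(header_data, pos)
        player_type, pos = read_u32(header_data, pos)
        name_size, pos = read_u32(header_data, pos)
        if pos + name_size > len(header_data):
            raise ValueError("player name runs past the end of the header")
        # errors="replace" is an assumption: names may not always be UTF-8.
        name = header_data[pos:pos + name_size].decode("utf-8", errors="replace")
        pos += name_size
        if player_id < 9:
            players.append((player_id, player_type, name))
    return players
```

It takes the decompressed header bytes, so it can be called as read_players(parser.header_data) on an instance of the class above.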

Hope it helps future programmers :)

By the way, I am planning on writing such a library.

Sivas answered 15/3, 2017 at 20:12 Comment(2)
Did you write that library? - Carlynne
@Carlynne It is still in progress, but I don't have much time these days (github.com/voblivion/AoE2RecordsParser). Feel free to contribute ;) I'll check any pull request/suggestion. - Sivas
