The Legend of Mata Nui

2018-2020 c# c++ lego modding decompiler language design reverse engineering

I wrote a suite of modding tools, including a script compiler and decompiler, a model converter, and a C++ mod system for the abandoned game ‘LEGO Bionicle: The Legend of Mata Nui’.

Background

LEGO Bionicle: The Legend of Mata Nui was a work-in-progress computer game developed by Saffire revolving around the Bionicle line of LEGO products. Unfortunately, development was cancelled in October 2001.

The first area (known as 'bech') in The Legend of Mata Nui

When one of the developers distributed the last build of the game online, I joined a team of modders in a development capacity, where the first task was getting the game to run on modern Windows systems. JrMasterModelBuilder and I got the game running on Windows 10 by crafting a couple of patches for the EXE, and I learned about reverse-engineering with IDA in the process.

JMMB also reverse-engineered the game’s script bytecode format (among pretty much everything else), and built an assembler and disassembler. Working at the bytecode assembly level was still slow and cumbersome, so I designed a higher-level language around the bytecode features, called LSS (Litestone Script), and wrote a compiler and decompiler for the language targeting the bytecode.

Once JMMB developed a tool for unpacking and repacking the game’s data archive format (.blk ‘Blockfiles’), we saw that the game’s textures were in the ordinary first-generation DDS file format and thus easily accessible, but the models were another story altogether.

LOMN’s models were in the DirectX SDK model format (.x), but we couldn’t find any straightforward free/open source way to get the files into a modern 3D modeling package, so I wrote LOMNTool in C# to convert the LOMN .x files to Collada and Wavefront OBJ files. This was a smart choice, as we eventually figured out that the skinned models were laid out differently from the normal DirectX SDK format (all the bones are separated out into a .bhd file), meaning that existing tools wouldn’t have worked anyway.

Modifying the game’s data files (particularly the script) allowed the team lots of flexibility in what we wanted to change, but we still had some limitations imposed by the native code in the EXE. Because the build of the game we were working with was an internal debug build, it still had all its symbols, meaning the class, function, and global variable names were all still present in the EXE. This made reverse-engineering the executable with IDA relatively straightforward, and once I had an idea of how the executable worked, I wrote a DLL hijack that serves as a mod loader, loading other DLLs and providing common functionality.

Litestone Script

Litestone Script is the language that I designed as a higher-level way to work with the script bytecode that LOMN consumes. We know that the bytecode itself was compiled from a higher-level language, but no details beyond the extension (.vbo) are known. So I tried my hardest to design a language that followed the features of the game’s bytecode as closely as possible, but also felt similar to existing C-style languages (curly braces, semicolons, etc.).

This feature parity with the engine’s virtual machine led to some ugly syntactical workarounds for bytecode concepts that I couldn’t figure out how to represent in a nicer way. Some examples:

  • Color values are immutable in the bytecode, which complicates the way LSS color values are created and modified (and, really, the way all the builtins work)
  • Not being able to initialize global variables or class properties at the point of declaration

I hadn’t taken any compiler courses at the time, so I wrote the whole LSS compiler while following along with Bob Nystrom’s excellent (and free!) book Crafting Interpreters. I cannot overstate how entertaining, informative, and insightful it is - if you can read even basic object-oriented Java, give it a read!

A byproduct of this phase of development was the ‘LSS Interactive’ window in Shoefitter-DX - I needed a way to type LSS, hit Ctrl+Enter, and inspect the compiled bytecode. This let me quickly verify that the compiler was producing the right bytecode in as many situations and corner cases as I could think of.

Having a compiler makes it straightforward to write more complicated bytecode sequences for the game, but we need a decompiler to change the existing script at a high level and to store the script in textual, version-control friendly ‘source code’.

The decompiler generates LSS from subroutines by splitting the bytecode at the jumps and jump destinations, and then analysing the resulting graph of chunks and jumps. It is important to note that the decompiler does not handle decompiling every possible valid bytecode sequence; it merely detects the bytecode patterns that the original script compiler generated.

Generated control-flow diagram next to its decompiled LSS
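
To make the chunk-splitting step more concrete, here is a minimal C++ sketch of the general technique (splitting at jumps and jump destinations, then recording the edges between chunks). It is not the actual decompiler code: the four-opcode instruction set and the names Instr, Chunk, and splitIntoChunks are all hypothetical, invented for this example, and the real bytecode has many more instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <vector>

// Hypothetical, heavily simplified opcodes (assumption, not the real LOMN set).
enum class Op { Other, Jump, JumpIfFalse, Return };

struct Instr {
    Op op;
    std::size_t target; // instruction index; only meaningful for jumps
};

struct Chunk {
    std::size_t begin;                   // index of the first instruction
    std::size_t end;                     // one past the last instruction
    std::vector<std::size_t> successors; // indices of chunks control can reach next
};

inline bool isJump(Op op) { return op == Op::Jump || op == Op::JumpIfFalse; }

std::vector<Chunk> splitIntoChunks(const std::vector<Instr>& code) {
    std::vector<Chunk> chunks;
    if (code.empty()) return chunks;

    // A chunk starts at instruction 0, at every jump destination,
    // and directly after every jump or return.
    std::set<std::size_t> leaders{0};
    for (std::size_t i = 0; i < code.size(); ++i) {
        const bool jump = isJump(code[i].op);
        if (jump) {
            assert(code[i].target < code.size());
            leaders.insert(code[i].target);
        }
        if ((jump || code[i].op == Op::Return) && i + 1 < code.size())
            leaders.insert(i + 1);
    }

    // Turn the sorted leaders into [begin, end) ranges.
    std::map<std::size_t, std::size_t> chunkAt; // leader index -> chunk index
    for (auto it = leaders.begin(); it != leaders.end(); ++it) {
        auto next = std::next(it);
        const std::size_t end = (next == leaders.end()) ? code.size() : *next;
        chunkAt[*it] = chunks.size();
        chunks.push_back({*it, end, {}});
    }

    // Record edges: the jump target first (if any), then the fall-through.
    for (std::size_t c = 0; c < chunks.size(); ++c) {
        const Instr& last = code[chunks[c].end - 1];
        if (isJump(last.op))
            chunks[c].successors.push_back(chunkAt.at(last.target));
        const bool fallsThrough = last.op != Op::Jump && last.op != Op::Return;
        if (fallsThrough && c + 1 < chunks.size())
            chunks[c].successors.push_back(c + 1);
    }
    return chunks;
}
```

Feeding this a while-loop shaped sequence (condition, JumpIfFalse to the exit, body, Jump back to the condition, Return) yields three chunks: the condition chunk branches to both the exit and the body, and the body chunk loops back to the condition. Recognising shapes like that in the resulting graph is the pattern-detection part described above.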

Beyond that, it also doesn’t generate exactly equivalent LSS. For example, a bytecode for-each loop only determines the array length once before the loop, but this decompiles into a while-loop with the array length being checked every iteration. While this is a semantic difference and could potentially change the meaning of certain loops where the array length changes in the loop, it has not been an issue in practice. (I hope I don’t have to eat my words later!) Properly detecting the loop as a for-each loop and extracting the relevant expressions would require lookahead or lookbehind and perhaps some recursion like the parser, and I wasn’t ready to rework the decompiler that much when this pretty much works as-is.
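
The semantic difference is easy to reproduce outside the game. The following C++ sketch only models the two loop shapes on a std::vector; it says nothing about how the engine’s arrays actually behave, and the names Body, runForEachStyle, and runWhileStyle are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A loop body that receives the array and the current index and may mutate the array.
using Body = std::function<void(std::vector<int>&, std::size_t)>;

// Models the bytecode for-each: the length is read once, before the loop.
std::size_t runForEachStyle(std::vector<int>& items, const Body& body) {
    assert(body && "body must be callable");
    const std::size_t length = items.size();
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < length; ++i) {
        body(items, i);
        ++iterations;
    }
    return iterations;
}

// Models the decompiled while-loop: the length is re-read on every iteration.
std::size_t runWhileStyle(std::vector<int>& items, const Body& body) {
    assert(body && "body must be callable");
    std::size_t iterations = 0;
    std::size_t i = 0;
    while (i < items.size()) {
        body(items, i);
        ++i;
        ++iterations;
    }
    return iterations;
}
```

With a three-element array and a body that appends one element during the first iteration, the for-each shape runs three times while the while shape runs four times. The two forms only diverge when the body changes the array length, which is why the mismatch has been harmless so far.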

You can use the compiler and decompiler through the lssc CLI downloadable from its releases page.

LOMNTool

LOMNTool is a command-line program I wrote to do batch conversions between various file formats used in the game. It is prototype-quality code (if that) so things look pretty ugly.

It includes a full parser and writer for the binary version of the obnoxiously open-ended DirectX SDK .x file format, which I implemented before reading Crafting Interpreters (see above), meaning it took way too long to implement and probably isn’t using the best approach to things, although it looks functional to me. However, LOMNTool’s conversion of .x files is strictly targeted towards LOMN - if you would like to do conversions for another game, I suggest copying XFiles.cs (the generic .x file format) and then adapting XUtils.cs or writing your own code to actually make use of the parsed objects and data.

Also, all the Collada support in LOMNTool (ColladaUtils.cs) was originally written against the Collada 1.4 spec, and then when I was hit by the harsh reality that is the wildly varying Collada support in 3D DCC tools, I bashed the Collada code with a sledgehammer until it worked with Autodesk Maya. Support for your 3D package will vary.

You can find LOMNTool binaries on its releases page.

Shoefitter-DX

Gali's idle animation in Shoefitter-DX

Using the reverse-engineering knowledge and code prototyped in LOMNTool, I am working on a game editor for the LOMN game files. It is called Shoefitter-DX, named after the original LOMN editing tools, which were known as Shoefitter.

I started Shoefitter-DX using Windows Forms and wrote the LSS Interactive and OSI Browser windows and started on an animation window, before deciding to rewrite the whole thing targeting WPF. My first goal for the new Shoefitter-DX has been a full character editor, and so far I have AI & collision editing and model & animation rendering complete. I hope to eventually implement a level editor and cinematic editor, though for script editing, replacing Visual Studio might not be worth the time.

LOMNHook

We wanted to make more significant changes to the native game executable in multiple ways, so I took inspiration from Minecraft Forge and created a mod loader by building a fake version of the Intel JPEG Library that was shipped with the game. By impersonating ijl15.dll, LOMNHook is loaded into the process before the game’s native code runs.

However, that’s just the first part of the process. Even after the LOMNHook DLL is loaded into the process, it needs to execute code, or it’s worthless. Code placed in DllMain runs when the DLL is loaded into the game, which is a good start. Unfortunately, DllMain is called with some restrictions, and it runs before the game has begun executing, so the game could immediately overwrite anything we change there. We’d like a chance to run our custom code after the game has started up.

Luckily, the game engine LOMN is built on, SAGE, inadvertently provides us with the perfect opportunity - it declares a global array-list of function pointers, called ScGlobalInitFuncs, to be invoked while loading, and this list is lazily initialized, meaning it won’t overwrite the list if it already exists! All we need to do is allocate some static memory (because the new operator isn’t working in DllMain) to point ScGlobalInitFuncs at and add a custom function as the first entry in the list, and boom - we can run whatever code we want at the beginning of the game’s initialization! And, because our callback is invoked after all the other initialization functions have been queued up in the list, we can use it to add another one to the end of the queue if we’d like to do some initialization after the game’s built-in initialization callbacks.
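
The trick can be modelled in a few lines of portable C++. This is a simulation under stated assumptions, not LOMNHook’s real code: only the name ScGlobalInitFuncs comes from the game, while the InitList layout, its capacity, and every game* and hook* function here are hypothetical stand-ins for SAGE’s array-list and for what the hook would do from DllMain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using InitFunc = void (*)();

constexpr std::size_t kCapacity = 16; // arbitrary size for this sketch

// Hypothetical stand-in for the engine's array-list of function pointers.
struct InitList {
    InitFunc funcs[kCapacity];
    std::size_t count;
    void add(InitFunc f) {
        assert(count < kCapacity);
        funcs[count++] = f;
    }
};

std::vector<std::string> callLog; // records the order in which callbacks run

// ---- "game" side (assumed behaviour) ----
InitList* ScGlobalInitFuncs = nullptr;

void gameRegisterInit(InitFunc f) {
    // Lazy initialization: the list is only created if nobody made one yet.
    if (ScGlobalInitFuncs == nullptr) ScGlobalInitFuncs = new InitList{};
    ScGlobalInitFuncs->add(f);
}

void gameRunInits() {
    if (ScGlobalInitFuncs == nullptr) return;
    // count is re-read each pass, so callbacks appended mid-run still execute.
    for (std::size_t i = 0; i < ScGlobalInitFuncs->count; ++i)
        ScGlobalInitFuncs->funcs[i]();
}

void gameInitA() { callLog.push_back("gameInitA"); }
void gameInitB() { callLog.push_back("gameInitB"); }

// ---- "hook" side ----
static InitList hookStorage; // static storage, so no operator new is needed

void hookLateInit() { callLog.push_back("hookLateInit"); }

void hookEarlyInit() {
    callLog.push_back("hookEarlyInit");
    // By now the game has queued its own callbacks, so this lands at the end.
    ScGlobalInitFuncs->add(hookLateInit);
}

// Stands in for the work done when the DLL is loaded.
void hookInstall() {
    hookStorage.count = 0;
    ScGlobalInitFuncs = &hookStorage;
    ScGlobalInitFuncs->add(hookEarlyInit);
}
```

Calling hookInstall(), then registering gameInitA and gameInitB, then calling gameRunInits() runs the callbacks in the order hookEarlyInit, gameInitA, gameInitB, hookLateInit, and the game never replaces the pre-made list. Whether the real engine iterates its list in a way that picks up late additions is taken from the description above rather than verified here.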

LOMNHook doesn’t hoard this power for itself - it scans all the .dll files in the mods/ directory and loads the ones that adhere to the specified interface. This provides an extensible foundation for an ecosystem of DLL mods that play nice with each other (except for one thing - they all have to use the correct build settings, such as linking to the same versions of runtimes and whatnot).

LOMNHook and some DLL mods are distributed as part of the game packages you can find at the BioMedia Project.