logoalt Hacker News

Theseus, a Static Windows Emulator

58 pointsby zdwyesterday at 4:14 AM6 commentsview on HN

Comments

jcranmertoday at 9:51 PM

So I've definitely played around with a lot of these ideas, because the thing that I'm trying to emulate/port/reverse engineer is a 16-bit Windows application, which coincidentally means that a lot of the existing infrastructure just keels over and dies in supporting it.

I originally started the emulation route, but gave that up when I realized just how annoying it was to emulate the libc startup routine that invoked WinMain, and from poking around at the disassembly, I really didn't need to actually support anything before WinMain. And then poking further, I realized that if I patched the calls to libc, I only had to emulate a call to malloc at the, well, malloc level, as opposed to implementing the DOS interrupt that's equivalent to sbrk. (And a side advantage to working with Win16 code is that, since every inter-segment call requires a relocation, you can very easily find all the calls to all libc methods, even if the disassembler is still struggling with all of the jump tables.)

After realizing that the syscalls to modify the LDT still exist and work on Linux (they don't on Windows), I switched from trying to write my own 386 emulator to actually just natively executing all the code in 16-bit mode in 64-bit mode and relying on the existing relocation methods to insert the thunking between 16-bit and 64-bit code [1]. The side advantage was that this also made it very easy to just call individual methods given their address rather than emulating from the beginning of main, which also lets me inspect what those methods are doing a lot better.

Decompilation I've tried to throw in the mix a couple of times, but... 16-bit code really just screws with everything. Modern compilers don't have the ontology to really support it. Doing my own decompilation to LLVM IR runs into the issue that LLVM isn't good at optimizing all of the implemented-via-bitslicing logic, and I still have to write all of the patterns manually just to simplify the code for "x[i]" where x is a huge pointer. And because it's hard to reinsert recompiled decompiled code into the binary (as tools don't speak Win16 file formats anymore), it's difficult to verify that any decompilation is correct.

[1] The thunking works by running 16-bit code in one thread and 64-bit code in another thread and using hand-rolled spinlocks to wait for responses to calls. It's a lot easier than trying to make sure that all the registers and the stack are in the appropriate state.

show 1 reply
nxobjecttoday at 9:17 PM

A follow-up question I'd like to ask to the author: if I understand correctly, one of the ideas is that you can write "high-level", almost platform-independent translations like:

    regs.eax = 3;
    regs.eax = add(regs.eax, 4);
    windows_api();  // some native implementation of the API that was called
and rely on optimizers to do as much "lowering" or specialization to a platform as possible. I'd like to ask: how reliable have optimizers been in gettting those results (e.g. how much fiddling with various -Oxxx flags)?
show 1 reply
mmastractoday at 8:00 PM

I think "recompiler" is the accepted term in the art, is it not?

mysterydiptoday at 7:16 PM

That’s a really clever idea/solution I hadn’t thought of before, and yet it makes “of course, duh” sense as well.

pema99today at 8:29 PM

See also the authors previous project retrowin32 https://github.com/evmar/retrowin32