Cute, but Rice's theorem still applies, and even though they translate every byte as code, there is still no way to handle something like
unsigned char buf[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
return ((int (*)(void))buf)();
Static translation is only possible when you assume no adversarial code and, for the most part, compiler-produced binaries. Hand-rolled asm gets hard, and adversarial code is provably unsolvable in the general case. Still, pretty cool for cooperative binaries.
But in fact no modern processor/OS executes this either. Pages are marked as executable or not, and static data is loaded as non-executable pages.
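
To make the page-permission point concrete, here is a rough sketch of what a program has to do before those six bytes can actually run: copy them into a fresh mapping and flip it to executable. It assumes a POSIX system with `mmap`/`mprotect`, a 4096-byte page, and an x86 CPU; the function name `run_mov_eax_42` is made up for illustration, and a hardened system may refuse the `mprotect` call.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 42 if the bytes ran,
 * or -1 if the platform is not x86 or the mapping was refused. */
int run_mov_eax_42(void) {
#if defined(__x86_64__) || defined(__i386__)
    /* mov eax, 42; ret */
    static const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    const size_t len = 4096;  /* assumed page size */

    /* Start with a writable, non-executable anonymous page. */
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memcpy(page, code, sizeof code);

    /* Only now ask the OS to make the page executable (and read-only). */
    if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, len);
        return -1;
    }

    int result = ((int (*)(void))page)();
    munmap(page, len);
    return result;
#else
    return -1;  /* the byte sequence is x86 machine code only */
#endif
}
```

Without the `mprotect` step the call would normally fault, which is the behaviour described above for a plain data array.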
I read those bytes and immediately thought "mov eax, 42; ret".
It looks like their system would just generate return 42;
No, based on the abstract, it can handle that code. What it can't handle is runtime code generation.
I only read the abstract, but I got the impression that their solution is to keep both. They translate all the data as if it were code; if it gets called into, they use the translation, and if it gets read as memory, they use the original.
Edit: I found this in the paper:
> Elevator sidesteps the code-versus-data determination altogether through an application of superset disassembly [6]: we simultaneously interpret every executable byte offset in the original binary as (i) data and (ii) the start of a potential instruction sequence beginning at that offset, and we build the superset control flow graph from every one of the resulting candidate decodes. Every potential target of indirect jumps, callbacks, or other runtime dispatch mechanisms that cannot be statically analyzed therefore has a corresponding landing point in the rewritten binary. These targets are resolved at runtime through a lookup table from original instruction addresses to translated code addresses that we embed in the final binary.
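
The runtime lookup described in that quote could look roughly like the sketch below. This is only a guess at the general shape, not the paper's actual data structure: the table layout, the addresses, the stub functions, and the name `resolve_indirect` are all hypothetical. It keeps a sorted array of (original address, translated entry point) pairs and binary-searches it whenever an indirect jump or call needs resolving.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*translated_fn)(void);

/* Hypothetical table entry: original address -> translated landing point. */
struct addr_map_entry {
    uintptr_t     orig_addr;
    translated_fn translated;
};

/* Placeholder stubs standing in for translated code at two candidate offsets. */
static int translated_at_401000(void) { return 42; }
static int translated_at_401001(void) { return 7; }

/* Must stay sorted by orig_addr so bsearch works. */
static const struct addr_map_entry addr_map[] = {
    { 0x401000u, translated_at_401000 },
    { 0x401001u, translated_at_401001 },
};

static int cmp_entry(const void *key, const void *elem) {
    uintptr_t k = *(const uintptr_t *)key;
    const struct addr_map_entry *e = elem;
    return (k > e->orig_addr) - (k < e->orig_addr);
}

/* Map an original target address to translated code, or NULL if unknown. */
translated_fn resolve_indirect(uintptr_t orig_target) {
    const struct addr_map_entry *e =
        bsearch(&orig_target, addr_map,
                sizeof addr_map / sizeof addr_map[0],
                sizeof addr_map[0], cmp_entry);
    return e ? e->translated : NULL;
}
```

An indirect call site in the rewritten binary would then do something like `resolve_indirect(target)()`. A target that was never in the original binary (for example, runtime-generated code) has no entry and comes back as NULL.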