Hacker News

riffraff last Thursday at 6:44 AM

How does this differ from direct threading interpreters?

It seems like it solves the same problem (saving the function call overhead) and has the same downsides (requires non-standard compiler extensions)

EDIT: it seems the answer is that compilers do not play well with direct-threaded interpreters and they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks

http://lua-users.org/lists/lua-l/2011-02/msg00742.html
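
For reference, a classic direct-threaded dispatch loop looks something like the following sketch. It relies on the GCC/Clang computed-goto ("labels as values") extension; the bytecode, opcode names, and stack layout are made up purely for illustration.

    /* Minimal direct-threaded interpreter sketch (GCC/Clang &&label extension).
     * Opcodes and program are hypothetical. */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const uint8_t *code) {
        /* Table of label addresses, indexed by opcode. */
        static void *dispatch[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };
        int64_t stack[64];
        int64_t *sp = stack;
        const uint8_t *ip = code;

        goto *dispatch[*ip++];        /* jump straight to the first handler */

    op_push1:
        *sp++ = 1;
        goto *dispatch[*ip++];        /* every handler dispatches the next opcode itself */
    op_add:
        sp--; sp[-1] += sp[0];
        goto *dispatch[*ip++];
    op_print:
        printf("%lld\n", (long long)sp[-1]);
        goto *dispatch[*ip++];
    op_halt:
        return;
    }

    int main(void) {
        const uint8_t prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
        run(prog);   /* prints 2 */
        return 0;
    }

The key property is that every handler ends by jumping directly to the next handler, so there is no shared dispatch loop for the CPU to mispredict; the downside is that the whole interpreter is one huge function, which is what the reply below addresses.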


Replies

haberman last Thursday at 4:40 PM

This is a great summary. When Mike wrote the message you linked, his conclusion was that you have to drop to assembly to get reasonable code for VM interpreters. Later we developed the "musttail" technique which was able to match his assembly language sequences using C. This makes C a viable option for VM interpreters, even if you want best performance, as long as your compiler supports musttail.

> they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks

It's not the size of the function that is the primary problem; it is the fully connected control flow that gums everything up. The register allocator is trying to dynamically allocate registers through each opcode's implementation, but it also has to connect the end of every opcode with the beginning of every opcode, from a register allocation perspective.

The compiler doesn't understand that every opcode has basically the same set of "hot" variables, which means we benefit from keeping those hot variables in a fixed set of registers basically all of the time.

With tail calls, we can communicate a fixed register allocation to the compiler through the use of function arguments, which are always passed in registers. When we pass this hot data in function arguments, we force the compiler to respect this fixed register allocation, at least at the beginning and the end of each opcode. Given that constraint, the compiler will usually do a pretty good job of maintaining that register allocation through the entire function.
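
To make that concrete, here is a minimal sketch of the tail-call style, assuming a Clang recent enough to support `__attribute__((musttail))`. The opcode set, handler names, and the particular choice of hot arguments (ip, sp, vm) are illustrative, not taken from any specific VM.

    /* Tail-call dispatch sketch: every handler has the same signature, so the
     * hot state (ip, sp, vm) stays in the same argument registers across
     * opcode boundaries, and musttail guarantees the dispatch compiles to a
     * jump rather than a call. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct VM VM;
    typedef int64_t (*Handler)(const uint8_t *ip, int64_t *sp, VM *vm);

    struct VM {
        Handler dispatch[256];   /* one handler per opcode */
    };

    /* Tail-call to the next opcode's handler. */
    #define DISPATCH(ip, sp, vm) \
        __attribute__((musttail)) return (vm)->dispatch[*(ip)]((ip) + 1, (sp), (vm))

    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    static int64_t op_push1(const uint8_t *ip, int64_t *sp, VM *vm) {
        *sp++ = 1;
        DISPATCH(ip, sp, vm);
    }
    static int64_t op_add(const uint8_t *ip, int64_t *sp, VM *vm) {
        sp--; sp[-1] += sp[0];
        DISPATCH(ip, sp, vm);
    }
    static int64_t op_print(const uint8_t *ip, int64_t *sp, VM *vm) {
        printf("%lld\n", (long long)sp[-1]);
        DISPATCH(ip, sp, vm);
    }
    static int64_t op_halt(const uint8_t *ip, int64_t *sp, VM *vm) {
        (void)ip; (void)vm;
        return sp[-1];           /* end of the tail-call chain */
    }

    int main(void) {
        VM vm = { .dispatch = { [OP_PUSH1] = op_push1, [OP_ADD] = op_add,
                                [OP_PRINT] = op_print, [OP_HALT] = op_halt } };
        int64_t stack[64];
        const uint8_t prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
        return (int)vm.dispatch[prog[0]](prog + 1, stack, &vm);   /* prints 2 */
    }

Because each opcode is now a small, normal-sized function whose hot state arrives and leaves in argument registers, the compiler only has to do register allocation within one handler at a time, rather than across every possible opcode-to-opcode transition.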

noelwelsh last Thursday at 8:44 AM

Unfortunately, most discussion of direct threaded interpreters confuses the implementation techniques (e.g. computed gotos) with the concepts (tail calls, or duality between calls and returns and data and codata, depending on your point of view). What is presented here is conceptually a direct threaded interpreter. It's just implemented in a way that is more amenable to optimization by the compiler technology in use.

(More here: https://noelwelsh.com/posts/understanding-vm-dispatch/)

coldtea last Thursday at 9:08 AM

> and has the same downsides (requires non-standard compiler extensions)

It's not a downside if:

(a) you have those non-standard compiler extensions on the platforms you target

(b) for the rest, you can ifdef an alternative that doesn't require them
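
One hypothetical way to arrange that ifdef: detect musttail support, and have each opcode handler either chain to the next handler with a guaranteed tail call or return to an ordinary dispatch loop. `HAVE_MUSTTAIL`, `DISPATCH`, and `run` are invented names for this sketch, and the handler signature is illustrative.

    /* Sketch of the portability #ifdef: one handler signature, two dispatch
     * strategies selected at compile time. */
    #include <stdint.h>
    #include <stddef.h>

    #if defined(__has_attribute)
    #  if __has_attribute(musttail)
    #    define HAVE_MUSTTAIL 1
    #  endif
    #endif

    typedef struct VM VM;
    typedef const uint8_t *(*Handler)(const uint8_t *ip, int64_t *sp, VM *vm);
    struct VM { Handler dispatch[256]; };

    #ifdef HAVE_MUSTTAIL
    /* Each handler ends with DISPATCH, chaining to the next handler via a
     * guaranteed tail call. */
    #define DISPATCH(ip, sp, vm) \
        __attribute__((musttail)) return (vm)->dispatch[*(ip)]((ip) + 1, (sp), (vm))
    #else
    /* Portable fallback: the handler returns the next ip, and run() below
     * re-dispatches in an ordinary loop. */
    #define DISPATCH(ip, sp, vm) return (ip)
    #endif

    void run(VM *vm, const uint8_t *ip, int64_t *sp) {
    #ifdef HAVE_MUSTTAIL
        (void)vm->dispatch[*ip](ip + 1, sp, vm);   /* one call, then a tail-call chain */
    #else
        while (ip != NULL)                         /* classic dispatch loop */
            ip = vm->dispatch[*ip](ip + 1, sp, vm);
    #endif
    }

The fallback build gives up the fixed register allocation but keeps the interpreter compiling everywhere, which is the trade-off described above.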