Hacker News

riffraff last Thursday at 6:44 AM

How does this differ from direct threading interpreters?

It seems like it solves the same problem (saving the function call overhead) and has the same downsides (requires non-standard compiler extensions)

EDIT: it seems the answer is that compilers do not play well with direct-threaded interpreters and they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks

http://lua-users.org/lists/lua-l/2011-02/msg00742.html
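
For reference, a classic direct-threaded dispatch loop looks something like the following sketch. It relies on the GCC/Clang computed-goto ("labels as values") extension; the bytecode, opcode names, and stack layout are made up purely for illustration.

    /* Minimal direct-threaded interpreter sketch (GCC/Clang &&label extension).
     * Opcodes and program are hypothetical. */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const uint8_t *code) {
        /* Table of label addresses, indexed by opcode. */
        static void *dispatch[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };
        int64_t stack[64];
        int64_t *sp = stack;
        const uint8_t *ip = code;

        goto *dispatch[*ip++];        /* jump straight to the first handler */

    op_push1:
        *sp++ = 1;
        goto *dispatch[*ip++];        /* every handler dispatches the next opcode itself */
    op_add:
        sp--; sp[-1] += sp[0];
        goto *dispatch[*ip++];
    op_print:
        printf("%lld\n", (long long)sp[-1]);
        goto *dispatch[*ip++];
    op_halt:
        return;
    }

    int main(void) {
        const uint8_t prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
        run(prog);   /* prints 2 */
        return 0;
    }

The key property is that every handler ends by jumping directly to the next handler, so there is no shared dispatch loop for the CPU to mispredict; the downside is that the whole interpreter is one huge function, which is what the reply below addresses.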


Replies

haberman last Thursday at 4:40 PM

This is a great summary. When Mike wrote the message you linked, his conclusion was that you have to drop to assembly to get reasonable code for VM interpreters. Later we developed the "musttail" technique which was able to match his assembly language sequences using C. This makes C a viable option for VM interpreters, even if you want best performance, as long as your compiler supports musttail.

> they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks

It's not the size of the function that is the primary problem; it is the fully connected control flow that gums everything up. The register allocator is trying to dynamically allocate registers through each opcode's implementation, but it also has to connect the end of every opcode with the beginning of every opcode, from a register allocation perspective.

The compiler doesn't understand that every opcode has basically the same set of "hot" variables, which means we benefit from keeping those hot variables in a fixed set of registers basically all of the time.

With tail calls, we can communicate a fixed register allocation to the compiler through the use of function arguments, which are always passed in registers. When we pass this hot data in function arguments, we force the compiler to respect this fixed register allocation, at least at the beginning and the end of each opcode. Given that constraint, the compiler will usually do a pretty good job of maintaining that register allocation through the entire function.
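
To make that concrete, here is a minimal sketch of the tail-call style, assuming a Clang recent enough to support `__attribute__((musttail))`. The opcode set, handler names, and the particular choice of hot arguments (ip, sp, vm) are illustrative, not taken from any specific VM.

    /* Tail-call dispatch sketch: every handler has the same signature, so the
     * hot state (ip, sp, vm) stays in the same argument registers across
     * opcode boundaries, and musttail guarantees the dispatch compiles to a
     * jump rather than a call. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct VM VM;
    typedef int64_t (*Handler)(const uint8_t *ip, int64_t *sp, VM *vm);

    struct VM {
        Handler dispatch[256];   /* one handler per opcode */
    };

    /* Tail-call to the next opcode's handler. */
    #define DISPATCH(ip, sp, vm) \
        __attribute__((musttail)) return (vm)->dispatch[*(ip)]((ip) + 1, (sp), (vm))

    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    static int64_t op_push1(const uint8_t *ip, int64_t *sp, VM *vm) {
        *sp++ = 1;
        DISPATCH(ip, sp, vm);
    }
    static int64_t op_add(const uint8_t *ip, int64_t *sp, VM *vm) {
        sp--; sp[-1] += sp[0];
        DISPATCH(ip, sp, vm);
    }
    static int64_t op_print(const uint8_t *ip, int64_t *sp, VM *vm) {
        printf("%lld\n", (long long)sp[-1]);
        DISPATCH(ip, sp, vm);
    }
    static int64_t op_halt(const uint8_t *ip, int64_t *sp, VM *vm) {
        (void)ip; (void)vm;
        return sp[-1];           /* end of the tail-call chain */
    }

    int main(void) {
        VM vm = { .dispatch = { [OP_PUSH1] = op_push1, [OP_ADD] = op_add,
                                [OP_PRINT] = op_print, [OP_HALT] = op_halt } };
        int64_t stack[64];
        const uint8_t prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
        return (int)vm.dispatch[prog[0]](prog + 1, stack, &vm);   /* prints 2 */
    }

Because each opcode is now a small, normal-sized function whose hot state arrives and leaves in argument registers, the compiler only has to do register allocation within one handler at a time, rather than across every possible opcode-to-opcode transition.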

noelwelsh last Thursday at 8:44 AM

Unfortunately, most discussion of direct threaded interpreters confuses the implementation techniques (e.g. computed gotos) with the concepts (tail calls, or duality between calls and returns and data and codata, depending on your point of view). What is presented here is conceptually a direct threaded interpreter. It's just implemented in a way that is more amenable to optimization by the compiler technology in use.

(More here: https://noelwelsh.com/posts/understanding-vm-dispatch/)

coldtea last Thursday at 9:08 AM

> and has the same downsides (requires non-standard compiler extensions)

It's not a downside if:

(a) you have those non-standard compiler extensions on the platforms you target

(b) for the rest, you can ifdef an alternative that doesn't require them
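
One hypothetical way to arrange that ifdef: detect musttail support, and have each opcode handler either chain to the next handler with a guaranteed tail call or return to an ordinary dispatch loop. `HAVE_MUSTTAIL`, `DISPATCH`, and `run` are invented names for this sketch, and the handler signature is illustrative.

    /* Sketch of the portability #ifdef: one handler signature, two dispatch
     * strategies selected at compile time. */
    #include <stdint.h>
    #include <stddef.h>

    #if defined(__has_attribute)
    #  if __has_attribute(musttail)
    #    define HAVE_MUSTTAIL 1
    #  endif
    #endif

    typedef struct VM VM;
    typedef const uint8_t *(*Handler)(const uint8_t *ip, int64_t *sp, VM *vm);
    struct VM { Handler dispatch[256]; };

    #ifdef HAVE_MUSTTAIL
    /* Each handler ends with DISPATCH, chaining to the next handler via a
     * guaranteed tail call. */
    #define DISPATCH(ip, sp, vm) \
        __attribute__((musttail)) return (vm)->dispatch[*(ip)]((ip) + 1, (sp), (vm))
    #else
    /* Portable fallback: the handler returns the next ip, and run() below
     * re-dispatches in an ordinary loop. */
    #define DISPATCH(ip, sp, vm) return (ip)
    #endif

    void run(VM *vm, const uint8_t *ip, int64_t *sp) {
    #ifdef HAVE_MUSTTAIL
        (void)vm->dispatch[*ip](ip + 1, sp, vm);   /* one call, then a tail-call chain */
    #else
        while (ip != NULL)                         /* classic dispatch loop */
            ip = vm->dispatch[*ip](ip + 1, sp, vm);
    #endif
    }

The fallback build gives up the fixed register allocation but keeps the interpreter compiling everywhere, which is the trade-off described above.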