Hacker News

WalterBright, last Friday at 3:55 PM (5 replies)

Um, it is necessary to compile a program before being able to interpret it. I don't know how early BASICs were implemented, but the usual method is to compile it to some sort of intermediate representation, and then interpret that representation.

D's compile-time function execution engine works that way. So does the JavaScript compiler/interpreter engine I wrote years ago, and the Java compiler I wrote eons ago.

The purpose of going all the way to generating machine code is that the result often runs 10x faster.


Replies

wvenable, last Friday at 7:58 PM

Early BASICs didn't compile a program before interpreting it. The interpreter read the code as written and executed it step by step. There was some tokenization: keywords were turned into one- or two-byte tokens, and that was literally done when you pressed Enter on the keyboard. Your stored source code was these tokenized bytes. On the Commodore 64, you could type the tokenized versions of keywords instead of the full keyword as a shortcut. Not even numbers were converted to binary ahead of time.

This was done to save memory -- there wasn't much room to hold both the source code and an intermediate form. It also wasn't really necessary: with the keywords tokenized and the syntax so simple, an intermediate form wouldn't have saved much space or time.

tasty_freeze, last Friday at 6:57 PM

You have an idiosyncratic definition of "compiler" then. Many BASICs, including the MS family of BASICs, did tokenize keywords to save on memory storage.

But 99.9% of people take "compiler" to mean translating source code to either a native CPU instruction set or a VM instruction set. In any tutorial on compilers, tokenization is only one aspect of compilation, as you know very well. And unlike some of the tricky tokenization aspects that crop up in languages like C++, BASIC interpreters simply had a table of keywords with the MSB set to indicate boundaries between keywords. The tokenizer was simply greedy: the first table entry that matches the next few characters is the winner, and the Nth entry in that table is encoded as token (0x80 + N).

When LIST'ing a program, the same table was used: if the byte was >= 0x80, then the first N-1 keywords in the table were skipped over and the next one was printed out.
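To make the table scheme concrete, here is a minimal Python sketch of both directions. The keyword list, its order, and the function names are made up for illustration, and N is counted from 0 so the first keyword becomes 0x80; real interpreters had larger tables and their own quirks.

```python
# Hypothetical keyword list; real interpreters used larger tables in their own order.
KEYWORDS = ["END", "FOR", "NEXT", "GOTO", "GOSUB", "PRINT", "IF", "THEN"]

# Pack the keywords back to back, setting the MSB on the last character
# of each one to mark the boundary between entries.
TABLE = b"".join(
    kw[:-1].encode("ascii") + bytes([ord(kw[-1]) | 0x80]) for kw in KEYWORDS
)

def table_entries(table):
    """Yield keywords from the packed table, splitting wherever the MSB is set."""
    word = bytearray()
    for b in table:
        word.append(b & 0x7F)
        if b & 0x80:
            yield word.decode("ascii")
            word.clear()

def tokenize(line):
    """Greedy tokenizer: the first table entry matching at this position wins."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(line):
        if line[i] == '"':
            in_string = not in_string
        matched = False
        if not in_string:
            for n, kw in enumerate(table_entries(TABLE)):
                if line.startswith(kw, i):
                    out.append(0x80 + n)  # token for the Nth entry (N from 0)
                    i += len(kw)
                    matched = True
                    break
        if not matched:
            out.append(ord(line[i]))  # plain ASCII is stored unchanged
            i += 1
    return bytes(out)

def detokenize(data):
    """LIST: bytes >= 0x80 are looked up in the same table, the rest are printed as-is."""
    words = list(table_entries(TABLE))
    return "".join(words[b - 0x80] if b >= 0x80 else chr(b) for b in data)
```

Note that text inside quotes is left alone, and that nothing here builds a tree or an instruction stream: the stored line is still the source, just with keywords squeezed into single bytes.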

There were also BASIC implementations that did not tokenize anything; every byte was simply interpreted on every execution of the line. There were tiny BASICs where, instead of the full keyword, "PR" meant "PRINT" and "GO" meant "GOTO", etc.

stevekemp, last Friday at 5:20 PM

It is not necessary to compile a program, in the general case, before executing it.

Many programming languages parse their program to an AST then walk that AST, interpreting as they go. But for BASIC you can parse/execute statement by statement - no need to parse the whole program ahead of time, and certainly zero need to compile to either machine code or any internal representation.

Remember, at the time we're talking about, 64k was a lot of RAM. Some machines had less.
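As a rough illustration of statement-by-statement execution, here is a small Python sketch. The three-statement subset (PRINT, LET, END), the `run` function, and the error text are all assumptions for the example, not any particular BASIC.

```python
def run(source):
    """Execute BASIC-like text one line at a time, parsing each line only when reached."""
    variables = {}
    output = []
    for raw in source.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        number, _, stmt = raw.strip().partition(" ")
        stmt = stmt.strip()
        if stmt.startswith("PRINT "):
            arg = stmt[6:].strip()
            if arg.startswith('"') and arg.endswith('"'):
                output.append(arg[1:-1])           # string literal
            else:
                output.append(str(variables[arg]))  # variable lookup
        elif stmt.startswith("LET "):
            name, _, value = stmt[4:].partition("=")
            variables[name.strip()] = int(value)    # integer constants only
        elif stmt == "END":
            break
        else:
            # The bad line is only noticed when execution gets to it.
            output.append(f"?SYNTAX ERROR IN {number}")
            break
    return output

program = '10 LET A=5\n20 PRINT A\n30 PRIMT "X"\n40 PRINT "NEVER"'
print(run(program))
```

Lines 10 and 20 run and produce output before the typo on line 30 is ever looked at, which is what you would expect when nothing is parsed ahead of time.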

eichin, last Friday at 5:19 PM

> necessary to compile

Um, no? Your experience is probably at least two decades after the time period in question. The more advanced versions of, for example, the TRS-80 BASIC (part of this "microcomputer BASICs that all share a common set of bugs") did no more than tokenize - so, `10 PRINT "Hello"` would have a binary representation for the line number, a single byte token for PRINT, then " H E L L O " and an end-of-line marker. Actually interpreting the code involved just reading it linearly; GOTO linenumber involved scanning the entire code in memory for that line number (and yes, people really did optimize things by putting GOTO and GOSUB targets earlier in the program so the interpreter would find them faster :-)
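A small Python sketch of why target placement mattered. The memory layout here is simplified and hypothetical (2-byte little-endian line number, body bytes, 0x00 end-of-line marker), and 0x85 is just a stand-in for the PRINT token; the real formats differed in detail.

```python
import struct

def pack_program(lines):
    """lines: list of (line_number, body_bytes). Bodies must not contain 0x00."""
    mem = bytearray()
    for number, body in lines:
        mem += struct.pack("<H", number) + body + b"\x00"
    return bytes(mem)

def find_line(mem, target):
    """Scan from the start of program memory; return (offset, lines_examined)."""
    pos = 0
    examined = 0
    while pos < len(mem):
        examined += 1
        (number,) = struct.unpack_from("<H", mem, pos)
        if number == target:
            return pos, examined
        # Skip the 2-byte line number, then jump past the end-of-line marker.
        pos = mem.index(b"\x00", pos + 2) + 1
    return None, examined

hello = bytes([0x85]) + b'"HELLO"'  # 0x85: hypothetical PRINT token
mem = pack_program([(10, hello), (20, hello), (1000, hello)])
print(find_line(mem, 10))    # found on the first record examined
print(find_line(mem, 1000))  # every earlier line is walked past first
```

Every GOTO pays for each line stored ahead of its target, so a subroutine at line 10 is cheaper to reach than one at line 1000.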
