tasty_freeze, last Friday at 6:57 PM

You have an idiosyncratic definition of "compiler" then. Many BASICs, including the MS family of BASICs, did tokenize keywords to save on memory storage.

But 99.9% of people take "compiler" to mean translating source code to either a native CPU instruction set or a VM instruction set. In any tutorial on compilers, tokenization is only one aspect of compilation, as you know very well. And unlike some of the tricky tokenization aspects that crop up in languages like C++, BASIC interpreters simply had a table of keywords, with the MSB set on one character of each keyword to mark the boundary between keywords. The tokenizer was simply greedy: the first table entry that matches the next few characters wins, and the Nth entry in that table is encoded as token (0x80 + N).
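To make the greedy table scan concrete, here is a minimal Python sketch rather than any real interpreter's code. The keyword list is a short hypothetical one, the MSB is assumed to sit on the last character of each keyword (some implementations flagged the first character instead), N is assumed to count from zero, and the skipping of quoted strings is an added assumption not described above.

```python
# Hypothetical keyword list; real interpreters had many more entries.
KEYWORDS = ["END", "FOR", "NEXT", "GOTO", "GOSUB", "IF", "THEN", "PRINT"]


def build_table(keywords):
    """Pack keywords back to back; bit 7 of each last character marks the boundary."""
    table = bytearray()
    for kw in keywords:
        raw = kw.encode("ascii")
        table += raw[:-1]
        table.append(raw[-1] | 0x80)
    return bytes(table)


TABLE = build_table(KEYWORDS)


def match_keyword(line, pos, table=TABLE):
    """Return (index, length) of the first table entry matching line at pos, else None."""
    index = 0
    start = 0
    for i, b in enumerate(table):
        if b & 0x80:  # flagged byte: this keyword ends here
            kw = table[start:i] + bytes([b & 0x7F])
            if line.startswith(kw, pos):
                return index, len(kw)  # first match wins, no backtracking
            index += 1
            start = i + 1
    return None


def tokenize(text):
    """Replace each keyword with the single byte 0x80 + N (N is zero-based here)."""
    line = text.encode("ascii")
    out = bytearray()
    pos = 0
    in_string = False  # assumption: text inside double quotes is left alone
    while pos < len(line):
        ch = line[pos]
        if ch == 0x22:  # double quote toggles string mode
            in_string = not in_string
        if not in_string:
            hit = match_keyword(line, pos)
            if hit is not None:
                out.append(0x80 + hit[0])
                pos += hit[1]
                continue
        out.append(ch)
        pos += 1
    return bytes(out)
```

Because the scan is first-match, table order matters: if one keyword were a prefix of another, the longer one would have to come first in the table to ever be matched.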

When LIST'ing a program, the same table was used: if the byte was >= 0x80, then the first N-1 keywords in the table were skipped over and the next one was printed out.
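The LIST direction can be sketched the same way. This is an illustrative Python version under stated assumptions, not a transcription of any particular BASIC: the keyword list is hypothetical, the MSB is assumed to flag the last character of each keyword, and tokens are assumed to be zero-based (0x80 is the first keyword), so N keywords are skipped rather than N-1.

```python
# Hypothetical keyword list; real interpreters had many more entries.
KEYWORDS = ["END", "FOR", "NEXT", "GOTO", "GOSUB", "IF", "THEN", "PRINT"]


def build_table(keywords):
    """Pack keywords back to back; bit 7 of each last character marks the boundary."""
    table = bytearray()
    for kw in keywords:
        raw = kw.encode("ascii")
        table += raw[:-1]
        table.append(raw[-1] | 0x80)
    return bytes(table)


TABLE = build_table(KEYWORDS)


def list_line(tokens, table=TABLE):
    """Expand a tokenized line back to text by walking the keyword table."""
    out = []
    for b in tokens:
        if b < 0x80:
            out.append(chr(b))  # ordinary character, printed as-is
            continue
        n = b - 0x80  # zero-based keyword number (assumption)
        i = 0
        # Skip over the first n keywords by hunting for their flagged bytes.
        for _ in range(n):
            while not table[i] & 0x80:
                i += 1
            i += 1
        # Print the next keyword, up to and including its flagged byte.
        while True:
            c = table[i]
            out.append(chr(c & 0x7F))
            i += 1
            if c & 0x80:
                break
    return "".join(out)
```

For example, `list_line(bytes([0x83]) + b" 10")` walks past END, FOR and NEXT and yields `GOTO 10`. A token beyond the end of the table is not handled in this sketch.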

There were also BASIC implementations that did not tokenize anything; every byte was simply interpreted on every execution of the line. There were tiny BASICs where, instead of using the full keyword, "PR" meant "PRINT" and "GO" meant "GOTO", etc.