johndough, today at 10:52 AM

> restricting the search space to syntactically valid programs (how do you even randomly generate that?)

By using a grammar. Here is an example of how to generate only valid JSON with llama.cpp: https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
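
To make the idea concrete without depending on llama.cpp, here is a minimal sketch of random generation driven by a grammar. It is not the linked grammar file and not how llama.cpp applies grammars during sampling; it is a hand-written recursive generator for a small subset of JSON. The names `random_json` and `random_string` and the `max_depth` limit are made up for illustration.

```python
import json
import random


def random_string(rng):
    """Emit a JSON string literal made of 0-6 lowercase letters."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    body = "".join(rng.choice(letters) for _ in range(rng.randint(0, 6)))
    return '"' + body + '"'


def random_json(rng, depth=0, max_depth=3):
    """Return random text that is valid JSON by construction.

    Each branch corresponds to one production of a simplified JSON grammar.
    """
    # Once max_depth is reached, only terminals are allowed, so recursion ends.
    kinds = ["null", "bool", "number", "string"]
    if depth < max_depth:
        kinds += ["array", "object"]
    kind = rng.choice(kinds)

    if kind == "null":
        return "null"
    if kind == "bool":
        return rng.choice(["true", "false"])
    if kind == "number":
        return str(rng.randint(-1000, 1000))
    if kind == "string":
        return random_string(rng)
    if kind == "array":
        items = [random_json(rng, depth + 1, max_depth)
                 for _ in range(rng.randint(0, 3))]
        return "[" + ", ".join(items) + "]"
    # Remaining case: object with 0-3 members.
    members = [random_string(rng) + ": " + random_json(rng, depth + 1, max_depth)
               for _ in range(rng.randint(0, 3))]
    return "{" + ", ".join(members) + "}"


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        text = random_json(rng)
        json.loads(text)  # raises if the text were not valid JSON
        print(text)
```

Every output parses because the generator can only ever follow grammar rules. Constrained LLM sampling works on the same principle, except that the choice at each step comes from the model's token probabilities with disallowed tokens masked out.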

> A trillion is 8 symbols. You still haven't reached the end of your first import statement.

Since LLMs use tokens from a vocabulary instead of characters, the number of possible sequences for the first import statement is likely somewhere in the low billions.
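
A back-of-the-envelope version of that count, under assumptions that are mine and not from the thread: an alphabet of 32 symbols (which is roughly what makes 8 symbols come out to a trillion), a vocabulary of 32,000 tokens, and an import statement that happens to tokenize into 2 tokens. Real tokenizers and real import statements will differ.

```python
def sequence_count(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of `length` symbols from the alphabet."""
    return alphabet_size ** length


# Assumed: 32 distinct characters, 8 positions.
char_sequences = sequence_count(32, 8)

# Assumed: 32,000-token vocabulary, import statement spans 2 tokens.
token_sequences = sequence_count(32_000, 2)

print(f"{char_sequences:,}")   # 1,099,511,627,776
print(f"{token_sequences:,}")  # 1,024,000,000
```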

But of course, LLMs do not sample from a uniform random distribution, so there are even fewer likely possibilities.
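
One way to put a number on "fewer likely possibilities" is the exponential of the Shannon entropy (the perplexity) of the next-token distribution, which acts as an effective number of choices. The probabilities below are invented for illustration and do not come from any real model.

```python
import math


def effective_choices(probs):
    """exp(Shannon entropy): the effective number of equally likely options."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy)


uniform = [0.25, 0.25, 0.25, 0.25]  # every token equally likely
peaked = [0.97, 0.01, 0.01, 0.01]   # one token dominates

print(effective_choices(uniform))  # approximately 4
print(effective_choices(peaked))   # well below 4
```

Four tokens are possible in both cases, but the peaked distribution behaves as if there were far fewer, and that reduction compounds at every position in the sequence.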