logoalt Hacker News

fragmedeyesterday at 8:48 AM2 repliesview on HN

Say you have this new language, with only a tiny amount of examples of there. How do the SOTA labs train on you're language? With sufficient examples, it can generate code which gets compiled and then run and that gets fed into a feedback loop to improve upon, but how do you get there? How do you bootstrap that? Nevermind the dollar cost, how does it offer something above having an LLM generate code in python or JavaScript, then having it rewrite it in golang/rust/c++ as needed/possible for performance or whatever reason?

It sounds like your plan is for it to write fewer bugs in NewLang, but, well, that seems a bit hard to achieve in the abstract. From bugs I've fixed in generated code, early LLM, it was just bad code. Multiple variables for the same thing, especially. Recently they've gotten better at that, but it still happens.

For a concrete example, any app dealing with points in time. Which sometimes have a date attached but sometimes do not. And also, what are timezones. The complexity is there because it depends on what you're trying to do. An alarm clock is different than a calendar is different than a pomodoro timer. How are you going to reduce the bugged-ed-ness of that without making one of those use cases more complicated than need be, given access to various primitives.


Replies

keepamovinyesterday at 9:27 AM

Your hypothetical misses praxis: in my experience LLM can pick up any new syntax with ease. From a few examples, it can generate more. With a compiler (even partial on limited syntax), it can correct. It soon becomes fluent simply from the context of your codebase. You don't need to "train" an LLM to recognize language syntax. It's effortless for it to pick it up.

Or, maybe my lanng just had LLM-easy syntax - which would be good - but I think this is more just par for the course for LLMs, bud.

show 1 reply
ModernMechyesterday at 2:45 PM

> Say you have this new language, with only a tiny amount of examples of there. How do the SOTA labs train on you're language?

Languages don't exist in isolation, they exist on a continuum. Your brand new language isn't brand new, it's built off the semantics and syntax of many languages that have come before it. Most language designers operate under what is known as a "weirdness budget", which is about keeping your language to within some delta of other languages modulo a small number of new concepts. This is to maintain comprehensibility, otherwise you get projects like Hoon / Nock where true is false and up is down that no one can figure out.

Under a small weirdness budget, an LLM should be able to understand your new language despite not being trained on it. if you just explain what's different about it. I've had great success with this so far even on early LLM models. One thing you can do is give it the EBNF grammar and it can just generate strings from that. But that method is prone to hallucinations.