
finnborge · last Saturday at 7:20 PM · 1 reply

This is amazing. It very creatively emphasizes how our definition of "boilerplate code" will shift over time. Another layer of abstraction would be running N of these, sandboxed, responding to each request, and then serving whichever instance is internally evaluated to have done the best. Then you're kind of performing meta reinforcement learning with each whole system as a head.
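A minimal sketch of that best-of-N loop, assuming hypothetical generate_response (one sandboxed instance) and score_response (the internal evaluator) functions; both are stand-ins for whatever backend and judge you actually run, not any real API:

    import random
    from concurrent.futures import ThreadPoolExecutor

    N = 4  # sandboxed instances per request

    def generate_response(request: str, seed: int) -> str:
        # Hypothetical: one sandboxed LLM instance handling the request.
        rng = random.Random(seed)
        return f"<html><!-- variant {seed}, noise {rng.random():.2f} -->...</html>"

    def score_response(request: str, response: str) -> float:
        # Hypothetical internal evaluator: in practice an LLM judge,
        # a test suite, or heuristics (valid HTML, no destructive SQL, ...).
        return random.random()

    def serve(request: str) -> str:
        with ThreadPoolExecutor(max_workers=N) as pool:
            candidates = list(pool.map(lambda s: generate_response(request, s), range(N)))
        # Serve whichever candidate the internal evaluator scores highest.
        return max(candidates, key=lambda r: score_response(request, r))

    print(serve("GET /todos"))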

The hard part (coming from this direction) is enshrining the translation of specific user intentions into deterministic outputs, as others here have already mentioned. The hard part when coming from the other direction (traditional web apps) is responding fluidly/flexibly, or resolving the variance in each user's ability to express their intent.

Stability/consistency could be introduced through traditional mechanisms (encoded instructions, systematically evaluated) or, via the LLM's language interface, through intent-focusing mechanisms: lengthening the prompt by hydrating the user request with additional context/intent, e.g. "use this UI, don't drop the db."
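For instance, a sketch of that hydration step; the constraint text and session fields are made up for illustration, not any real framework:

    SYSTEM_CONSTRAINTS = """\
    - Use the existing UI components; do not invent new layouts.
    - Never emit destructive SQL (DROP, or DELETE without WHERE).
    - Respond only with a complete HTML document.
    """

    def hydrate(user_request: str, session_context: dict) -> str:
        # Wrap the free-form intent in deterministic scaffolding
        # before it ever reaches the model.
        return (
            f"Constraints:\n{SYSTEM_CONSTRAINTS}\n"
            f"Current page: {session_context.get('page', '/')}\n"
            f"User request: {user_request}\n"
        )

    print(hydrate("delete all my tasks", {"page": "/todos"}))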

From where I'm sitting, LLMs provide a new modality for evaluating intent. How we act on that intent can be totally fluid, totally rigid, or, perhaps obviously, somewhere in between.

Very provocative to see this near-maximum example of non-deterministic, fluid intent interpretation → execution. Thanks, I hate how much I love it!


Replies

SkiFire13 · last Saturday at 8:49 PM

> serving whichever instance is internally evaluated to have done the best. Then you're kind of performing meta reinforcement learning

I thought this didn't work? You basically end up fitting your models to whatever the internal evaluation method rewards, and building a good evaluation method most often ends up being about as complex as building the model you wanted to train in the first place.
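A toy illustration of that failure mode, with a deliberately naive judge (a made-up proxy metric, not anyone's real evaluator):

    # A judge that rewards length as a proxy for thoroughness...
    def naive_judge(response: str) -> float:
        return float(len(response))

    candidates = [
        "ok",         # terse but correct
        "ok " * 500,  # degenerate padding
    ]

    # ...gets gamed: selection optimizes the proxy, not quality.
    best = max(candidates, key=naive_judge)
    print(repr(best[:12]), "wins")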