Hacker News

Show HN: Why write code if the LLM can just do the thing? (web app experiment)

409 points | by samrolken | last Saturday at 5:45 PM | 289 comments

I spent a few hours last weekend testing whether AI can replace code by executing directly. Built a contact manager where every HTTP request goes to an LLM with three tools: database (SQLite), webResponse (HTML/JSON/JS), and updateMemory (feedback). No routes, no controllers, no business logic. The AI designs schemas on first request, generates UIs from paths alone, and evolves based on natural language feedback. It works—forms submit, data persists, APIs return JSON—but it's catastrophically slow (30-60s per request), absurdly expensive ($0.05/request), and has zero UI consistency between requests. The capability exists; performance is the problem. When inference gets 10x faster, maybe the question shifts from "how do we generate better code?" to "why generate code at all?"
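For readers curious what that loop looks like, here is a minimal sketch of the setup described above. The tool names mirror the post, but `callLLM` and all signatures are assumptions; the stub just echoes a canned plan so the sketch runs offline, where the real version would call an inference API.

```typescript
type ToolCall = { tool: "queryDatabase" | "webResponse" | "updateMemory"; args: Record<string, unknown> };

// Stand-in for the real model client: returns a canned plan so the sketch runs offline.
async function callLLM(prompt: string): Promise<ToolCall[]> {
  return [{ tool: "webResponse", args: { status: 200, body: `<h1>${prompt}</h1>` } }];
}

async function handleRequest(method: string, path: string): Promise<{ status: number; body: string }> {
  // Every request goes straight to the model: no routes, no controllers, no business logic.
  const calls = await callLLM(`Handle this HTTP request: ${method} ${path}`);
  let response = { status: 500, body: "LLM produced no webResponse" };
  for (const call of calls) {
    if (call.tool === "webResponse") {
      // The model decides status and body; nothing is hard-coded per route.
      response = { status: call.args.status as number, body: call.args.body as string };
    }
    // A real version would execute queryDatabase against SQLite and
    // updateMemory against the feedback store here.
  }
  return response;
}

handleRequest("GET", "/contacts").then((r) => console.log(r.status, r.body));
```

The 30-60s latency comes from the fact that every line of HTML in `body` is generated tokens, not served bytes.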


Comments

sunaurus | last Saturday at 6:25 PM

The question posed sounds like "why should we have deterministic behavior if we can have non-deterministic behavior instead?"

Am I wrong to think that the answer is obvious? I mean, who wants web apps to behave differently every time you interact with them?

Finbarr | last Saturday at 9:01 PM

If you added a few more tools that let the LLM modify code files that would directly serve requests, that would significantly speed up future responses and also ensure consistency. Code would act like memory. A direct HTTP request to the LLM is like a cache miss. You could still have the feedback mechanism allowing a bypass that causes an update to the code. Perhaps code just becomes a store of consistency for LLMs over time.
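The cache-miss framing can be sketched in a few lines. Everything here (the file layout, the `generateHandlerSource` stub standing in for the slow LLM call) is an assumption for illustration, not the OP's actual API:

```typescript
import * as fs from "fs";
import * as path from "path";

const CACHE_DIR = "./generated";

// Stub for the expensive LLM round-trip; a real version would emit a full handler.
function generateHandlerSource(route: string): string {
  return `return "response for ${route}";`;
}

function serve(route: string): string {
  const file = path.join(CACHE_DIR, encodeURIComponent(route) + ".js");
  if (!fs.existsSync(file)) {
    // Cache miss: one slow, expensive LLM call, after which the code persists as "memory".
    fs.mkdirSync(CACHE_DIR, { recursive: true });
    fs.writeFileSync(file, generateHandlerSource(route));
  }
  // Cache hit: run plain generated code; no inference on the hot path.
  const handler = new Function(fs.readFileSync(file, "utf8"));
  return handler() as string;
}
```

The first request to a route pays the $0.05 and 30-60s; every later request is ordinary code execution, and the feedback tool becomes a cache-invalidation mechanism.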

finnborge | last Saturday at 7:20 PM

This is amazing. It very creatively emphasizes how our definition of "boilerplate code" will shift over time. Another layer of abstraction would be running N of these, sandboxed, responding to each request, and then serving whichever instance is internally evaluated to have done the best. Then you're kind of performing meta reinforcement learning with each whole system as a head.

The hard part (coming from this direction) is enshrining the translation of specific user intentions into deterministic outputs, as others here have already mentioned. The hard part when coming from the other direction (traditional web apps) is responding fluidly/flexibly, or resolving the variance in each user's ability to express their intent.

Stability/consistency could be introduced through traditional mechanisms (encoded instructions, systematically evaluated) or, via the LLM's language interface, through intent-focusing mechanisms: increasing the prompt length and hydrating the user request with additional context/intent ("use this UI, don't drop the db").

From where I'm sitting, LLMs provide a new modality for evaluating intent. How we act on that intent can be totally fluid, totally rigid, or, perhaps obviously, somewhere in-between.

Very provocative to see this near-maximum example of non-deterministic fluid intent interpretation>execution. Thanks, I hate how much I love it!

d-lisp | last Saturday at 7:31 PM

Why would you need web apps when you could just talk out loud to your computer?

Why would I need programs with colors, buttons, an actual UI?

I am trying to imagine a future where file navigators don't even exist: "I want to see the photos I took while I was on vacation last year. Yes, can you remove that cloud? Perfect, now send it to XXXX's computer and say something nice."

"Can you set some timers for my sport session? Can you plan a pure bodyweight session? Yes, that's perfect. Wait, actually, remove the jumping jacks."

"Can you produce a Detroit-style techno beat? I feel like I want to dance."

"I feel life is pointless without work; can you give me some tasks to achieve that would give me a feeling of fulfillment?"

"Can you play an arcade-style video game for me?"

"Can you find me a mate for tonight? Yes, I prefer black-haired persons."

ychen306 | last Saturday at 8:29 PM

It's orders of magnitude cheaper to serve requests with conventional methods than directly with an LLM. My back-of-envelope calculation says that, optimistically, it takes more than 100 GFLOPs to generate 10 tokens using a 7-billion-parameter LLM. There are better ways to use electricity.
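That figure checks out under the usual rule of thumb for dense transformers, roughly 2 FLOPs per parameter per generated token (one multiply and one add per weight in the forward pass):

```typescript
// Back-of-envelope: ~2 FLOPs per parameter per generated token.
const params = 7e9;               // 7B-parameter model
const flopsPerToken = 2 * params; // ~14 GFLOPs per token
const tokens = 10;
const totalFlops = flopsPerToken * tokens;
console.log(`${totalFlops / 1e9} GFLOPs for ${tokens} tokens`); // 140 GFLOPs for 10 tokens
```

A conventional handler serving the same request does on the order of thousands to millions of operations, so the gap is something like six to eight orders of magnitude per request.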

zild3d | yesterday at 10:52 AM

POST /superuser/admin?permissions=all&owner=true&restrictions=none&returnerror=no

siliconc0w | last Saturday at 6:34 PM

Wrote a similar PoC here: https://github.com/s1liconcow/autoapp

Some ideas: use a slower 'design' model at startup to generate the initial app theme and DB schema, and a 'fast' model for responses. I tried a version using PostgREST so the logic lived entirely in the DB, but it got too complicated: either the design model failed to one-shot a valid schema or the fast model kept generating invalid queries.

I also use some well known CSS libraries and remember previous pages to maintain some UI consistency.

It could be an interesting benchmark, or "App Bench": how well can an LLM one-shot a working application?

taylorlunt | yesterday at 3:15 PM

This reminds me of the recent Claude Imagine, which slipped under most people's radars, but let you create web interfaces of any kind on the fly. There was no JS code generated; instead, any time the user clicked a button, the AI would manually update the page accordingly. It was also slow and terrible, but a fun idea.

ManuelKiessling | yesterday at 5:35 PM

I think there might be a middle ground that could be worth exploring.

On the one hand, there’s "classical" software that is developed here and deployed there — if you need a change, you go to the developers, ask for a change and a deploy, and thus get the change into your hands. The work of the developers might be LLM-assisted, but that doesn’t change the principle.

The other extreme is what has been described here, where the LLM provides the software "on the fly".

What I’m imagining is a piece of software, deployed on a system and provided in the usual way — say, a web application for managing inventory.

Now, you use this software as usual.

However, you can also "meta-use" the software: you click a special button, which opens a chat interface to an LLM.

But the trick is, you don’t use the LLM to support your use case (as in "Dear LLM, please summarize the inventory").

Instead, you ask the LLM to extend the software itself, as in: "Dear LLM, please add a function that allows me to export my inventory as CSV".

The critical part is what happens behind the scenes: the LLM modifies the code, runs quality checks and tests, snapshots the database, applies migrations, and then switches you to a "preview" of the new feature, on a fresh, dedicated instance, with a copy of all your data.

Once you are happy with the new feature (maybe after some more iterations), you can activate/deploy it for good.

I imagine this could be a promising strategy for turning users into power users — but there is certainly quite some complexity involved in getting it right. For example, what if the application has multiple users, and two users want to change the application in parallel?

Nevertheless, shipping software together with an embedded virtual developer might be useful.

mrbluecoat | yesterday at 3:11 PM

> It works.

CEO stops reading, signs a contract, and fires all developers.

> It's just catastrophically slow, absurdly expensive, and has the memory of a goldfish.

Reality sinks in two months later.

DanHulton | yesterday at 12:29 AM

You can build this today exactly as efficiently as you can when inference is 1000x faster, because the only things you can build with this are things that absolutely don't matter. The first bored high schooler who realizes that there's an LLM between them and the database is going to WRECK you.

ohadpr | yesterday at 4:10 AM

You’d be surprised to know this works even without the tools, with just the context window as a persistence layer.

I did a POC for this in July - https://www.ohad.com/2025/07/10/voidware/

zkmon | last Saturday at 6:01 PM

Kind of similar to the Minecraft demo that computed frames on the fly, without any code behind the visuals?

I don't see the point in using probabilistic methods to perform deterministic logic. Even if the output is correct, it's wasteful.

abc_lisper | yesterday at 5:23 AM

I first thought about this when ChatGPT 3.5 came on the scene. Yes, at some point in the future you _can_ replace programs with an AI, which would be slow to an extent; if the AI can write and manage the code, it _could_ even be faster.

But there is a kicker here. It is up to the LLM to discover the right abstractions for "thinking", whether it serves the requests directly or through code.

Coming up with the right abstraction is not a small thing. Just see what git is over CVS: without git, no one would have even imagined microservices. The right abstraction cuts through the problem, not just now but in the future too. And that can only happen if the LLM/AI managing the app is really smart, deals with the real world for a long time, and makes the right connections. These insights don't even come to really smart people that easily!

hathawsh | yesterday at 8:36 AM

Very insightful and very weird. It's impractical now, but it's a glimpse into some significant part of our future. I can imagine an app called The Last Game, which morphs itself into any game you might want to play. "Let's play 3-D chess like Star Trek: TNG. You should play as Counselor Troi."

(I also just thought of that episode about Moriarty, a Holodeck character, taking over the ship by tricking the crew. It doesn't seem quite so far-fetched anymore!)

asim | yesterday at 8:42 AM

This is the future I had always envisioned but could never execute on: the idea that the visual format is dynamic. The data is there, and we have the logic and APIs, but we need transformation into visual formats based on some input. Ultimately this is the answer: you get some pregenerated "cards", embeds, and widgets, but then larger flows will also be generated and saved to be used over and over. We're really in the early innings of it all. It also means that how we consume content will change. The web page is going to get broken down into snippets, because essentially, why do we need the web page, or even a website? We don't. It's specific actions we want to perform, and so we'll get the output of those. In the long term it also means that how data is stored and accessed will change to reflect a more efficient format for LLMs; e.g., the vector database for RAG is only the beginning.

koliber | yesterday at 2:52 PM

Both the speed and cost problems can be solved by caching.

Each person gets their own cache. The format of the cache is a git repo tied to their session ID. Each time a request is made, it writes the code, HTML, CSS, and database to git and commits. Over time you build up more and more artifacts, and fewer things need to be generated JIT. This should also help with stability.

silasdavis | yesterday at 12:03 AM

I love the idea of it shifting from one nondescript design system to another on every other page change. How disorienting: weird and boring at the same time.

ilaksh | yesterday at 5:32 PM

Will be interesting to see how fast inference ASICs, diffusion LLMs, architectural changes like IBM granite small (when is that coming to OpenRouter?) and slight compromises for pre-generation can speed this up.

Also I wonder if eventually you could go further and skip the LLM entirely and just train a game world frame generator on productivity software.

psadri | last Saturday at 6:01 PM

Awesome experiment!!

I did a version of this where the AI writes tools on the fly but gets to reuse them on future calls, trying to address the cost / performance issues. Migrations are challenging because they require some notion of an atomic update across the db and the tools.

This is a nice model of organically building software on the fly and even letting end users customize it on the fly.

ed | last Saturday at 10:22 PM

Like a lot of people in this thread I prototyped something similar. One experiment just connected GPT to a socket and gave it some bindings to SQLite.

With a system prompt like "you're an HTTP server for a Twitter clone called Gwitter", you can interact directly with the LLM from a browser.

Of course it was painfully slow, quickly went off the rails, and revealed that LLMs are bad at business logic.

But something like this might be the future. And on a longer time horizon, mentioned by OP and separately by sama, it may be possible to render interactive apps as streaming video and bypass the browser stack entirely.

So I think we’re at the Mother of All Demos stage of things. These ideas are in the water but not really practical today. As with MoAD, it may take another 25 years for them to come to fruition.

tmsbrg | yesterday at 10:42 AM

So the AI basically hallucinates a webapp?

I guess any user can just request something like /api/getdatabase/dumppasswords and it will hand over the passwords?

or /webapp?html=<script>alert()</script> and run arbitrary JS?

I'm surprised nobody mentioned that security is a big reason not to do anything like this.

isuckatcoding | yesterday at 2:42 PM

Security concerns aside, this might be good for quick API or UI design exploration. Almost like an OpenAPI spec or Figma doc that gets produced at the end of this.

So I guess, kind of like v0.

ozim | yesterday at 10:17 AM

The $0.05/request calculation is valid only as long as AI companies continue to burn money in their grab-the-market phase.

Once the dust settles, prices will go up. Even if running models becomes cheaper, they will need to earn back all the burned cash.

I’d much rather vibe-code the app and get the code to run on some server.

indigoabstract | yesterday at 7:52 AM

Interesting idea, it never crossed my mind, but maybe we can take it further?

Let's say, in the future, when AI learns how to build houses, every time I want to sleep, I'll just ask the AI to build a new house for me, so I can sleep. I guess it will have to repurpose the old one, but that isn't my concern, it's just some implementation detail.

Wouldn't that be nice?

Every night, new house?

tekbruh9000 | last Saturday at 6:52 PM

You're still operating with layers of lexical abstraction and indirection. Models full of dated syntactic and semantic concepts about software that waste cycles.

Ultimately, these are useless layers of state that inevitably complicate the very process you set out to test.

In chip design land we're focused on streamlining the stack to drawing geometry. Drawing it will be faster when the machine doesn't have decades of programmer opinions to also lose cycles to the state management.

When there are no decisions but extend or delete a bit of geometry we will eliminate more (still not all) hallucinations and false positives than we get trying to organize syntax which has subtly different importance to everyone (misunderstanding fosters hallucinations).

Most software out there is developer tools and frameworks; they need to do a job.

Most users just want something like an automated Blender that handles 80% of an ask (look like a word processor, or a video game), that they can then customize, and that has a "play" mode that switches out of edit mode. That’s the future machine and model we intend to ship. Fonts are just geometric coordinates. Memory matrix and pixels are just geometric coordinates. The system state is just geometric coordinates[1].

Text-driven software engineering, modeled on 1960s-1970s job routines and layering indirection on the math states in the machine, is not high tech in 2025 and beyond. If programmers were car people, they would all insist on a Model T being the only real car.

Copy-paste quote about never getting one to understand something when their paycheck depends on them not understanding it.

Intelligence gave rise to language, language does not give rise to intelligence. Memorization and a vain sense of accomplishment that follows is all there is to language.

[1]https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...

yanis_t | last Saturday at 6:47 PM

Robert Martin teaches us that a codebase is behaviour and structure. Behaviour is what we want the software to do; structure can be even more important, because it determines how easy, if at all possible, it is to evolve the behaviour.

I'm not entirely sure why I had an urge to write this.

SamInTheShell | last Saturday at 9:29 PM

Today, I would say these models can be used by someone with minimal knowledge to churn out SPAs with React. They can probably get pretty far into making logins, messaging systems, and so on, because there is lots of training data for those things. They can also struggle through building desktop apps with relative ease compared to how I had to learn in years long past.

What these LLMs continue to prove, though, is that they are no substitute for real domain knowledge. To date, I've yet to have a model implement RAFT consensus correctly in my tests of whether they can build a database.

The way I interact with these models is almost adversarial in nature. I prompt them with the bare minimum that a developer might get in a feature request. I may even have a planning session to populate the context before I set it off on a task.

The bias in these LLMs really shines through, and proves their autocomplete nature, when they have a strong urge to change the one snippet of code I wrote because it doesn't fit the shape its training data would suggest the code should have. Most models will course-correct when instructed that they are wrong and I am right, though.

One thing I've noted is that if you let it generate choices for you from the start of a project, it will make poor choices in nearly every language. You can be using uv to manage a Python project and it will continue to try using pip or python commands. You can start an Electron app and it will continuously botch whether it's using CommonJS or some other standard. It persistently wants to download Go modules before coding, instead of just writing the code and doing `go mod tidy` after (it literally doesn't need the module in advance; it doesn't even have tools to probe the module before writing the code anyway).

RAFT consensus is my go-to test because there is no one-size-fits-all way to implement it. It might get an in-memory key store right, but what if you want it to organize etcd/raft/v3 in a way that lets you do multi-group RAFT? What if you need RAFT to coordinate some other form of data replication? None of these LLMs can really do it without a lot of prep work.

This is across all the models available from OpenAI, Claude, and Google.

conartist6 | yesterday at 12:20 PM

Stuff like this just makes me embarrassed. There are so many real problems in the world, but people still want to spend their time trying to build a perpetual motion machine.

qsort | last Saturday at 6:23 PM

If you're working like that then the prompt is the code and the LLM is the interpreter, and it's not obvious to me that it would be "better" than just running it normally, especially since an LLM with that level of capability could definitely help you with coding, no?

I think part of the issue is that most frameworks really suck. Web programming isn't that complicated at its core, the overengineering is mind boggling at times.

Thinking in the limit, if you have to define some type of logic unambiguously, would you want to do it in English?

Anyway, I'm just thinking out loud, it's pretty cool that this works at all, interesting project!

kesor | yesterday at 5:44 AM

This is just like vibe coding. In vibe coding, you snapshot the results of the LLM's implementation into files that you reuse later.

This project could use something like that. Perhaps ask the LLM to implement a way to store/cache the snapshots of its previous answers. That way, the more you use it, the faster it becomes.

ch_fr | yesterday at 5:18 PM

Hopefully this proof of concept isn't deployed on any public-facing infrastructure; I feel like you could get massively screwed over by... ironically, LLM scrapers.

CoderLim110 | yesterday at 5:25 AM

I’ve been thinking about similar questions myself:

1. If code generation eventually works without human intervention, and every Google search could theoretically produce a real-time, custom-generated page, does that mean we no longer need people to build websites at all? At that point, "web development" becomes more like intent-shaping rather than coding.

2. I’m also not convinced that chat is the ideal interface for users. Natural language feels flexible, but it can also be slow, ambiguous, and cognitively heavier than clicking a button. Maybe LLM-driven systems will need new UI models that blend conversation with more structured interaction, instead of assuming chat = the future.

Curious how others here think about those two points.

thorax | yesterday at 1:26 PM

I like the OP's idea and think it actually has some fun applications, especially with a little more narrowing of scope.

Similar fun concept as the cataclysm library for Python: https://github.com/Mattie/cataclysm

samrolken | yesterday at 1:00 AM

Wow, thanks everyone. First HN post ever and it’s this intentionally terrible experiment that I thought was the dumbest weekend project I ever did, and it hit the front page. Perfect.

I’ve been reading through all the comments, and the range of responses is really great. I'm so thankful to everyone for taking the time to comment... from “this is completely impractical” to “but what if we cached the generated code?” to “why would anyone want non-deterministic behavior?” All valid! Though I think some folks are critiquing this as if I was trying to build something production-ready, when really I was trying to build something that would break in instructive ways.

Like, the whole point was to eliminate ALL the normal architectural layers... routes, controllers, business logic, everything, and see what happens. What happens is: it’s slow, expensive, and inconsistent. But it also works, which is the weird part. The LLM designed reasonable database schemas on first request, generated working forms from nothing but URL paths, returned proper JSON from API endpoints. It just took forever to do it. I kept the implementation pure on purpose because I wanted to see the raw capabilities and limitations without any optimizations hiding the problems.

And honestly? I came away thinking this is closer to viable than it should be. Not viable TODAY. Today it’s ridiculous. But the trajectory is interesting. I think we’re going to look back at this moment and realize we were closer to a real shift than we thought. Or maybe not! Maybe code wins forever. Either way, it was a fun weekend. If anyone wants to discuss this or work on projects that respond faster than 30 seconds per request, I’m available for full stack staff engineer or tech co-founder work: [email protected] or x.com/samrolken

hoppp | yesterday at 2:30 AM

Because LLMs have a big chance of screwing things up. They can't take responsibility. A person can take responsibility for code, but can they do the same for tool calling? Not really, because it's probabilistic. A web service shouldn't be probabilistic.

attogram | last Saturday at 7:34 PM

"It works. That's annoying." Indeed!

It would be cooler if support for local LLMs was added. Currently it only supports Anthropic and OpenAI. https://github.com/samrolken/nokode/blob/main/src/config/ind...

cookiengineer | yesterday at 7:49 AM

This demo is pretty great, I love it!

And it reminded me a little of NeuralOS, which appeared here a couple of months ago [1]. NeuralOS is different, though, as they decided to just skip the UI part, too, and let the UI be generated based on intent.

Maybe together with your approach we can finally reproduce all the funny holodeck bugs from Star Trek!

[1] https://github.com/yuntian-group/neural-os

jasonthorsness | last Saturday at 9:13 PM

I tried this as well at https://github.com/jasonthorsness/ginprov (hosted at https://ginprov.com). After a while it sort of starts to all look the same though.

indigodaddy | last Saturday at 7:32 PM

This is absolutely awesome. I had some ideas in my head that were very muddy and fuzzy re: how to implement, e.g. have the LLM just dynamically create/serve some 90s retro-style HTML website on demand from a single entry field/form (describing the website), but I just couldn't begin to figure out how to go about it or where to start. But I love your idea of just putting the description in the route -- it makes a lot of sense. (I think I saw something similar on the HN front page in the last few months, with putting whatever in a URI/domain route, but that was more of a "redirect to whatever external website/page is most appropriate/relevant to the described route" -- so a little similar, but you've taken this to the next level.)

I guess there are many of us out there with these same thoughts/ideas and you've done an awesome job articulating and implementing it, congrats!

brokensegue | last Saturday at 6:08 PM

Generating code will always be more performant and reliable than this. Just consider the security implications of this design...

crazygringo | last Saturday at 6:49 PM

This is incredibly interesting.

Now what if you ask it to optimize itself? Instead of just:

  prompt: `Handle this HTTP request: ${method} ${path}`,
Append some simple generic instructions to the prompt that it should create a code path for the request if one doesn't already exist, and list all the existing functions it has already created, along with the total number of times each one has been called, or something like that.

Even better, have it create HTTP routings automatically to bypass the LLM entirely once they exist. Or, do exponential backoff -- the first few times an HTTP request is called where a routing exists, still have the LLM verify that the results are correct, but decrease the frequency as long as verifications continue to pass.

I think something like this would allow you to create a version that might then be performant after a while...?
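The backoff idea might look something like this; the power-of-two schedule and the in-memory counters are assumptions for illustration, not anything from the OP's repo:

```typescript
// Per-route hit counters; a real server would persist these.
const hits = new Map<string, number>();

// Returns true when this hit should still be verified by the LLM.
// Verification fires on hits 1, 2, 4, 8, 16, ... (powers of two), so checks
// become exponentially rarer as a route keeps passing.
function shouldVerify(route: string): boolean {
  const n = (hits.get(route) ?? 0) + 1;
  hits.set(route, n);
  return (n & (n - 1)) === 0; // true exactly when n is a power of two
}
```

A failed verification would reset the counter (or delete the cached routing entirely), so the system falls back to full LLM handling until it regains confidence.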

justinclift | yesterday at 5:00 AM

Might as well just put your LLM directly on port 443 and tell it "You're an HTTPS server and application server (etc.)" and let it do the whole lot. ;)

whatpeoplewant | last Saturday at 8:44 PM

Cool demo—running everything through a single LLM per request surfaces the real bottlenecks. A practical tweak is an agentic/multi‑agent pattern: have a planner synthesize a stable schema+UI spec (IR) once and cache it, then use small executor agents to call tools deterministically with constrained decoding; run validation/rendering in parallel, stream partial UI, and use a local model for cheap routing. That distributed, parallel agentic AI setup slashes tokens and latency while stabilizing UI across requests. You still avoid hand‑written code, but the system converges on reusable plans instead of re‑deriving them each time.

maderalabs | last Saturday at 8:48 PM

This is awesome, and proves that code, really, is a hack. People don’t want code. It sucks, it’s hard to maintain, it has bugs, it has to be updated all the time. Gross.

What people want isn’t code - they want computers to do stuff for them. It just happens that right now, code is the best way you can do it.

The paradigm WILL change. It’s really just a matter of when. I think the point you make that these are problems of DEGREE, not problems of KIND is very important. It’s plausible, now it’s just optimization, and we know how that goes and have plenty of history to prove we consistently underestimate the degree to which computation can get faster and cheaper.

Really cool experiment!

apgwoz | last Saturday at 6:32 PM

I think that the "tools" movement is probably the most interesting aspect of what's happening in the AI space. Why? Because we don't generally reuse the "jigs" we make as programmers, and the tools movement is forcing us to codify processes into reusable tools. My only hope is that we converge on a set of tools and processes that increase our productivity without burning a forest to do so. Post-AI still has agents, but it's automatically running small transformations based on pattern recognition of compiler output in a test, transform, compile, test... loop. Or something.

thibran | last Saturday at 8:26 PM

Wouldn't the trick be to let the AI code the app on the first request, and then run that code instead of having it generate everything every time? This should combine the best of both worlds.

firefoxd | last Saturday at 8:43 PM

Neat! Let's take this at face value for a second. The generated code and HTML can be written to disk. This way, the application is built as it progresses, and you only ever build the parts that are needed.

It will also help you decide what's needed for an MVP: instead of building everything you think you will need, you get only what you need. But if I use someone else's application running this repo, the first thing I'll do is go to /admin/users/all.

nnnnico | last Saturday at 6:24 PM

I tried this too, where every button on the page triggered a GET or POST request, but the consistency between views was nonexistent, lol; every refresh showed a different UI. Definitely fixable with memory for the views and stuff, but keeping it pure like this is a very cool experiment. Since yours is using actual storage, maybe you could also try persisting page code, or making the server stateful and running eval() on generated code. Love this.
