One trend I've noticed, framed as a logical deduction:
1. Coding assistants based on o1 and Sonnet are pretty great at coding with <50k context, but degrade rapidly beyond that.
2. Coding agents do massively better when they have a test-driven reward signal.
3. If a problem can be framed in a way that a coding agent can solve, that speeds up development at least 10x from the base case of human + assistant.
4. From (1)-(3), if you can get all the necessary context into 50k tokens and measure progress via tests, you can speed up development by 10x.
5. Therefore all new development should be microservices written from scratch and interacting via cleanly defined APIs.
Sure enough, I see HN projects evolving in that direction.
> 5. Therefore all new development should be microservices written from scratch and interacting via cleanly defined APIs.
Not necessarily. You can get the same benefits you described in (1)-(3) by using clearly defined modules in your codebase; they don't need to be separate microservices.
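For illustration, a minimal sketch in Python (the module, `Invoice`, and `total_due` are all hypothetical names): a module that exposes one documented entry point gives an agent the same clean contract a service API would, without the service.

```python
# billing.py -- hypothetical module with a single documented entry point.
# Callers depend only on total_due(); everything else can change freely.
from dataclasses import dataclass

@dataclass(frozen=True)
class Invoice:
    customer_id: str
    cents: int

def total_due(invoices: list[Invoice]) -> int:
    """The module's public API: total outstanding amount, in cents."""
    return sum(inv.cents for inv in invoices)
```

Same contract as a REST endpoint, minus the network hop and the deployment overhead.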
So having clear requirements, a focused purpose, and a well-defined boundary of responsibility makes for a software development task that can actually be accomplished?
If only people had figured out at some point that the same thing applies when communicating to human software engineers.
> you can speed up development by 10x.
If you know what you are doing, then yes. If you are a domain expert and can articulate your thoughts clearly in a prompt, you will most likely see a boost, perhaps 2-3x, but 10x is unlikely. And if you don't fully understand the problem, you may see a negative effect.
You don't need microservices for that; just factor your code into libraries that can fit into the context window. Also write functions that have clear inputs and outputs and don't need to know the full state of the software.
This has always been good practice anyway.
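A toy example of the difference (nothing here is from the thread): a function that reads ambient state drags that state into the context window, while a pure function carries its whole contract in the signature.

```python
# Opaque: correctness depends on a global the reader has to go find.
config = {"tax_rate": 0.5}

def price_with_tax(amount):
    return amount * (1 + config["tax_rate"])

# Transparent: everything needed is in the signature; trivially testable.
def price_with_tax_pure(amount: float, tax_rate: float) -> float:
    return amount * (1 + tax_rate)

print(price_with_tax_pure(100.0, 0.5))  # 150.0
```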
> Coding assistants based on o1 and Sonnet are pretty great at coding with <50k context, but degrade rapidly beyond that.
I had a very similar impression (wrote more in https://hua.substack.com/p/are-longer-context-windows-all-yo...).
One framing is that the effective context window (i.e. the length the model can actually reason over) determines how useful the model is. A human new-grad programmer might effectively reason over hundreds or thousands of tokens but not millions, which is why we carefully scope their work and point them at only the relevant context. A principal engineer, though, might reason over many millions of tokens of context: code, yes, but also organizational and business context.
Trying to carefully select those 50k tokens is extremely difficult for LLMs/RAG today. I expect models to get much longer effective context windows, but there are hardware and cost constraints that make this difficult.
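A toy sketch of the selection problem (pure Python; keyword overlap stands in for embeddings, but real retrieval shares the failure mode that surface similarity is not relevance):

```python
# Rank files by word overlap with the task, keep them until a ~50k-token
# budget is spent. Nothing guarantees the *causally* relevant file wins.
def select_context(task: str, files: dict[str, str], budget: int = 50_000) -> list[str]:
    task_words = set(task.lower().split())
    ranked = sorted(
        files.items(),
        key=lambda kv: -len(task_words & set(kv[1].lower().split())),
    )
    picked, used = [], 0
    for name, text in ranked:
        cost = len(text.split())  # crude token estimate
        if used + cost > budget:
            break
        picked.append(name)
        used += cost
    return picked
```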
> microservices written from scratch and interacting via cleanly defined APIs.
Introducing network calls because why? How about just factoring a monolith appropriately?
50K context is an interesting number because I think there's a lot to explore with software within an order of magnitude of that size. With apologies to Richard Feynman, I call it "there's plenty of room in the middle." My idea is that the rapid expansion of computing power during the reign of Moore's law left the design space of medium-sized programs under-explored: programs in the range of hundreds of kilobytes to a few megabytes.
It doesn't have to be microservices. You can use a modular architecture. You can use Polylith. You can have boundaries in your code and mock around them.
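For example, with Python's built-in `unittest.mock` (all names invented): the boundary is an object the code receives, so a test, or an agent's reward harness, can fake everything behind it.

```python
from unittest.mock import Mock

# Boundary rule: anything external arrives as an argument, never a global.
def build_report(client, user_id: str) -> str:
    user = client.get_user(user_id)  # crosses the boundary; faked in tests
    return f"{user['name']}: {user['score']}"

def test_build_report():
    client = Mock()
    client.get_user.return_value = {"name": "Ada", "score": 42}
    assert build_report(client, "u1") == "Ada: 42"
```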
This is a helpful breakdown of a trend, thank you
Might be a boon for test-driven development. Could turn out that AI coding is the killer app for TDD. I had a similar thought about a year ago but had forgotten; appreciate the reminder.
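A sketch of what that workflow looks like (file and function names are made up): the human writes the failing spec first, and the agent's reward signal is simply `pytest` going green.

```python
# tests/test_slug.py -- written before any implementation exists.
# mylib/slug.py and make_slug() are what the agent is asked to produce.
from mylib.slug import make_slug

def test_lowercases_and_hyphenates():
    assert make_slug("Hello, World!") == "hello-world"

def test_collapses_repeated_separators():
    assert make_slug("  a -- b  ") == "a-b"
```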
> 5. Therefore all new development should be ~~microservices~~ modules written from scratch and interacting via cleanly defined APIs.
We figured this out for humans almost 20 years ago, with some really good empirical research behind it. It's the only approach to large-scale software development that works.
But it requires leadership that gives a shit about the quality of their product and values long-term outcomes over short-term rewards.
> 3. If a problem can be framed in a way that a coding agent can solve...
This reminds me of the South Park underpants gnomes. You picked a tool and set an expectation, then just kind of hand-waved over the hard part in the middle, as though framing problems "in a way coding agents can solve" were itself a well-understood or bounded problem.
Does it sometimes take 50x effort to understand a problem and the agent well enough to get that done? Are there classes of problems where it can't be done? Is either of those concerns something you can recognize before it impacts you? At commercial quality, is it an accessible skill for inexperienced people, or do you need a mastery of coding, the problem domain, or the coding agent to be able to rely on it? Can teams recruit people who can reliably achieve any of this? How expensive is that talent? Etc.