Hacker News

jonas21 yesterday at 8:36 PM

What did your AI-assisted workflow look like 1 year ago? I can only speak for myself, but I would carefully specify a class or module in great detail and then hand it off to the model to implement, then carefully review the result.

How about 2 years ago? Back then, I wouldn't even trust it to write a 5-line function without making some sort of silly mistake.

Today, I can leave an agent running by itself for 20 or 30 minutes, and most of the time it comes back with a result that's either flawless or can be made good with a few back-and-forth messages. Maybe I still have to make some high-level decisions ahead of time, but all of the details, including exploring the codebase and figuring out what to do based on that, can be left to the agent. The amount of improvement in just the last 2 years has been staggering.

Now extrapolate how things will look if the trend continues for another 2 or 3 years.

Is this guaranteed to happen? No. But people have been predicting that we're going to hit a wall for a long time now, and we haven't yet. Maybe there's a wall just ahead of us. But maybe there's not -- and the "not" case seems likely enough that we should at least be planning for it.


Replies

dontlikeyoueith yesterday at 11:49 PM

I disagree with your assessment pretty strongly -- the models themselves hit a wall over a year ago, once companies exhausted all existing training data. LLMs don't induce world models, and they aren't capable of real search and planning outside their training distributions. Structurally, they never will be.

I haven't noticed a change in a year in what I trust a model to generate from a single prompt. The failure modes are unchanged. Yes, specific failures have improved as they have been documented and fed into model training data, but the way the models fail has not changed. They still fail for me nearly every single day. I'm a pretty heavy user: 3-4 Claude Code processes running at a time, all day, every day.

What has gotten better is tooling around the model -- but there's no space for exponential growth there. At least, not without exponential cost increase, which would make the whole thing untenable anyway.

techblueberry today at 2:29 AM

https://fortune.com/2026/01/29/100-percent-of-code-at-anthro...

I feel like the problem is that there aren’t any great metrics. Boris Cherny probably gets paid something like $2 mil per year. So what does it mean that Claude writes 100% of his code? And that Claude writes 100% of the code for most teams? Has Anthropic started laying people off? If Claude is writing 100% of the code, doesn’t that mean game over?

It’s both amazing and kind of a useless metric. How do I extrapolate out 100% 2-3 years from now? Super-duper 100%? Infinity infinity?

snaking0776 yesterday at 11:37 PM

I wonder if our difference in view could be an instance of the jagged nature of AI’s intelligence. I do computational research in a basic science, so I write code or build models basically all day, and the work is (occasionally) novel. I would say that I’ve noticed exponential improvements in parts of my job, but certainly not all. For example, if I’m trying to visualize a concept from a paper, I now go straight to Codex, give it the paper, and describe a webapp that lets me play with the model in a way that wasn’t possible one year ago (this is great for teaching, btw). If I have a script that I want to generalize, add better metrics to, or set up to run on a cluster, I use Codex and it does great.

Where it fails me, though, is exactly when I’m doing something novel, like developing a new model or trying to develop a new method to process data. I’ve tried many times to one-shot these ideas with detailed descriptions of what I want, how I’d like to generate abstractions, etc., and it almost always ends up changing what I want into what I can only describe as something that better matches its training data. It often quietly changes key details, which means I have to delete the whole thing and start over. Just today this happened. On this level of task, I’ve found that my workflow and pace of iteration haven’t really changed at all in the last year. I still have to explain in detail, function by function, what I want, in much the same way I did a year ago. While that’s obviously a harder task, it seems to me like the task this whole long-term exponential argument hinges on. I could obviously be wrong, and maybe an LLM with an eval loop will do all of this for us, but it still seems quite bad at anything without a clear definition of “good”.

I’m personally much more concerned about autonomous weapons, surveillance, and people plugging these things into places they don’t belong to avoid responsibility than I am about the general possibility of these models being smarter than me in every way. But obviously I could be wrong on this, or I could just be using it incorrectly, hence the question.

baq yesterday at 9:12 PM

> Now extrapolate how things will look if the trend continues for another 2 or 3 years.

…and humans are famously bad at extrapolating exponentially, which is kinda the point of the essay.