My observation is that a lot of folks still discounting the capabilities or impact of AI either aren't working with frontier intelligence or aren't using it right.
While the coding horse has been beaten within an inch of its life already, I'd recommend throwing Codex on 5.5 high thinking with Computer Use + auto approve at the next thing you're about to spend 5+ minutes on. That's how you start to get a feel for how well it handles a broad range of work across literally any surface you interact with today. Use voice mode & the mobile app for remote control to seriously watch the friction break down.
Is it always perfect? Maybe not - but for a dramatically growing slate of tasks it's becoming a no-brainer to offload the busywork and raise the bar on what a single person can do.
It's natural to have hype when you see where this already is and where it's going.
You know, people could just put their money where their mouth is, i.e. go build something amazing instead of talking endlessly about how they're going to build amazing stuff. That's why this feels like so much theatre.
I find that thinking/agent mode sometimes makes it worse, or comes up with the same thing and just takes a long time. But I’m sure it’ll be different with fable for a few months until that hype blows over.
I should use it to read and vote on HN comments so that I don't have to waste time doing that myself.
I hate that these discussions go nowhere because there's no common metric anymore.
I have no idea what stuff like "is it always perfect?" means because it varies so much from person to person. Too many people have different expectations, are working on different problems, or have different standards or goals for there to be a common constructive discussion.
Why do all arguments from AI boosters boil down to this same cycle:
A new model is released, AI fans hail it as a huge shift in whatever metrics the AI vendor has gamed this time, and all criticism is shrugged off as "not up to date" and met with "try the new model!" Then, once level heads actually put the claims to the test and find them wanting, criticism is met with "you're just not using it right, you have to learn how to prompt/context/loop engineer for best results" until the next model comes out and this argument repeats.
Or they aren’t building a SaaS with React or a TUI with TypeScript, which is about the only thing that LLMs have “solved”.
I tried to have multiple models convert a simple TextMate grammar to a Vim one, and none of them could do it. They couldn't even use the right names between the regex matches and the color definitions. I tried for about 30 minutes. Doing it by hand took me about 5.
I tried having them work on an LSP. The fact that I got a one-shot, half-working autocomplete based on my existing work was cool, but again, they flailed on incredibly simple things like file path normalization / converting from a URI, and I had to rewrite a decent amount of code. I don't think I saved any time.
People keep throwing this out there, but I keep wondering: where are the receipts? Anecdotally, I know, but I am seeing less interesting software released since AI took hold than before.