Look, I'll fully cosign LLMs having some legitimate applications. That being said, 2025 was the YEAR OF AGENTIC AI, we heard about it continuously, and I have never seen anything suggesting these things have ever, ever worked correctly. Nothing. Zero.
The few cases where it's supposedly done things are filled with so many caveats and so much deck stacking that the claim simply fails under even the barest whiff of skepticism on the part of the reader. And every, and I do mean every, single live demo I have seen of this tech just does not work. I don't mean in the LLM hallucination way, or in the "it did something we didn't expect!" way, or any of that; I mean it tried to find a Login button on a web page, failed, and sat there stupidly. And, further, these things do not have logs, they do not issue reports, they have functionally no "state machine" to reference, nothing. Even if you want it to make some kind of log, you're then relying on the same prone-to-failure tech to tell you what the failing tech did. There is no "debug" path here one could rely on to back up the claims.
In a YEAR of being a stupendously hyped and well-funded product, we got nothing. The vast, vast majority of agents don't work. Every post I've seen about them is fan fiction on the part of AI folks, fit more for AO3 than any news source. And absent further proof, I'm extremely inclined to look at this in exactly that light: someone had an LLM write it, and either they posted it or they told it to post it, but this was not the agent actually doing a damn thing. I would bet a lot of money on it.
Thank you for helping me recover at least some level of sanity (or at least feel like I have).
Can you elaborate a bit on what "working correctly" would look like? I have made use of agents, so me saying "they worked correctly for me" would be evidence of them doing so, but I'd have to know what "correctly" means.
Maybe this comes down to what it would mean for an agent to do something. For example, if I were to prompt an agent, would it then not meet your criteria?
It's very unclear to me why AI companies are so focused on using LLMs for things they struggle with rather than what they're actually good at; are they really just all Singularitarians?
Absolutely. It's technically possible that this was a fully autonomous agent (and if so, I would love to see that SOUL.md), but it doesn't pass the sniff test of how agents work (or don't work) in practice.
I say this as someone who spends a lot of time trying to get agents to behave in useful ways.