I think the discussion has to be more nuanced than this. "LLMs still can't do X, so they're idiots" is a bad line of thought. LLMs with harnesses are clearly capable of engaging with logical problems that only need text. They are not there yet with images, but that is improving with better UIs and access to tools like Figma. And they are clearly unable to propose new, creative solutions to problems they have never seen before.
Thank you for putting it so succinctly.
I keep explaining to my peers, friends, and family that what is actually happening inside an LLM has nothing to do with consciousness or agency, and that the term AI is just completely overloaded right now.
I think it's too early to declare the Turing test passed. You just need to have a conversation long enough to exhaust the context window. Less than that, actually: response quality degrades long before you hit the hard window limit, even with compaction.
Neuroplasticity is hard to simulate in a few hundred thousand tokens.
Some people point at LLMs confabulating, as if this wasn’t something humans are already widely known for doing.
I consider it highly plausible that confabulation is inherent to scaling intelligence. To compute over data whose dimensionality makes direct computation infeasible, you will most likely need to create a lower-dimensional representation and do the computation on that. Collapsing the dimensionality is going to be lossy, which means there will be gaps between what the system thinks reality is and what it actually is.
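To make the lossy-compression point concrete, here is a minimal sketch in Python/NumPy (the data, sizes, and the choice of PCA are all invented for illustration, not a claim about how any particular model works): compress high-dimensional data into a handful of components, reconstruct it, and measure what the compression failed to keep.

```python
import numpy as np

# A minimal sketch of lossy dimensionality reduction: keep only the top-k
# principal components of high-dimensional data, then see how much of the
# original the compressed representation can actually reconstruct.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))   # 500 samples, 1000 dims (invented numbers)

Xc = X - X.mean(axis=0)            # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 10                             # size of the low-dimensional representation
Z = Xc @ Vt[:k].T                  # compressed codes, shape (500, 10)
X_hat = Z @ Vt[:k]                 # best rank-k reconstruction of the data

err = np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc)
print(f"relative reconstruction error: {err:.1%}")  # large on unstructured data
```

The reconstruction error is exactly the gap between the compressed representation and the original; confabulation would live in those gaps.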
> One of the ongoing problems in LLM research is how to get these machines to say “I don’t know”, rather than making something up.
To be fair, I've known humans who are like this as well.
I have a question for all the "humans make those mistakes too" people in this thread, and elsewhere: have you ever read, or at least skimmed a summary of, "The Origin of Consciousness in the Breakdown of the Bicameral Mind"? Did you say "yeah, that sounds right"? Do you feel that your consciousness is primarily a linguistic phenomenon?
I am not trying to be snarky; I used to think that intelligence was intrinsically tied to, or perhaps identical with, language, and found deep and esoteric meaning in religious texts related to this (e.g. "in the beginning was the Word"; logos as soul as language-virus riding on meat substrate).
The last ~three years of LLM deployment have disabused me of this notion almost entirely, and I don't mean in a "God of the gaps" last-resort sort of way. I mean: I see the output of a purely-language-based "intelligence", and while I agree humans can make similar mistakes/confabulations, I overwhelmingly feel that there is no "there" there. Even the dumbest human has a continuity, a theory of the world, an "object permanence"... I'm struggling to find the right description, but I believe there is more than language manipulation to intelligence.
(I know this is tangential to the article, which is excellent, as the author's pieces usually are; I admire his restraint. However, I see exemplars of this take all over the thread, so: why not here?)
"As LLMs etc. are deployed in new situations, and at new scale, there will be all kinds of changes in work, politics, art, sex, communication, and economics."
For an article five years in the making, this is what I expected it to be about. Instead, we got a ramble about how imperfect LLMs are right now.
And the past too, if we've been paying attention.
> At the same time, ML models are idiots. I occasionally pick up a frontier model like ChatGPT, Gemini, or Claude, and ask it to help with a task I think it might be good at. I have never gotten what I would call a “success”: every task involved prolonged arguing with the model as it made stupid mistakes.
I have a ton of skepticism built in when interacting with LLMs, and very good muscles for rolling my eyes, so I barely notice when I shrug off a bad answer and make a derogatory inner remark about the "idiots". But the truth is that, for such a "stochastic parrot", LLMs are incredibly useful. And when was the last time we stopped perfecting something we thought useful and valuable? When was the last time our attempts were so perfectly futile that we stopped them, invented stories about why it was impossible, and made it a social taboo to be met with derision, scorn, and even ostracism? To my knowledge, in all of known human history, we have done that exactly once, and it was millennia ago.
> In general, ML promises to be profoundly weird. Buckle up.
I love that it ends on such a positive note. Even though it's generally a critical article, at least it's well reasoned and not utterly hyping or dooming anything.
Thanks yet again Kyle!
While the economic, energy, political and social issues associated with LLMs ought to be enough to nix the adoption that their boosters are seeking ...
... I still think there is an interesting question to be investigated: whether, by building immensely complex models of language, one of the primary ways we interact with, reason about, and discuss the world, we may have accidentally built something with properties quite different from what might be guessed from the (otherwise excellent) description of how they work in TFA.
I agree with pretty much everything in TFA, so this is supplemental to the points made there, not contesting them or trying to replace them.
> Models do not (broadly speaking) learn over time. They can be tuned by their operators, or periodically rebuilt with new inputs or feedback from users and experts. Models also do not remember things intrinsically: when a chatbot references something you said an hour ago, it is because the entire chat history is fed to the model at every turn. Longer-term “memory” is achieved by asking the chatbot to summarize a conversation, and dumping that shorter summary into the input of every run.
This is the part of the article that will age the fastest; it's already out of date in the labs.
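For the time being, though, the quoted mechanism can be sketched in a few lines (`call_model` is a placeholder, not any real API): the model itself is stateless, so the appearance of memory comes from replaying the transcript on every turn, and "long-term memory" from periodically replacing the transcript with a summary.

```python
# Hypothetical sketch of the mechanism described in the quote; call_model
# stands in for a chat-completion endpoint and returns a canned string here.
def call_model(messages: list[dict]) -> str:
    return "(model reply)"

history: list[dict] = []

def chat_turn(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    reply = call_model(history)    # the ENTIRE history is fed in on every turn
    history.append({"role": "assistant", "content": reply})
    return reply

def compact() -> None:
    # Longer-term "memory": ask the model to summarize the conversation,
    # then replace the transcript with that much shorter summary.
    summary = call_model(history + [
        {"role": "user", "content": "Summarize this conversation."}
    ])
    history[:] = [{"role": "system", "content": f"Summary so far: {summary}"}]
```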
Great series of articles, thank you. It's exhausting reading a deluge of (often AI-generated) comments from people claiming wild things about LLMs, and it's nice to hear some sanity enter the conversation.
Here's the opening paragraph of chapter 2 with "people" subbed in for the terms referring to AI/models/etc.
"People are chaotic, both in isolation and when working with other people or with systems. Their outputs are difficult to predict, and they exhibit surprising sensitivity to initial conditions. This sensitivity makes them vulnerable to covert attacks. Chaos does not mean people are completely unstable; most people behave roughly like anyone else. Since people produce plausible output, errors can be difficult to detect. This suggests that human systems are ill-suited where verification is difficult or correctness is key. Using people to write code (or other outputs) may make systems more complex, fragile, and difficult to evolve."
To me, this modified paragraph reads surprisingly plainly. The wording is off ("using people to write code") and I had to change that part about attractor behavior (although it does still apply IMO), but overall it doesn't seem like an incoherent paragraph.
This is not meant to dunk on the author, but I think it highlights the author's mindset and the gap between their expectations and reality.
The recent article about Sam Altman described him as pretty much a compulsive liar. Would it be any surprise if his most impactful contribution to the world was a machine that compulsively lies?
I appreciate the directness of calling LLMs "bullshit machines." This terminology for LLMs is well established in academic circles and is much easier for laypeople to understand than terms like "non-deterministic." I personally don't like the excessive hype around the capabilities of AI. Setting realistic expectations will drive better product adoption than carpet-bombing users with marketing.
The fact that these "bullshit machines" have already proven themselves relatively competent at programming, with upcoming frontier models coming close to eliminating it as a human activity, probably says a lot about the actual value and importance of programming in the scheme of things.
Old and stupid hot take, IMO. I want back the time I put into perusing this. Even the scale of LLM lying is puny next to the scale of lying humans, and the sheer impact one compulsively lying human can have, given that we love to be led by confidently wrong narcissists. I mean, if that isn't obvious by now, I guess it never will be. The Vogon constructor fleet is way overdue in my book.
Meanwhile, engineers are achieving increasingly impressive and sophisticated things with coding agents, lies, warts, and all, but that doesn't play well with the narrative, so let's just pretend they aren't.
I get the frustration, but it's reductive to just call LLMs "bullshit machines" as if the models are not improving. The current flagship models are not perfect, but if you use GPT-2 for a few minutes, it's incredible how much the industry has progressed in seven years.
It's true that people don't have a good intuitive sense of what the models are good or bad at (see: counting the Rs in "strawberry"), but this is more a human limitation than a fundamental problem with the technology.
This is like all the usual anti-LLM talking points and sentiments fused together.
Doesn't it get boring?
I like using these models a lot more than I can stand hearing people talk about them, pro or contra. Just slop about slop. And the discussions being artisanal slop really doesn't make them any better.
Every time I hear some variation of bullshitting or plagiarizing machines, my eyes roll. Do these people think they're actually onto something? I've been seeing these talking points for literal years. For people who complain about LLMs having no original thoughts, these sure are some tired ones.
> It remains unclear whether continuing to throw vast quantities of silicon and ever-bigger corpuses at the current generation of models will lead to human-equivalent capabilities. Massive increases in training costs and parameter count seem to be yielding diminishing returns. Or maybe this effect is illusory. Mysteries!
I’m not even sure whether this is possible. The current corpus used for training already includes virtually all known material. If we make it illegal for these companies to use copyrighted content without remuneration, either the task gets very expensive indeed, or the corpus shrinks. We can certainly make the models larger, with more and more parameters, subject only to silicon’s ability to give us denser RAM and more GPU parallelism. But it honestly feels like, without another “Attention Is All You Need”-level breakthrough, we’re starting to see the end of the runway.
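For a feel of what "diminishing returns" looks like if loss follows a power law in parameter count, here is a toy calculation (the constants are invented for illustration, loosely in the spirit of published scaling-law fits, not measured values):

```python
# Toy illustration of diminishing returns under an assumed power law
# loss(N) = a * N^(-alpha); a and alpha are invented, not fitted values.
a, alpha = 10.0, 0.07

def loss(n_params: float) -> float:
    return a * n_params ** -alpha

prev = None
for n in [1e9, 1e10, 1e11, 1e12, 1e13]:
    cur = loss(n)
    gain = "" if prev is None else f"  (improvement: {prev - cur:.3f})"
    print(f"{n:.0e} params -> loss {cur:.3f}{gain}")
    prev = cur
# Each 10x in parameters buys a smaller absolute drop in loss, while the
# training bill grows at least linearly with the parameter count.
```

Under any curve of this shape, each order of magnitude costs vastly more and returns less, which is one way to read "the end of the runway."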