DNNs/LLMs can only predict next tokens based on training data. They often make big directional mistakes because they are particularly bad at common sense, kind of like the Paperclip Maximizer scenario. They need a human with deep knowledge to drive them and to catch them when they go off the rails.
>DNNs/LLMs can only predict next tokens based on training data.
How do they decide between 'a' and 'an', when the right choice depends on a word they haven't written yet?
"Next token prediction" isn't a system. It's an interface a system uses. Nothing prevents arbitrarily simple or arbitrarily complex behavior from sitting behind that interface and producing the token logits.
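To make the "interface, not a system" point concrete, here is a toy sketch. The vocabulary, the prompts, and the hand-written rules are all made up for illustration and stand in for a trained network; this is not how any real LLM is implemented. Both predictors have the exact same signature (context in, one score per token out), but the second one has to settle on the upcoming noun before it can pick "a" or "an".

```python
from typing import Callable, Dict, List

# The "interface": given the tokens so far, return a score (logit) per vocabulary token.
NextTokenFn = Callable[[List[str]], Dict[str, float]]

VOCAB = ["a", "an", "apple", "banana", "."]  # toy vocabulary (hypothetical)


def one_hot(best: str) -> Dict[str, float]:
    """Put all the score on a single token."""
    return {tok: (1.0 if tok == best else 0.0) for tok in VOCAB}


def lookup_predictor(context: List[str]) -> Dict[str, float]:
    """Trivially simple behavior behind the interface: a fixed bigram table."""
    table = {"ate": "a", "a": "banana", "banana": "."}
    return one_hot(table.get(context[-1], "."))


def plan_noun(context: List[str]) -> str:
    """Hypothetical stand-in for whatever internal computation picks the object."""
    return "apple" if "hungry" in context else "banana"


def planning_predictor(context: List[str]) -> Dict[str, float]:
    """More complex behavior behind the same interface: decide the noun first,
    then emit the article that agrees with it, then the noun itself."""
    noun = plan_noun(context)
    last = context[-1]
    if last == "ate":
        best = "an" if noun[0] in "aeiou" else "a"
    elif last in ("a", "an"):
        best = noun
    else:
        best = "."
    return one_hot(best)


def greedy_decode(predict: NextTokenFn, prompt: List[str], max_new: int = 5) -> List[str]:
    """Repeatedly append the highest-scoring token until '.' or max_new tokens."""
    out = list(prompt)
    for _ in range(max_new):
        logits = predict(out)
        tok = max(logits, key=logits.get)
        out.append(tok)
        if tok == ".":
            break
    return out


if __name__ == "__main__":
    print(greedy_decode(lookup_predictor, ["I", "ate"]))
    print(greedy_decode(planning_predictor, ["I", "was", "hungry", "so", "I", "ate"]))
```

The decoding loop can't tell the two apart; it only ever sees one token's worth of scores at a time. Whatever "planning" happens is invisible at the interface.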
And what do we actually know about LLMs? Autoregressive transformers are Turing complete in theory, and we have yet to find anything that LLMs are "fundamentally incapable" of in practice. Even continuous learning is already approximated by in-context learning: both let a system learn from prior experience, and both have practical limits on how far that goes. That's what powers "trial and error" in today's agentic LLMs.
"LLMs can only predict next tokens based on training data" is comforting but misleading. It just isn't the saving grace you want it to be. It describes an interface, not a ceiling. And if there is some sort of fundamental "capability ceiling" that LLMs are heading towards, we have yet to see it. We know plenty of things LLMs are bad at, but they keep getting less bad at them with every release.
If there is none, then simply iterating on the current recipes might yield systems that only "need a human" in the same way you "need" a boss. Maybe less so.