"Next token prediction" is an interface, not an algorithm. A process that "predicts next tokens" can be arbitrarily complex or simple, and arbitrarily capable or incapable of performing a given task.
Saying that an LLM can or can't do something because it's a "token predictor" is a category error. The interface describes only the input and output, not the computation in between, so it sets no hard limit on capability.
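To make the distinction concrete, here is a minimal Python sketch. Every name in it (NextTokenPredictor, RepeatLast, Adder, complete) is hypothetical and invented for illustration, and neither class resembles how a real LLM works. The only point is that two processes of very different capability can sit behind the same "predict the next token" interface.

```python
from typing import Protocol, Sequence


class NextTokenPredictor(Protocol):
    """The interface: given the tokens so far, return the next token."""

    def predict(self, context: Sequence[str]) -> str: ...


class RepeatLast:
    """Trivial implementation: echo the most recent token."""

    def predict(self, context: Sequence[str]) -> str:
        return context[-1] if context else ""


class Adder:
    """Same interface, but the process behind it does arithmetic."""

    def predict(self, context: Sequence[str]) -> str:
        # Assumes a context shaped like ["12", "+", "30", "="].
        if len(context) >= 4 and context[-3] == "+" and context[-1] == "=":
            return str(int(context[-4]) + int(context[-2]))
        return ""


def complete(model: NextTokenPredictor, prompt: Sequence[str]) -> str:
    """The caller sees only the interface, never the implementation."""
    return model.predict(prompt)


prompt = ["12", "+", "30", "="]
print(complete(RepeatLast(), prompt))  # prints "="
print(complete(Adder(), prompt))       # prints "42"
```

Both classes are "token predictors" in exactly the same sense, yet only one of them can add. Knowing the interface tells you nothing about which one you are talking to.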