There are chess engines based on transformers; DeepMind even released one [1] that reached roughly 2900 Elo. It does have peculiarities, for example in the endgame, that likely stem from its architecture, but I think it clearly shows that being a next-token predictor doesn't preclude performing tasks that require intelligence and planning.
The "count the r's in strawberry" problem is more a fundamental limitation of our tokenization procedures than of the transformer architecture. We could easily train an LLM with byte-level tokens that would nail those problems. It can also be fixed with harnessing (i.e., for this class of problems, the model writes a script rather than answering directly). We do this all the time ourselves; even mathematicians and physicists reach for a calculator for all kinds of problems they could in principle solve in their heads.
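To make the harnessing point concrete, here's a minimal sketch of the kind of script a model (or a person) could emit instead of counting characters token-by-token; the function name is just illustrative:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a letter in a word."""
    return word.lower().count(letter.lower())

# Exact character-level counting, trivially reliable regardless
# of how any tokenizer would have split "strawberry".
print(count_letter("strawberry", "r"))  # -> 3
```

Because the string is handled at the character level, tokenization never enters the picture, which is exactly why delegating to a tool sidesteps the problem.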