We can be sure via analysis grounded in computational theory, e.g. https://arxiv.org/abs/2503.03961 and https://arxiv.org/abs/2310.07923. These results characterize which classes of problems a model can solve, and sufficiently deep transformers with chain of thought have been shown to be theoretically capable of solving a very large class of problems.
A random number generator is also guaranteed to eventually produce a correct solution to any checkable problem, but its expected runtime usually does not meet usability standards.
Also, solution testing is mandatory. Luckily, you can ask an RNG for that too, as long as the tests for the testers are already written.
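The RNG point amounts to a guess-and-check loop: sample candidates at random and keep the first one the tester accepts. A minimal sketch (the function name and toy problem here are hypothetical, just to make the idea concrete):

```python
import random

def rng_solver(is_correct, candidates, seed=0):
    """Guess-and-check (Las Vegas) search: sample random candidates
    until the tester accepts one. Always correct if a solution exists
    in `candidates`, but the expected number of attempts grows with
    how rare solutions are."""
    rng = random.Random(seed)
    attempts = 0
    while True:
        attempts += 1
        guess = rng.choice(candidates)
        if is_correct(guess):
            return guess, attempts

# Toy problem: find x in [0, 100) with x*x == 49.
solution, tries = rng_solver(lambda x: x * x == 49, list(range(100)))
print(solution)  # 7 -- correct, after however many tries it took
```

The loop always returns a correct answer when one exists in the candidate set; the catch is exactly the one above: the expected number of attempts scales inversely with how common solutions are, which is the usability problem.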
Note that these theorems only show that there exists a transformer that can solve these problems; they tell you nothing about whether there is any way to train that transformer using gradient descent from some data, and even if you could, they don't tell you how much data, or of what kind, you would need to train it on.
But this uses the transformer model itself to justify its own reasoning strength, which might be a blind spot; that was my original point. All the above shows is that transformers can simulate solving a certain set of problems. It doesn't show that they are the best tool for the job.
Keep in mind that proofs that transformers can solve all problems in some complexity class work by taking a known universal algorithm for that class and encoding it as a transformer. In every such case, you'd be better off just using the universal algorithm you started with.
Maybe the hope is that you won't have to manually map the universal algorithm to your specific problem and can instead train the transformer to figure it out, but there are few proofs that transformers can learn to solve all problems in some complexity class through training rather than manual construction.