I do think this is a tool issue. Here is what the article says:
> For the multiplication task, note that agents that make external calls to a calculator tool may have ZEH = ∞. While ZEH = ∞ does have meaning, in this paper we primarily evaluate the LLM itself without external tool calls
With tool access, a model can multiply numbers of essentially any size, which is what ZEH = ∞ means here. Production models already work this way.
That doesn't make the paper wrong. It is still interesting to measure the core neural network of a model on its own. But modern models use tools.
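
To make the tool point concrete, here is a minimal sketch of what a calculator tool could look like. The name `calculator_tool` and its string-in, string-out interface are my own assumptions for illustration, not anything from the paper or from a real agent framework. The only thing it relies on is that Python integers are arbitrary precision, so the result is exact no matter how many digits the operands have.

```python
def calculator_tool(a: str, b: str) -> str:
    """Hypothetical tool: exactly multiply two decimal integer strings.

    Python ints are arbitrary precision, so accuracy does not degrade
    with operand length. Practical limits are memory and, in recent
    CPython versions, a configurable cap on int/str conversion length.
    """
    return str(int(a) * int(b))


# Two 50-digit operands: well beyond what you would expect a model to
# multiply reliably in its head, but trivial for the tool.
x = "9" * 50
y = "9" * 50
product = calculator_tool(x, y)
print(len(product))  # number of digits in the exact product
```

An agent that routes multiplication through something like this never has to do the arithmetic itself, which is why the tool-using setup is a different measurement from the one the paper reports.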