It's not about making no mistakes; it's about which category the mistakes fall into.
Think of language as a world, in some abstract sense. A lie may (or may not) be consistent within that world: it can still make sense linguistically even if it isn't true. Now contrast that with the category of errors where a model starts mixing languages and produces outright nonsense. That's rare with current LLMs in standard usage, but you can still push them into full-on meltdowns.
That is the class of mistakes these models are making: breakdowns of linguistic coherence, not the failing-to-recite-truth class of mistakes.
(Not a perfect translation but I hope this explanation helps)