LLMs seem really bad at reading numbers and reporting them back. I’m building a game, and to see how well its docs were being indexed, I tried asking simple questions to ChatGPT, Gemini, whatever Microsoft’s thing is called now, etc:
“What is the armour value for the Leather Shirt in the game Stravaeger?”
It confidently got it wrong.
“You can find the game at https://stravaeger.com”
Different confident answers, also wrong.
“You’ll find it in a table on this page: https://stravaeger.com/docs.html?inventory_item=LEATHER_SHIR...“
Oh, sorry. I was inferring from other, similar games. Here is a different confidently wrong number.
“It’s also in the .json file linked on that page”
And another wrong value. Picking numbers at random would have hit the right one by now, but no. And the confident, authoritative tone never changed. It was the same story with every model I tried.
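For contrast, actually reading the value out of a JSON file like the one linked is trivial in code. This is only a sketch: the real file's schema, field names, and the armour value itself aren't reproduced here, so everything below is a hypothetical stand-in.

```python
import json

# Hypothetical snippet standing in for the game's item JSON --
# the real schema and the real armour value are assumptions here.
doc = '{"items": [{"id": "LEATHER_SHIRT", "armour": 12}]}'

# Index items by id, then look the value up directly.
items = {item["id"]: item for item in json.loads(doc)["items"]}
print(items["LEATHER_SHIRT"]["armour"])
```

The point being: there's no inference involved, so there's nothing to be confidently wrong about.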