> try using obscure CPUs
I tried asking Gemini and ChatGPT, "What opcode has the value 0x3c on the Intel 8048?"
They were both wrong. The datasheet with the correct encodings is easy to find online, and there are several correct open-source emulators, e.g. MAME.
If the LLM doesn't have a web search tool, your test doesn't make any sense.
An LLM by itself is like a lossy image of all the text on the internet.
Think of the exact answer to "What opcode has the value 0x3c on the Intel 8048" as a PNG image, and the LLM as a very compressed JPEG of it. It will only give you a very approximate answer. But you can give it explicit tools to look things up.
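
To make the "explicit tools" point concrete, here is a rough sketch of the kind of lookup function you could expose to a model as a tool. Everything in it is hypothetical: the names `OPCODES_8048` and `lookup_opcode` are made up, and the four table entries are what I believe the MCS-48 encoding to be, so check them against the datasheet before relying on them. A real tool would load the full table from the datasheet or from an emulator's opcode list.

```python
# Hypothetical lookup tool: the model calls this instead of recalling
# encodings from its (lossy) weights.
# ASSUMPTION: this tiny excerpt reflects the MCS-48 encoding as I understand
# it; verify every entry against the Intel 8048 datasheet.
OPCODES_8048 = {
    0x3C: "MOVD P4,A",
    0x3D: "MOVD P5,A",
    0x3E: "MOVD P6,A",
    0x3F: "MOVD P7,A",
}


def lookup_opcode(value: int) -> str:
    """Return the mnemonic for a one-byte opcode, or say it is unknown."""
    if not 0 <= value <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    # Never guess: an explicit "not in table" beats a plausible wrong answer.
    return OPCODES_8048.get(value, "not in table; consult the datasheet")


if __name__ == "__main__":
    print(lookup_opcode(0x3C))
```

The design choice that matters is the fallback: when the table has no entry, the tool says so instead of producing something that merely looks right, which is exactly the failure mode of the bare model.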