Yesterday I asked mistral to list five mammals that don't have "e" in their name. Number three was "otter" and number five was "camel".
phi4-mini-reasoning took the same prompt and bailed out because (at least according to its trace) it interpreted it as meaning "can't have a, e, i, o, or u in the name".
Local is the only inference paradigm I'm interested in, but these things have a way to go.
Models will always struggle with this specific task without tool use because of the way they tokenize text. I think a bit of prompt engineering solves this: ask the model to spell out each word, or give it the ability to run a "contains e" Python function over the animal names it generates or searches for.
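A minimal sketch of what that tool could look like (the function name and the candidate list are hypothetical; a real harness would wire this up as a tool call the model can invoke):

```python
def lacks_letter(word: str, letter: str) -> bool:
    """Return True if `letter` does not appear in `word` (case-insensitive)."""
    return letter.lower() not in word.lower()

# Candidate mammals the model might generate (hypothetical list).
candidates = ["otter", "camel", "lion", "bat", "fox", "puma", "sloth", "yak", "wombat"]

# Filter to names with no "e" -- exactly the check the model gets wrong on its own.
no_e = [name for name in candidates if lacks_letter(name, "e")]
print(no_e)  # ['lion', 'bat', 'fox', 'puma', 'sloth', 'yak', 'wombat']
```

The point is that the model only needs to propose candidates; the deterministic character check happens outside the tokenizer entirely.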
I think lots of local AI use cases are similarly solvable once local models get good at tool use and have the proper harness.
Treat LLMs as dyslexic when it comes to spelling. Assess their strengths and weaknesses accordingly.
I don't really see the problem here. Yeah, we know these models are not good at actual logic. They are lossy data compression and most-likely-response-from-internet-forums-and-articles machines.
These kinds of parlor tricks aren't interesting, and whether a model can list animals with or without some letter in their names doesn't mean much, especially since it isn't like the model "thinks" in English; it just gives you the answer after translating it to English.
These are funny, like how you can do weird stuff in JavaScript by combining special characters, but that doesn't really mean anything in the grand scheme of things. Like JavaScript, these models, despite their specific flaws, still continue to deliver value to the people using them.