Can it solve easy problems yet? Weirdly, I think that's an important milestone.
Prompts like, "Give me five odd numbers that don't have the letter 'e' in their spelling," or "How many 'r's are in the word strawberry?"
I suspect the breakthrough won't be trivial that enables solving trivial questions.
> Can it solve easy problems yet? Weirdly, I think that's an important milestone.
Easy for who? Some problems are better solved in one way compared to another.
In the case of counting letters and such, it is not an easy problem, because of how LLMs tokenize their inputs/outputs. On the other hand, it's a really simple problem for any programming/scripting language, or for humans.
And then you have problems like "5142352 * 51234", which is a trivial problem for any basic calculator, but very hard for a human or an LLM.
Or "problems" like "Make a list of all the cities that had a celebrity from there who knows how to program in Fortran", which would be an "easy" problem for an LLM, but pretty much a hard problem for anything other than Wikidata, assuming both the LLM and Wikidata have data about it in their datasets.
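To make the "easy for a program" cases above concrete, here is a minimal Python sketch of the letter count and the multiplication. The variable names are just for illustration.

```python
# Counting a letter is a single pass over the characters of the string.
word = "strawberry"
r_count = word.count("r")

# Python integers are arbitrary precision, so the product is exact.
product = 5142352 * 51234

print(r_count)
print(product)
```

Both are a single built-in operation here, which is the point: the difficulty depends on the solver, not on the problem.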
> I suspect the breakthrough won't be trivial that enables solving trivial questions.
So with what I wrote above in mind, LLMs already solve trivial problems, assuming you think about the capabilities of the LLM. Of course, if you meant "trivial for humans", I'd expect the answer to always remain "No", because something like "standing up" is trivial for humans but will never be trivial for an LLM: it doesn't have any legs!
I would argue anything requiring insights on spelling is a hard problem for an LLM: they use tokens, not letters. Your point still stands, but you need different examples IMO.
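A rough sketch of why tokens hide letters, in Python. The vocabulary and IDs below are invented for illustration and do not correspond to any real model's tokenizer; real tokenizers split text differently, but the effect is similar: the model receives IDs for multi-letter pieces rather than individual characters.

```python
# Hypothetical vocabulary: pieces and IDs are made up for this example.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}

def toy_tokenize(text):
    """Greedy longest-match split of text into pieces from TOY_VOCAB."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return pieces

pieces = toy_tokenize("strawberry")
ids = [TOY_VOCAB[p] for p in pieces]
print(pieces, ids)
```

In this toy setup the word arrives as two IDs, so the three "r"s are never visible as separate symbols.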
There is no breakthrough required, it's trivial. It's just that by making a model do that, you'll screw it up on several other dimensions.
Asking a question like this only highlights the questioner's complete lack of understanding of LLMs rather than an LLM's inability to do something.
> Give me five odd numbers that don't have the letter 'e' in their spelling
Compare the reasoning times!!! 84s vs 342s
R1 (Thought for 84 seconds)
o1 Pro (Thought for 5 minutes and 42 seconds)
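For reference, the prompt itself can be checked mechanically. This is a minimal Python sketch, assuming standard English number names: the spelled-out name of any odd whole number ends in one of the ten words below (for example "twenty-one" ends in "one"), so it is enough to check those.

```python
# Every odd number's English name ends in one of these words
# (assumption: standard English spelling of whole numbers).
ODD_ENDINGS = [
    "one", "three", "five", "seven", "nine",
    "eleven", "thirteen", "fifteen", "seventeen", "nineteen",
]

# Keep only the endings that contain no letter 'e'.
without_e = [w for w in ODD_ENDINGS if "e" not in w]
print(without_e)
```

The list comes back empty, so under that assumption the prompt is a trick question with no valid answer.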