>> LLMs are inherently statistical interpolators. They operate beautifully in an Open World (where missing information is just "unknown" and can be guessed or left vague) and they use Non-Monotonic Reasoning (where new information can invalidate previous conclusions).
I think LLM reasoning is not so much non-monotonic as unsound, in the sense that conclusions do not necessarily follow from the premises. New information may change conclusions, but how that happens is anyone's guess. There's some scholarship on that, e.g. a series of papers by Subbarao Kambhampati and his students showing that reasoning models' "thinking" tokens don't really correspond to sound reasoning chains, even if they seem to improve performance overall [1].
But it is difficult to tell what reasoning really means in LLMs. I believe the charitable interpretation of claims about LLM reasoning is that it is supposed to be informal. There is evidence both for and against that (much of the testing is in fact on formal reasoning problems, like math exam questions or Sokoban, but there are tests of informal reasoning too, e.g. on the bar exam). However, any such interpretation is hard to square with claims that "we don't understand reasoning"; not a direct quote, but I'm aware of many claims like that, made by people whose job it is to develop LLMs, at the height of the activity around reasoning models (which now seems to have been superseded by activity around "world models") [1].
If LLMs really are capable of informal reasoning (I'm not necessarily opposed to that idea), then we don't yet understand what that reasoning is, and it seems we're a bit stuck, because to really understand it we have to, well, formalise it.
That said, non-monotonic reasoning is supposed to be closer to the way humans reason informally in the real world than classical logic is, even though classical logic began precisely as an effort to formalise human reasoning, with Aristotle's Syllogisms (literally "reasonings" in Greek).
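For concreteness, here is the textbook default-reasoning example of non-monotonicity (my illustration, not anything from the thread):

    Bird(x) & not Abnormal(x) => Flies(x)        -- default rule
    Bird(tweety)                                 -- premise
    ---------------------------------------------
    Flies(tweety)                                -- defeasible conclusion

    add: Penguin(tweety), Penguin(x) => Abnormal(x)
    ---------------------------------------------
    Flies(tweety) is retracted                   -- a new premise removed a conclusion

In classical logic, adding a premise can never remove a conclusion; that monotonicity is exactly what is given up.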
________________
[1] Happy to get links if needed.
Reasoning is a pattern embedded within the token patterns, but LLMs are imitating reasoning by learning symbolic reasoning patterns.
The very fact that it memorized the Caesar cipher ROT13 pattern is because rot13 is a Linux command and the model had seen plenty of examples of text shifted by 13 letters. If you asked it to figure out a different shift, it struggled (see the sketch below; the general rule is trivial).
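For reference, the general rule the model fails to apply is a few lines of code; ROT13 is just the shift=13 special case (my sketch, not from the original comment):

    # Generic Caesar shift; ROT13 is caesar(text, 13).
    def caesar(text, shift):
        out = []
        for ch in text:
            if ch.isalpha():
                base = ord('a') if ch.islower() else ord('A')
                out.append(chr((ord(ch) - base + shift) % 26 + base))
            else:
                out.append(ch)
        return ''.join(out)

    caesar("hello", 13)  # 'uryyb' -- abundant in training data via rot13
    caesar("hello", 7)   # 'olssv' -- same rule, different shift, far rarer as text

A model that had learned the rule rather than the ROT13 pattern would handle both shifts equally well.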
Now compound that across all the intelligent-reasoning problems in the entirety of human existence and you'll see why we will never have enough data to make AGI with this architecture and training paradigm.
But we will have higher- and higher-fidelity maps of symbolic reasoning patterns as they suck up all the agent usage data from knowledge-work tasks. Hopefully your tasks fall outside the distribution of the median training data.
My claim was not that an LLM was a formal, mathematically sound non-monotonic logic engine, but that the problem space is "non-monotonic" and "open world". The fact that an LLM is "unsound" and "informal" is the exact reason why my approach is necessary. Because LLMs are unsound, informal, and probabilistic, as you say, forcing them to interface with Lean 4 is a disaster: Lean 4 demands 100% mathematical soundness, totality, and closed-world perfection at every step, and an LLM will just hit a brick wall. Methods like Event-B (which I suggest in my article), however, are designed to tolerate under-specification. Event-B allows the LLM to provide an "unsound" or incomplete sketch, and uses the proof obligations to guide the LLM into a sound state via refinement.
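To make the workflow concrete: the closest Lean 4 analogue to "an incomplete sketch with outstanding obligations" is a proof with a hole in it (my illustration, not from the article; the theorem and its name are arbitrary, and in Event-B the obligations are generated per refinement step by the Rodin toolset rather than left as holes in a proof script):

    -- An LLM-produced "sketch": the statement is there, the justification is not.
    -- `sorry` marks the outstanding obligation a later refinement step must
    -- discharge (here, e.g., `exact xs.reverse_reverse` would close it).
    theorem reversing_twice_is_id (xs : List Nat) :
        xs.reverse.reverse = xs := by
      sorry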