I realize there is subtlety to the question of which comes first. An infant that cries when it is hungry, while still pre-linguistic, is in effect applying modus ponens: crying implies food (C -> F), I am crying, therefore I get fed. Language grows in humans much as arms and legs do, and so does reasoning. Baby animals show the same behavior without language, so perhaps some logic is wired in by instinct. Either way, I don't think we need to worry about that detail.
Consider how language input to an LLM is tokenized. Now imagine a tokenization scheme that introduces tokens tracking the strict logical reasoning expressed in the language. Two completely different English sentences could then both tokenize as, say, an application of modus ponens over assumption 1 yielding conclusion 2.
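To make that concrete, here is a toy sketch in Python. Everything in it is hypothetical: the LogicToken structure and the hard-coded lookup table stand in for whatever learned parser or annotation pipeline would actually recover the logical structure from text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicToken:
    """An abstract token describing a reasoning step rather than surface text."""
    rule: str             # e.g. "MODUS_PONENS"
    premises: tuple       # indices of the earlier statements used
    conclusion: int       # index of the statement being concluded


def logic_tokenize(sentence: str) -> LogicToken:
    # Hypothetical lookup standing in for a real parser: both surface forms
    # below express "from (1) and (2), conclude (3)" via modus ponens.
    table = {
        "Socrates is a man, and all men are mortal, so Socrates is mortal.":
            LogicToken("MODUS_PONENS", premises=(1, 2), conclusion=3),
        "Since every man is mortal and Socrates is a man, Socrates must be mortal.":
            LogicToken("MODUS_PONENS", premises=(1, 2), conclusion=3),
    }
    return table[sentence]


s1 = "Socrates is a man, and all men are mortal, so Socrates is mortal."
s2 = "Since every man is mortal and Socrates is a man, Socrates must be mortal."
assert logic_tokenize(s1) == logic_tokenize(s2)  # same logical token, different English
```

The point is only that two different surface strings collapse onto the same reasoning token, so the inference step itself becomes part of what the model sees.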
Now consider that we can tokenize formal notation as used in mathematics and logic, and we can train LLMs on mathematical papers, peer-review write-ups, and the like. We can generate millions of correct proofs and teach the model which ones are remarkable and why.
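As a sketch of what "generating correct proofs" could look like at its simplest, here is a toy forward-chaining generator. It is not how one would do this in practice; real training data would need a proof assistant such as Lean or Coq to check each step, but it illustrates that correct derivations can be produced mechanically and are labeled correct by construction.

```python
facts = {"A", "B"}                            # assumed axioms
rules = [("A", "C"), ("B", "D"), ("C", "E")]  # each pair means premise -> conclusion

proof_steps = []
changed = True
while changed:
    changed = False
    for premise, conclusion in rules:
        if premise in facts and conclusion not in facts:
            facts.add(conclusion)
            # Each recorded step is valid by construction, so no human checking is needed.
            proof_steps.append(
                f"MODUS_PONENS: {premise}, {premise} -> {conclusion}  |-  {conclusion}"
            )
            changed = True

for step in proof_steps:
    print(step)
```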
Ultimately we run into the same barrier that mathematical constructivists run into, but I think it's still quite plausible that LLMs trained as I describe would be able to reason quite well and find oversights humans missed. However, designing the right scheme and implementation is not trivial.