https://www.sigops.org/2026/can-llms-model-real-world-system...
"Running leading LLMs across the eleven systems shows that LLMs are great at producing correct TLA+ syntax but struggle to ensure conformance and appropriate invariants."
This blog post made it more difficult than it should to find the actual tool (needed to click twice! abysmal, really): https://github.com/specula-org/Specula
Ran it on my tricky caching changes and it found bugs the standard agentic review flows missed on all frontier models. Good stuff.
I think that's what author implies by this sentence in the intro:
> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.