yesterday I told a frontier model to translate my code to tla+ to find a tricky cache invalidation bug that nothing could find - gpt 5.4, gemini 3.1, opus 4.6 all failed. translation took maybe 5 mins, the bug was found in seconds, total time from idea to committed fix - about 15 mins.
if you can get a model to quickly translate a relevant subset of your code to lean to find tricky bugs and map the lean fixes back to your codebase, you've got yourself a huge unlock. (spoiler alert: you basically can, today)
This matches my experience too. The bugs that survive every test suite and code review are always in the state space, not in any single code path. Cache invalidation, distributed coordination, anything with concurrent writes. TLA+ or Lean as a targeted debugging tool for those specific pain points is genuinely practical now, especially with models handling the translation. You don't need to formally verify your whole codebase to get value from formal methods.
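To make the "state space" point concrete, here's a toy TLA+ sketch of the classic cache-aside race (a hypothetical module I'm making up for illustration, not the actual bug from the thread): a reader reads the DB and then fills the cache in two separate steps, while a writer updates the DB and invalidates the cache. Every individual code path looks fine, but TLC finds the interleaving ReadDB → Write → FillCache, which leaves a stale entry in the cache permanently.

```tla
------------------------ MODULE StaleCache ------------------------
EXTENDS Naturals

CONSTANT None            \* model value: "no cached entry" / "no read yet"

VARIABLES db, cache, readVal

Init == db = 0 /\ cache = None /\ readVal = None

\* Writer: update the DB and invalidate the cache atomically.
\* (db < 2 just bounds the state space for TLC.)
Write == db < 2 /\ db' = db + 1 /\ cache' = None /\ UNCHANGED readVal

\* Reader, step 1: read the current DB value.
ReadDB == readVal' = db /\ UNCHANGED <<db, cache>>

\* Reader, step 2: populate the cache with the (possibly stale) read.
FillCache == readVal # None /\ cache' = readVal
             /\ UNCHANGED <<db, readVal>>

Next == Write \/ ReadDB \/ FillCache

\* Invariant to check: the cache is either empty or matches the DB.
CacheFresh == cache = None \/ cache = db
===================================================================
```

With `None` declared as a model value and `CacheFresh` checked as an invariant, TLC reports a counterexample trace in well under a second, which is the whole appeal: you never wrote a test for that interleaving, but the model checker enumerates it anyway.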