Hacker News

AI will make formal verification go mainstream

786 points | by evankhoury | yesterday at 9:14 PM | 395 comments

Comments

sandeepsr | today at 8:05 AM

Formal verification does not scale, and has not scaled for two decades. LLMs can help write properties, though, which previously required a skilled person (a PhD) to write.

In pre-silicon verification, formal has been used successfully for decades, but is not a replacement for simulation based verification.

The future of verification (for hardware and software) is to eliminate verification altogether, by synthesizing intent into correct code and tests.

-- https://www.verifai.ai

lewisjoe | yesterday at 11:31 PM

A while back I did an experiment where I asked ChatGPT to imagine a new language best suited for AI to write code in, and to list the properties of such a language. Interestingly, it spit out properties similar to today's functional, strongly typed (in fact dependently typed), formally verifiable proof languages, along with reasons why such a language is easier for AI to reason about and generate programs in. I found it fascinating because I expected something like TypeScript or Kotlin.

While I agree formal verification itself has its problems, I think the argument has merit: soon AI-generated code will surpass all human-generated code, and when that happens we at least need a way to prove that the code won't have security issues and that it adheres to compliance/policy.

chamomeal | yesterday at 11:02 PM

Formal methods are very cool (and I know nothing about them lol) but there's still a gap between the proof and the implementation, unless you're using a language with proof-checking built in.

If we're looking to use LLMs to make code absolutely rock-solid, I would say advanced testing practices are a good candidate! Property-based testing, fuzzing, and contract testing (for example https://github.com/griffinbank/test.contract) are all fun but extremely tedious to write and maintain. I think that makes them a perfect candidate for LLMs. These kinds of tests are also more easily understandable by regular ol' software developers, and I think we'll have to be auditing LLM output for quite a while.
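A hand-rolled sketch of the property-based idea (a stand-in for a library like Hypothesis or test.contract; the helper names here are made up): generate many random inputs and check that an invariant holds on all of them.

```python
import random

def check_property(prop, make_input, trials=200, seed=0):
    """Check `prop` against `trials` randomly generated inputs."""
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(trials):
        x = make_input(rng)
        assert prop(x), f"property failed on {x!r}"
    return True

# Random lists of small ints, varying length.
rand_list = lambda rng: [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]

# Property 1: reversing twice is the identity.
print(check_property(lambda xs: list(reversed(list(reversed(xs)))) == xs, rand_list))
# Property 2: sorting is idempotent.
print(check_property(lambda xs: sorted(sorted(xs)) == sorted(xs), rand_list))
```

Real property-testing libraries add input shrinking and smarter generators on top of this loop, but the core is just "assert an invariant over generated inputs."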

inimino | today at 4:11 AM

I both agree and disagree. Yes, AI will democratize access to formal methods and will probably increase the adoption of them in areas where they make sense (e.g. safety-critical systems), but no, it won't increase the areas where formal methods are appropriate (probably < 1% of software).

What will happen instead is a more general application of AI systems to verifying software correctness, which should lead to more reliable software. The bottleneck in software quality is in specifying what the behavior needs to be, not in validating conformance to a known specification.

tasuki | today at 6:17 AM

I'd love it, but this is almost textbook wishful thinking.

bob1029 | yesterday at 11:43 PM

I don't know if it's "formal", but I think combining Roslyn analyzers with an LLM and a human in the loop could be game-changing. The analyzers can enforce intent over time without any regressions. The LLM can write and modify analyzers. Analyzers can constrain the source code of other analyzers. Hypothetically there is no limit to the number of things that could be encoded this way. I am currently working on a few prototypes in the vicinity of this.

https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/t...
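Roslyn analyzers are C#-specific, but the idea transfers. Here is a hedged Python analog using the stdlib `ast` module: a tiny "analyzer" that flags calls to a banned function, the way an analyzer might enforce an intent like "no direct logging bypass" (the function name and banned list are illustrative).

```python
import ast

def find_banned_calls(source, banned=("print",)):
    """Return (lineno, name) for each call to a banned function."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in banned):
            violations.append((node.lineno, node.func.id))
    return violations

code = "x = 1\nprint(x)\n"
print(find_banned_calls(code))  # → [(2, 'print')]
```

An LLM writing rules like this, with a human reviewing them, keeps the enforcement deterministic even though the rule's author was not.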

whazor | yesterday at 10:03 PM

Domain Specific Languages + Formal Verification.

With a domain specific language you can add extra limitations on the kinds of outputs. This can also make formal verification faster.

Maybe like React components: you limit the format. You could limit which libraries can be imported, which hooks can be used, and how expressive the components can be.
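One way to picture that restriction in code (a hedged Python sketch; the whitelist is illustrative): accept only programs whose AST nodes fall in an allowed subset. A smaller language surface is exactly what makes verification cheaper.

```python
import ast

# Whitelisted node types: a DSL of pure arithmetic expressions only.
ALLOWED = (ast.Module, ast.Expr, ast.BinOp, ast.Add, ast.Mult,
           ast.Constant, ast.Name, ast.Load)

def in_dsl(source):
    """True iff every AST node in `source` is on the whitelist."""
    return all(isinstance(n, ALLOWED) for n in ast.walk(ast.parse(source)))

print(in_dsl("1 + 2 * x"))  # True: only arithmetic
print(in_dsl("import os"))  # False: imports are outside the subset
```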

throwaway613745 | yesterday at 10:07 PM

GPT 5.2 can't even tell me how many rs are in garlic.

gttalbot | today at 12:15 PM

If the AI is going to lie and cheat on the code it writes (this is the largest thing I bump into regularly and have to nudge it on), what makes the author think that the same AI won't cheat on the proof too?

Fairburn | today at 6:21 PM

AI is great; it makes my job easier by removing repetitive processes, freeing me up to tackle other "stuff". I like that I can have it form up JSON/XML scaffolding, etc.: all the insanely boring stuff that often gets corrupted by human error.

AI isn't the solution to move humanity forward in any meaningful way. It is just another aspect of our ability to offload labor. Which in and of itself is fine. Until of course it gets weaponized and is used to remove the human aspect from more and more everyday things we take for granted.

It is being used to accelerate the divisions among people, further separating the human from humanness. I love 'AI' tools; they make my life easier, work-wise. But would I miss it if it wasn't there? No. I did just fine before and would do so after its flame and innovation slowly peter out.

zkmon | today at 7:26 AM

Any proof system or method is relative to its frame of reference: axiomatic context, known proofs, etc. Modern software doesn't live in an isolated lab context, unless you are building an air-gapped HSM or the like. The proof system itself would have to evolve to communicate changing specs to the underlying software.

So, the job is not done for humans yet.

hedayet | today at 3:12 AM

I can see the inspiration; but then again, how much investment will be required to verify the verifier? (It's still code, and is generated by a non-deterministic system.)

lawrpaulson | today at 10:24 AM

Formal verification is progressing inexorably and will, over time, transform the software development experience. There is a long way to go, and in particular, you can't even get started until basic interfaces have been specified formally. This will be a bottom-up process where larger and larger components gradually get specified and verified. These will mostly be algorithmic or infrastructure building blocks, as opposed to application software subject to changing and vague requirements. I don't think LLMs will be contributing much to the verification process soon: there is nowhere near enough material available.

zamadatix | today at 1:22 AM

Some past discussion (excluding 0 comment submissions) to add:

https://news.ycombinator.com/item?id=46216274 - 4 comments, 6 days ago

https://news.ycombinator.com/item?id=46203508 - 1 comment, 7 days ago

https://news.ycombinator.com/item?id=46198874 - 2 comments, 8 days ago

mkleczek | today at 5:22 AM

The article only discusses reasons why formal verification is needed. It does not provide any information on how AI would solve the fundamental issues making it difficult: https://pron.github.io/posts/correctness-and-complexity

grahar64 | today at 10:35 AM

Won't people just use AI to define the specification? If they're getting most of the code written with AI and won't read or test it to verify it, won't they also not read the spec?

dkga | today at 4:32 AM

As an economist I completely agree with the post. In fact, I assume we will see an explosion of formal proof requirements in code format by journals in a few years time. Not to mention this would make it much easier and faster to publish papers in Economic Theory, where checking the statements and sometimes the proofs simply takes a long time from reviewers.

skeeter2020 | yesterday at 11:06 PM

Prediction: AI hypers - both those who are clueless and those who know perfectly well - will love this because it makes their "AI replaces every developer" wet dream come true, by shifting the heavy lifting from this thing called "software development" to the tiny problem of just formally verifying the software product. Your average company can bury this close to QA, where they're probably already skimping, save a bunch of money and get out with the rewards before the full damage is apparent.

bluerooibos | today at 12:32 AM

Is there a difference here between a proof and an automated test?

I'm curious what a proof would look like compared to my RSpec unit tests and what the actual implementation would look like.
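A rough illustration of the difference, in Python rather than RSpec: an example-based test samples a few hand-picked points, while a proof covers every input. For a small finite domain you can get proof-like coverage by exhaustive checking; a real proof generalizes that to unbounded domains symbolically.

```python
# Illustrative property: abs(x) is non-negative and idempotent.

# Example-based (unit-test style): a few hand-picked cases.
for x in (0, 1, -1, 99):
    assert abs(x) >= 0 and abs(abs(x)) == abs(x)

# Exhaustive over a bounded domain: every 16-bit signed int.
# Within this domain the property is *proved*, not just sampled.
assert all(abs(x) >= 0 and abs(abs(x)) == abs(x)
           for x in range(-2**15, 2**15))
print("property holds for all 16-bit ints")
```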

jkaptur | yesterday at 11:18 PM

I think that there are three relevant artifacts: the code, the specification, and the proof.

I agree with the author that if you have the code (and, with an LLM, you do) and a specification, AI agents could be helpful to generate the proof. This is a huge win!

But it certainly doesn't confront the important problem of writing a spec that captures the properties you actually care about. If the LLM writes that for you, I don't see a reason to trust that any more than you trust anything else it writes.

I'm not an expert here, so I invite correction.

LurkandComment | today at 4:46 PM

Open-banking will make formal verification go mainstream.

okaleniuk | today at 7:31 AM

AI will definitely make more people think about verification. Formal or otherwise.

Sevii | yesterday at 9:46 PM

If AI is good enough to write formal verification, why wouldn't it be good enough to do QA? Why not just have AI do a full manual test sweep after every change?

ursAxZA | today at 3:20 AM

The theory is sound, but the practical bridge is missing.

There’s a gap that current LLM architectures simply can’t cross yet.

robot-wrangler | yesterday at 10:55 PM

I don't like sharing unpolished WIP and my project is still more at a lab-notebook phase than anything else, but the think-pieces are always light on code and maybe someone is looking for a fun holiday hackathon: https://mattvonrocketstein.github.io/py-mcmas/

topazas | today at 6:57 AM

Right, but let's assume we have a really simple CRUD application in a mainstream language, could be TS or Python. What are the current best approaches to have this semi-formally verified? Tools, methods, etc.?

raincole | today at 6:28 AM

God I hope not. Many people (me included) already find Rust types hard enough to read. If formal verification ever goes mainstream, I guarantee you that means no one reads the proofs and the whole SWE field depends completely on AI.

gaogao | yesterday at 11:26 PM

Topical to my interests, I used Claude Code the other day for formally verifying some matrix multiplication in Rust. Writing the spec actually wasn't too hard, done as post-conditions in code, since proving equivalence to a simpler version of the code (such as for optimization) is pretty straightforward. Maybe I should write up a proper post on it.
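A hedged Python analog of that setup (the commenter worked in Rust; all names here are made up): a naive matmul serves as the spec, an "optimized" variant is the implementation, and the post-condition is equivalence between the two.

```python
import random

def matmul_ref(A, B):
    """Naive reference matmul: the executable 'spec'."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def matmul_fast(A, B):
    """'Optimized' variant: transpose B so inner loops scan rows."""
    Bt = list(map(list, zip(*B)))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt]
            for row in A]

# Post-condition check on a random instance (a prover would show it
# for ALL well-shaped inputs, not just sampled ones).
rng = random.Random(1)
A = [[rng.randint(-9, 9) for _ in range(4)] for _ in range(3)]
B = [[rng.randint(-9, 9) for _ in range(5)] for _ in range(4)]
assert matmul_fast(A, B) == matmul_ref(A, B)
print("equivalent on random input")
```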

chhxdjsj | today at 5:47 AM

An excellent use case for this is Ethereum smart contract verification. People store millions of dollars in smart contracts that are probably one or two iterations of Claude or Gemini away from being pwned.

ktimespi | today at 7:01 AM

Can you really rely on an LLM to write valid proofs though? What if one of the assumptions is false? I can very well think of a lot of ways that this can happen in Rocq, for example.
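The worry can be made concrete in Lean (a sketch; the theorem names are made up): a false or contradictory assumption lets any goal be "proved", so a machine-checked proof is only as meaningful as its hypotheses.

```lean
-- Ex falso: from a false premise, anything follows.
theorem anything_from_false (P : Prop) (h : False) : P :=
  False.elim h

-- A subtler version: a contradictory arithmetic hypothesis
-- makes an arbitrary (false-looking) claim "provable".
example (h : 1 = 2) : ∀ n : Nat, n = 0 :=
  fun n => absurd h (by decide)
```

The proof checker is doing its job in both cases; it's the assumptions that need human scrutiny.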

jstummbillig | yesterday at 9:58 PM

Interesting. Here is my ai-powered dev prediction: We'll move toward event-sourced systems, because AI will be able to discover patterns and workflow correlations that are hard or impossible to recover from state-only CRUD. It seems silly to not preserve all that business information, given this analysis machine we have at our hands.

rf15 | today at 5:08 AM

Isn't formal verification a "just write it twice" approach with different languages? (and different logical constraints on the way you write the languages)

tachim | today at 12:42 AM

I'm Tudor, CEO of Harmonic. We're huge believers in formal verification, and started Harmonic in 2023 to make this technology mainstream. We built a system that achieved gold medal performance at 2025 IMO (https://harmonic.fun/news), and recently opened a public API that got 10/12 problems on this year's Putnam exam (https://aristotle.harmonic.fun/).

If you also think this is the future of programming and are interested in building it, please consider joining us: https://jobs.ashbyhq.com/Harmonic. We have incredibly interesting challenges across the stack, from infra to AI to Lean.

wagwang | yesterday at 10:33 PM

Disagree. The ideal agentic coding workflow for high-tolerance programming is to give the agent access to the software's output and have it iterate in a loop. You can let the agent do TDD, give it access to server logs, or even browser access.

Meneth | today at 9:48 AM

There are bugs in the specification, so there are bugs in the verification as well.

jappgar | today at 12:28 PM

There's really no such thing as complete verification.

The quest for purity is some fountain of youth nonsense that distracts a lot of otherwise brilliant engineers.

Ask the AI to make a program that consumes a program and determines if it halts.
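The diagonal argument behind that quip can be sketched in Python (a hedged illustration: the "oracle" here is a trivial stand-in, and the time budget only demonstrates non-termination empirically, not formally). Any claimed halting oracle is defeated by a program that does the opposite of the oracle's prediction about itself.

```python
import threading

def halts_guess(fn):
    # A hypothetical halting "oracle": this trivial one claims every
    # program halts. The adversary below defeats ANY such oracle.
    return True

def adversary():
    # Do the opposite of whatever the oracle predicts about us.
    if halts_guess(adversary):
        while True:  # oracle said "halts", so loop forever
            pass
    # oracle said "loops", so return immediately

# Run the adversary on a daemon thread with a small time budget.
t = threading.Thread(target=adversary, daemon=True)
t.start()
t.join(0.2)
print("oracle says halts:", halts_guess(adversary))
print("still running after 0.2s:", t.is_alive())
```

The oracle predicts "halts" yet the adversary keeps spinning, which is the contradiction: no implementation of `halts_guess` can be right about the adversary built from it.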

0xWTF | today at 5:41 AM

And this is the year of the Linux desktop...

popcorncowboy | today at 8:30 AM

I will respectfully take the other side of this bet on Polymarket if it ever shows up.

Havoc | yesterday at 10:28 PM

I've been toying with vibe-coding Rust. Hardly formal verification, but it's a step closer than Python, that's for sure.

So far so good, though the smaller amount of training data is noticeable.

password-app | today at 1:02 AM

Formal verification for AI is fascinating, but the real challenge is runtime verification of AI agents.

Example: AI browser agents can be exploited via prompt injection (even Google's new "User Alignment Critic" only catches 90% of attacks).

For password management, we solved this with a zero-knowledge architecture: the AI navigates websites but never sees credentials. Credentials stay in the local Keychain; the AI just clicks buttons.

Formal verification would be amazing for proving these isolation guarantees. Has anyone worked on verifying AI agent sandboxes?

p1necone | yesterday at 10:21 PM

Now: AI generates incorrect code.

Future: AI generates incorrect code, plus a formal verification proving that the code performs that incorrect behaviour.

cess11 | today at 5:27 PM

"As the verification process itself becomes automated, the challenge will move to correctly defining the specification"

OK.

"Reading and writing such formal specifications still requires expertise and careful thought."

So the promise here is that bosses in IT organisations will in the future have expertise and prioritise careful thought, or will allow their subordinates to have and practise these traits.

Would be great news if that'll be the case.

ozgrakkurt | today at 5:44 AM

You can’t even compile regular code at a satisfying speed.

tptacek | yesterday at 9:57 PM

Called this a little bit: https://fly.io/blog/semgrep-but-for-real-now/

Semgrep isn't a formal methods tool, but it's in the same space of rigor-improving tooling that sounds great but in practice is painful to use consistently.

0xbadcafebee | yesterday at 11:28 PM

Unlikely. It's more work than necessary and systemic/cultural change follows the path of least resistance. Formal verification is a new thing a programmer would have to learn, so nobody will want to do it. They'll just accept the resulting problems as "the cost of AI".

Or the alternative will happen: people will stop using AI for programming. It's not actually better than hiring a person, it's just supposedly cheaper (in that you can reduce staff). That's the theory anyway. Yes there will be anecdotes from a few people about how they saved a million dollars in 2 days or something like that, the usual HN clickbait. But an actual study of the impact of using AI for programming will probably find it's only a marginal cost savings and isn't significantly faster.

And this is assuming the gravy train that programmers are using (unsustainable spending/building in unreasonable timeframes for uncertain profits) keeps going indefinitely. Best case, when it all falls apart the govt bails them out. Worst case, you won't have to worry about programming because we'll all be out of work from the AI bust recession.

shevy-java | today at 2:08 AM

No - I don't want AI to be "mainstream".

heliumtera | yesterday at 10:30 PM

Will formal verification be a viable and profitable avenue for the middleman to exploit and fuck everybody else? Then yes, it absolutely will become mainstream. If not, then no, thanks. Everything that becomes mainstream for SURE will empower the middleman and cuck the developer; nothing else really matters. This is literally the only important factor.

michalbilinski | today at 6:35 AM

I agree. I wrote a Z3 skill for Claude and generate a proof before writing code.

kerameikos34 | today at 2:45 AM

This is utter nonsense. LLMs are dumb. Formal Methods require actual thinking. Specs are 100% reasoning.

I highly dislike the general tone of the article. Formal methods are not fringe; they are used all around the world by good teams building reliable systems. The fact that they are not mainstream has more to do with the poor ergonomics of the old tools and the corporate greed that got rid of design activities in the software development process, ushering in the era of agile cowboy coding. They did this just because they wanted to churn out products quickly at the expense of quality. It was never the correct, civilized way of working and never will be.

br0s3r | today at 5:48 AM

Anyone interested in this, I suggest reading this:

https://github.com/jegly/self-supporting-code
