Hacker News

AI will make formal verification go mainstream

800 points by evankhoury, last Tuesday at 9:14 PM | 408 comments

Comments

organicbits, last Tuesday at 11:52 PM

If the engineer doesn't understand the proof system then they cannot validate that the proof describes their problem. The golden rule of LLMs is that they make mistakes and you need to check their output.

> writing proof scripts is one of the best applications for LLMs. It doesn’t matter if they hallucinate nonsense, because the proof checker will reject any invalid proof

Nonsense. If the AI hallucinated the proof script, then it has no connection to the problem statement.

goalieca, yesterday at 12:50 AM

Really? Language models are fundamentally not equivalent in scale and scope to formal logical models. People are extrapolating poorly.

css_apologist, yesterday at 4:00 AM

I want this to be true, but I am not confident.

brador, yesterday at 8:29 AM

Few care about software that much. Just look at the buggy messes of the last 10 years by every one of the top tech companies.

Large scale software is made cheap and fast, not good.

m00dy, yesterday at 5:02 AM

I just write in safe Rust; if it compiles, then it is formally verified for me.

I recently used Rust in my project, Deepwalker [0]. I wrote it once and never looked back.

[0]: https://deepwalker.xyz

positron26, yesterday at 5:00 AM

The natural can create the formal. An extremely intuitive proof is that humans walked to Greece and created new formalisms out of pre-historic cultures that did not have them.

Gödel's incompleteness theorems are a formal argument that only the natural can create the formal (because no formal system can create all others).

Tarski's undefinability theorem gives us the idea that we need different languages for formalization and the formalisms themselves.

The Curry–Howard correspondence concludes that the formalisms that pop out are indistinguishable from programs.

Altogether we can basically synthesize a proof that AGI means automatic formalization, which absolutely requires strong natural systems to create new formal systems.

I ended up staying with some family who were watching The Voice. XG performed Glam, and now that I have spit many other truths, may you discover the truth that motivates my work on swapchain resizing. I wish the world would not waste my time and their own, but bootstrapping is about using the merely sufficient to make the good.

victor22, yesterday at 4:50 PM

meds

teleforce, last Tuesday at 11:35 PM

>writing those proofs is both very difficult (requiring PhD-level training) and very laborious.

>For example, as of 2009, the formally verified seL4 microkernel consisted of 8,700 lines of C code, but proving it correct required 20 person-years and 200,000 lines of Isabelle code – or 23 lines of proof and half a person-day for every single line of implementation. Moreover, there are maybe a few hundred people in the world (wild guess) who know how to write such proofs, since it requires a lot of arcane knowledge about the proof system.

I think this type of pattern (a genuinely difficult problem domain with a very small number of experts) is the future of AI, not AGI. For example, formal verification as in this article and, similarly, automated ECG interpretation can be AI killer applications; the latter is what I'm currently working on.

Most countries in the world have only several hundred to several thousand registered cardiologists, making the cardiologist-to-population ratio roughly 1:100,000.

People expect cardiologists to go through their ECG readings, but reading an ECG is very cumbersome. Say you have 5 minutes of ECG signal, the minimum requirement for AFib detection per guideline. The standard ECG is 12-lead, resulting in 12 x 5 x 60 = 3,600 beats to review even at that minimum 5-minute duration (assuming one minute of ECG contains 60 beats).

Then of course there is Holter ECG, with typical 24-hour recordings that increase the duration considerably, which is why almost all Holter reading is now automated. But current automated ECG detection has very low accuracy, because the accuracy of the detection methods (statistics/AI/ML) is bounded by the beat-detection algorithm, for example the venerable Pan–Tompkins algorithm in the limited fiducial time-domain approach [1].

Cardiologists would rather spend their time on more interesting activities, like teaching future cardiologists, performing expensive procedures such as ICD or pacemaker implantation, or taking their once-in-a-blue-moon holidays, instead of reading monotonous patient ECGs.

This is why automating ECG reading with AI/ML is necessary to complement the cardiologist. The trick is to push the sensitivity component of accuracy very high, preferably to 100%. We achieved this for both major heart anomalies, namely arrhythmia (irregular heartbeats) and ischemia (the heart not regulating blood flow properly), by going with a non-fiducial detection approach, i.e. beyond the time domain, with the help of statistics/ML/AI. That way, missed potential patients (false negatives) are minimized for the expert- and cardiologist-in-the-loop workflow.
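The "sensitivity part of the accuracy" here is the true-positive rate of the detector. A minimal sketch, with a hypothetical function name and made-up confusion-matrix counts purely for illustration:

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """True-positive rate: the fraction of actual anomalies that were flagged.

    Driving this toward 1.0 is what minimizes missed patients
    (false negatives) in a cardiologist-in-the-loop workflow.
    """
    return true_positives / (true_positives + false_negatives)

# Hypothetical screening run: 98 arrhythmia episodes flagged, 2 missed.
print(sensitivity(98, 2))  # → 0.98
```

Note that maximizing sensitivity alone is cheap (flag everything); the work is keeping specificity acceptable at the same time so the cardiologist isn't flooded with false alarms.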

[1] Pan–Tompkins algorithm:

https://en.wikipedia.org/wiki/Pan%E2%80%93Tompkins_algorithm

bgwalter, last Tuesday at 10:28 PM

No it won't. People who aren't interested in writing specifications now won't be interested later either. They want to hit the randomize button on their favorite "AI" jukebox.

If anyone does write a specification, the "AI" won't even get past the termination proof of a moderately complex function, which is the first step to getting said function accepted by the proof environment, before you can even start the actual proof.
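To make the termination point concrete, here is a sketch in Lean 4 (one proof environment; the comment names none, so this is an illustrative choice). The definition is perfectly reasonable code, yet the checker rejects it as written:

```lean
-- Lean's termination checker rejects this definition out of the box:
-- no decreasing measure is found, since 3 * n + 1 grows in the odd branch.
-- A termination proof (e.g. via `termination_by`/`decreasing_by`) must be
-- supplied before the function is accepted -- before any proof *about* it
-- can even begin.
def collatzSteps (n : Nat) : Nat :=
  if n ≤ 1 then 0
  else if n % 2 = 0 then 1 + collatzSteps (n / 2)
  else 1 + collatzSteps (3 * n + 1)
```

For Collatz in particular, no one knows a termination proof at all, which is an extreme case of the hurdle being described.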

This article is pretty low on evidence; perhaps it is about getting funding by talking about "AI".

Analemma_, last Tuesday at 10:09 PM

I don't really buy it. IMO, the biggest reason formal verification isn't used much in software development right now is that formal verification needs requirements to be fixed in stone, and in the real world requirements are changing constantly: from customer requests, from changes in libraries or external systems, and from competitive market pressures. AI and vibe coding will probably accelerate this trend: when people know you can vibe code something, they will feel permitted to demand even faster changes (and your upstream libraries and external systems will change faster too), so formal verification will be less useful than ever.

justatdotin, yesterday at 2:16 AM

> 2. AI-generated code needs formal verification so that we can skip human review and still be sure that it works;

hahahahaha

henning, last Tuesday at 9:38 PM

Unless you feed a spec to the LLM and it nitpicks the compiled TLA+ output generated from your PlusCal input, gaslights you by insisting the code you just ran and pasted the output of is invalid, then generates invalid TLA+ in response. Which is exactly what happened when I tried coding with Gemini via formal verification.

tonyhart7, yesterday at 7:25 AM

that's just testing code, no???

what makes it different other than calling it "verification"???

datatrashfire, yesterday at 12:34 AM

lol no it won’t

yuppiemephisto, last Tuesday at 10:08 PM

I vibe code extremely extensively with Lean 4, enough to exhaust the API limits of two $200 Claude Code accounts every day for a week.

I added LSP support for images to get better feedback loops, and Opus was able to debug https://github.com/alok/LeanPlot. The entire library was vibe coded by older AI.

It also wrote https://github.com/alok/hexluthor (a hex color syntax highlighting extension that uses lean’s metaprogramming and lsp to show you what color a hex literal is) by using feedback and me saying “keep goign” (yes i misspelled it).

It has serious issues with slop and the limitations of small data, but the rate of progress is really really fast. Opus 4.5 and Gemini were a huge step change.

The language is also improving very fast, though not as fast as AI.

The feedback loop is very real even for ordinary programming. The model really resists it though because it’s super hard, but again this is rapidly improving.

I started vibe coding Lean about 3 years ago and I’ve used Lean 3 (which was far worse). It’s my favorite language after churning through idk 30?

A big aspect of being successful with them is not being all-or-nothing about proofs. It's very useful to write down properties as executable code and then just not prove them, because they still have to type check, fit together, and make sense. github.com/lecopivo/scilean is a good example (search "sorry_proof").

There’s property testing with “plausible” as a nice 80/20 that can be upgraded to full proof at some point.
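A minimal sketch of that workflow in Lean 4, assuming the leanprover-community Plausible library (the specific lemma is illustrative, not taken from SciLean):

```lean
import Plausible

-- State the property as a real, type-checked theorem, but defer the proof.
-- Even unproved, it must elaborate and fit the rest of the code.
theorem reverse_append (xs ys : List Nat) :
    (xs ++ ys).reverse = ys.reverse ++ xs.reverse := by
  sorry

-- The 80/20: have Plausible search for counterexamples by random testing.
-- If the statement were wrong, this would fail with a concrete witness;
-- the `sorry` above can later be upgraded to a full proof.
example (xs ys : List Nat) :
    (xs ++ ys).reverse = ys.reverse ++ xs.reverse := by
  plausible
```

The nice part is that a failed `plausible` run prints the counterexample, so you fix the statement before sinking hours into proving something false.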

When the model gets to another jump in capacity, I predict it will emergently design better systems from the feedback needed to prove them correct in the first place. Formal verification, like optimization, tends to flow through a system in an anti-modular way, and if you want to claw modularity back, you have to design it really, really well. But AI gives a huge intellectual overhang. Why not let models put their capacity toward making better systems?

Even the documentation system for Lean (verso) is (dependently!) typed.

Check out my Lean vibe codes at https://github.com/alok?tab=repositories&q=Lean&type=&langua...

user3939382, yesterday at 4:25 AM

I've made huge advances in formal verification with AI; feel free to email me if this is your field.

PunchyHamster, last Tuesday at 10:54 PM

Prediction: Fuck no.

AI is unreliable as it is. It might make formal verification a bit less work-intensive, but the last place anyone would want AI hallucinations is in verification.

waterTanuki, last Tuesday at 11:38 PM

I'm skeptical of formal verification mainly because it's akin to trying to predict the future with a sophisticated crystal ball. You can't formally guarantee that hardware won't fail, or that solar radiation won't flip a bit. What seems to have had much better ROI for safety-critical systems is switching to memory-safe languages, which rely less on runtime promises and more on compiler guarantees.

dboreham, yesterday at 12:45 AM

My $0.02: formal methods are just "fancy tests". Of course tests are good to have, and fancy tests even better. I've found that humans are, on average, pretty terrible at creating tests (perhaps it would be more accurate to say that their MBA overlords have not been great at supporting their testing endeavors). Meanwhile, I've found that LLMs are pretty good at generating tests and don't have the human tendency to see that kind of activity as resume-limiting. So even a not-amazing LLM writing tests turns out to be a very useful resource if you have any interest in the quality of your product and in mitigating various failure risks.
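One way to see "fancy tests" as a middle ground between example tests and proofs is property-based testing: assert a law over many random inputs rather than one hand-picked case. A self-contained sketch (the run-length codec and its names are hypothetical, chosen only to have a property to check):

```python
import random

def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse runs of equal characters into (char, count) pairs."""
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into a string."""
    return "".join(ch * n for ch, n in pairs)

# The property an LLM (or a human) might write: decoding inverts encoding,
# checked on 1000 random inputs instead of a single example.
for _ in range(1000):
    s = "".join(random.choice("ab") for _ in range(random.randint(0, 20)))
    assert run_length_decode(run_length_encode(s)) == s
```

A formal proof would establish the round-trip law for all strings; the random check only samples it, which is exactly the "fancy test" trade-off.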

CyberDildonics, yesterday at 12:50 AM

When people have to choose between using AI and using Rust, because LLMs don't produce workable Rust programs, they leave Rust behind and use something else. The Venn diagram of people who want formal verification and people who think LLM slop is a good idea is two separate circles.

ath3nd, yesterday at 11:00 AM

[dead]

NedF, yesterday at 3:10 AM

[dead]