arjie • yesterday at 10:22 PM

Interesting prediction. It sort of makes sense. I have noticed that LLMs are very good at solving problems whose solutions are easy to check[0]. Being able to work on such problems is quite an advantage, because an LLM rarely one-shots a solution through token generation alone. Usually the multiple attempts are 'hidden' in the reasoning tokens or, for my use cases, they happen via the verification machine.

A formally verified system is easier for the model to check and consequently easier for it to program against. I suppose the question is whether formal methods are tractable enough to actually help the LLM finish the job before it runs out of context.

Regardless, I often use coding assistants in that manner:

1. First, I use the assistant to come up with the success condition program

2. Then I use the assistant to solve the original problem by asking it to check with the success condition program

3. Then I check the solution myself

It's not rocket science, and is just the same approach we've always taken to problem-solving, but it is nice that modern tools can also work in this way. With this, I can usually use Opus or GPT-5.2 in unattended mode.
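A minimal sketch of that three-step loop, under loud assumptions: `is_valid_sort`, `solve_with_checker`, and `fake_assistant` are hypothetical names made up for illustration, the problem (sorting a list) is a toy, and the "assistant" is a canned stand-in that returns two wrong answers before a right one. A real setup would call a model where `propose` is called.

```python
from collections import Counter
from typing import Callable, Optional


def is_valid_sort(original: list[int], candidate: list[int]) -> bool:
    """Step 1, the success condition program: cheap to run, hard to fool."""
    in_order = all(a <= b for a, b in zip(candidate, candidate[1:]))
    same_items = Counter(original) == Counter(candidate)
    return in_order and same_items


def solve_with_checker(
    propose: Callable[[int], list[int]],
    check: Callable[[list[int]], bool],
    max_attempts: int = 5,
) -> Optional[list[int]]:
    """Step 2: keep asking for candidates until one passes the checker."""
    for attempt in range(1, max_attempts + 1):
        candidate = propose(attempt)
        if check(candidate):
            return candidate
    return None  # budget exhausted without a verified answer


def fake_assistant(data: list[int]) -> Callable[[int], list[int]]:
    """Hypothetical stand-in for the model: wrong twice, then right."""
    attempts = [
        lambda: list(data),         # returns the input unchanged
        lambda: sorted(set(data)),  # ordered, but silently drops duplicates
        lambda: sorted(data),       # correct
    ]

    def propose(attempt: int) -> list[int]:
        return attempts[min(attempt, len(attempts)) - 1]()

    return propose


data = [3, 1, 2, 1]
result = solve_with_checker(fake_assistant(data), lambda c: is_valid_sort(data, c))
print(result)  # step 3 is still a human reading this output
```

The second canned attempt is the interesting one: it looks sorted, and only the multiset comparison in the checker rejects it.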

0: https://wiki.roshangeorge.dev/w/Blog/2025-12-11/LLMs_Excel_A...

Replies

degamad • yesterday at 11:55 PM

> 1. First, I use the assistant to come up with the success condition program

> 2. Then I use the assistant to solve the original problem by asking it to check with the success condition program

This sounds a lot like Test-Driven Development. :)

imiric • yesterday at 10:44 PM

The issue is that many problems aren't easy to verify, and LLMs also excel at producing garbage output that appears correct on the surface. There are fields of science where verification is a long and arduous process, even for content produced by humans. Throwing LLMs at these problems can only produce more work for a human to waste time verifying.
