Past the sea change: half the reason those prompt and harness solutions seem to work is that the LLM lies. The testing is gassing you up about how it works and how effective it is, defaulting to 'yes'.
If you test specific features of those solutions over time, you see very inconsistent results, plenty of lies, and seemingly stable solutions that one-shot well but suddenly change behaviour after tweaks on the backend. Tuesday's awesome agent stack that finally works behaves completely differently on Thursday, and debugging amounts to "oh, sorry, it's better now" even when it isn't. Compression, lies, and external hosting are a bad combo.
Sometimes I imagine a world where computers execute programs the same way every time. You could write some code once and run it a whole calendar month later with a predictable outcome. What a dream; we can hope, I guess.
People are doing toy projects and praising them, while others are testing them in real-world situations and not finding them that useful. But the former group is labelling the latter as Luddites and telling them they'll be left behind.