Hacker News

fer, today at 1:41 PM

If you tell it to. Otherwise you might get the classic "You're absolutely right – I made that up. Let me look at the documentation"


Replies

freedomben, today at 2:01 PM

But you can tell it to once (in CLAUDE.md, for example) and it will do so nearly every time (it's getting much better at that). Since Opus 4.7 (which I consider a downgrade overall), it's been much better at following CLAUDE.md. I even keep an intentional contradiction between my user-level CLAUDE.md and the project-level ones, so I can tell which one takes precedence or whether both are disregarded. It follows at least one of them most of the time, and it follows the local one 95% of the time.

ben_w, today at 2:24 PM

While they absolutely do fail as you say (though in my experience not by default), this failure mode is still a massive improvement over the frequent human case of guessing based on the function/class/property/argument names.

Now, a really good human collaborator who reads all the stuff and thinks carefully, that was still better than what I saw from AI models at the start of this year. But I've also worked with my share of idiots, and been one too.

I'm not going to get into whether *current* models can or can't reliably do any particular thing to any particular standard. I used to compare these discussions to the ones about video game computer graphics in the 90s, which were always "photorealistic" when they really weren't*; now, I'm starting to feel they have the same vibes as Tesla fans insisting that "FSD-{insert current version here} solves all the problems and is a real breakthrough and the Robotaxi will totes conquer the marketplace this time for real bro, just one more version bro", etc.

* https://archive.org/details/nextgen-issue-26

serf, today at 3:11 PM

if you find yourself saying 'if you tell it to' a lot about LLMs that usually just says something about your prompting methods.

or, in other words, if you want the thing to always read the documentation, then make that a strongly highlighted point in pre-prompts, active prompts, and memory alike.

moregrist, today at 2:51 PM

Sometimes you get lucky and it both looks up the documentation and then ignores it and makes stuff up.