Hacker News

hypron, last Saturday at 3:47 PM, 3 replies

My issue with this is that the LLM could just be roleplaying that it doesn't know.


Replies

jdiff, last Saturday at 4:07 PM

Of course it is. It's not capable of actually forgetting or suppressing its training data. It's just double-checking rather than assuming, because the prompt tells it to. Roleplaying is exactly what it's doing. At any point, it may stop doing that and spit out an answer based solely on training data.

It's a big part of why search overview summaries are so awful. Many times the answers are not grounded in the material.

stavros, yesterday at 12:56 AM

Doesn't know what? This isn't about the model forgetting the training data. Of course it can't do that, any more than I can say "press the red button. Actually, forget that, press whatever you want" and have you actually forget what I said.

Instead, what can happen is that, like a human, the model (hopefully) disregards the instruction, making it carry (close to) zero weight.

brianwawok, last Saturday at 4:03 PM

To test this, you would just need to edit the ROM and switch around the solution. I'm not sure how complicated that is; it likely depends on the ROM system.
