Hacker News

motbus3 today at 1:47 PM

This is not unlikely; it is actually likely. The instructions for those agents are to find signals that prove there is an attack, and LLMs are steered to do what they are requested. They will interpret the signals as strongly as possible. They will omit counter-evidence to achieve their objective. They will distort analysis to reach it.

This has been everyone's daily LLM problem. How is that not clear yet?


Replies

chuckadams today at 1:57 PM

I don't disagree, but just to play devil's advocate: the LLM can also be told to look for counter-evidence, and it will at least make a stab at doing so. That's more than we can expect from the humans currently in charge.