logoalt Hacker News

niyikizatoday at 12:51 AM1 replyview on HN

>>harnesses should have more assertive layers of control and constraint

Been saying this for a while and mostly getting blank stares. In-context "controls" as the primary safety mechanism is going to be a bitter lesson for our industry. What you want is a deterministic check outside the model's reasoning that decides allow/deny without consulting its opinion. Cryptographic if the record needs to survive a compromised orchestrator, and open source. If your control is a string the model can read, the model can ignore it. If it can write it, it can forge it. I'm surprised how strange that idea sounds to some people.

Disclosure: I'm working on an open source authorization tool for agents.


Replies

steve_adams_86today at 1:54 AM

> I'm surprised how strange that idea sounds to some people.

I think a lot of people using the models genuinely feel like the models are more capable than they are now, and they're content to relinquish a lot of trust and agency. The worrying thing is that the models are superficially hyper-capable, but from more granular perspectives, you can see a lot of holes in their abilities. This is incredibly important, but very difficult to convey concisely to people. It's a classic example of nuance seeming too complicated because not caring is so much more gratifying. People love using these models.

show 1 reply