kraakf06 • today at 1:23 AM • 1 reply

False positives like this are probably more damaging than the guardrails themselves. If engineers can't predict when a model will switch behavior, it becomes difficult to trust it in production workflows.

Replies

catlifeonmars • today at 3:33 AM

> “trust it in production workflows”

What degree of predictability is required? I imagine the bar is pretty low if you trust the previous models in the same contexts.
