Hacker News

renewiltord, today at 4:00 AM

Opus 4.6 is a very good model, but the harness around it is good too. It can talk about sensitive subjects without getting guardrail-whacked.

This is much more reliable than the ChatGPT guardrail, which has a random element even with the same prompt. Perhaps it is leakage from an improperly cleared context from another request in the queue, or maybe an A/B test on the guardrail, but I have sometimes had it trigger on an innocuous request like GDP retrieval and summary with bucketing.


Replies

menzoic, today at 4:23 AM

I would think it’s due to nondeterminism. Leaking context would be an unacceptable flaw, since many users rely on the same instance.

An A/B test is plausible but unlikely, since those are typically for testing user behavior. Model output can be tested with offline evaluations instead.

tbossanova, today at 4:18 AM

What kind of value do you get from talking to it about “sensitive” subjects? I'm speaking as someone who doesn’t use AI, so I don’t really understand what kind of conversation you’re talking about.
