
data-ottawa • yesterday at 6:36 PM • 0 replies

There are some days when it performs staggeringly badly, far worse than its usual baseline.

But it’s impossible to actually determine whether it’s model variance, polluted context (if I scold it, is it now closer in latent space to a bad worker, and does it perform worse?), system prompt and tool changes, fine-tunes and A/B tests, variance in top-p sampling…

There are too many variables and no hard evidence shared by Anthropic.
