They had a 56-hour "quality degradation" event last week, but things seem to be back to normal now. I've been running it all day and getting great results again.
I know that's anecdotal, but anecdotes are basically all we have with these things.
If I am bitching at Claude, then something is wrong. Something was wrong. It broke its deixis and frobnobulated its implied referents.
I briefly thought of canning a bunch of tasks as an eval so I could know quantitatively if the thing was off the rails. But I just stopped for a while and it got better.
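For what it's worth, the canned-tasks idea could look roughly like the sketch below. Everything in it is made up for illustration: the three tasks, the `pass_rate` and `looks_degraded` helpers, and the 0.1 tolerance are assumptions, and `model` is just any function you supply that takes a prompt string and returns the reply text.

```python
from typing import Callable, List, Tuple

# Hypothetical canned tasks: each is a prompt plus a checker that says
# whether the reply counts as a pass. Real tasks would come from your own work.
Task = Tuple[str, Callable[[str], bool]]

CANNED_TASKS: List[Task] = [
    ("What is 2 + 2? Answer with a number only.", lambda out: out.strip() == "4"),
    ("Reverse the string 'abc'.", lambda out: "cba" in out),
    ("Name the capital of France in one word.", lambda out: "paris" in out.lower()),
]


def pass_rate(model: Callable[[str], str], tasks: List[Task] = CANNED_TASKS) -> float:
    """Run every canned task through `model` and return the fraction that pass."""
    if not tasks:
        raise ValueError("need at least one canned task")
    passed = sum(1 for prompt, check in tasks if check(model(prompt)))
    return passed / len(tasks)


def looks_degraded(current: float, baseline: float, tolerance: float = 0.1) -> bool:
    """Flag a run whose pass rate fell more than `tolerance` below the baseline."""
    return current < baseline - tolerance
```

You'd record `pass_rate` on a day when things feel normal, then rerun the same tasks when it seems off and compare with `looks_degraded`. With only a handful of tasks the number is noisy, so it is still closer to a structured anecdote than a real benchmark.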
... and I totally agree: anecdotes are all we have indeed.
Oh I wasn't aware of that. I will try it again. Thank you for letting me know!