How in the world did they not hit the guardrails a single time while doing this, when I can barely get it to do anything before the guardrails show up?
Like Volkswagen Dieselgate, perhaps it is configured to behave differently when being benchmarked?
idk, maybe they tested Opus and didn't realize it. I can't even get it to evaluate some code doing some mixed modeling work. It's strange to me.