I'm a bit jealous. I would like to experiment with having a similar setup, but 10x Opus 4.5 running practically non stop must amount to a very high inference bill. Is it really worth the output?
From experimentation, I need to coach the models quite closely in order to get enough value. Letting it loose only works when I've given very specific instructions. But I'm using Codex and Clai, perhaps Claude code is better.
I've tried running a number of claude's in paralell on a CRUD full stack JS app. Yes, it got features made faster, yes it definitely did not leave me enough time to acutally look at what they did, yes it definitely produced sub-par code.
At the moment with one claude + manually fixing crap it produces I am faster at solving "easier" features (Think add API endpoint, re-build API client, implement frontend logic for API endpoint + UI) faster than if I write it myself.
Things that are more logic dense, it tends to produce so many errors that it's faster to solve myself.
Yeah, doesn't this guy work for Anthropic? He'd get to use 10x Opus 4.5 for free.
I have a coworker who is basically doing this right now he leads our team and is second place overall. Regularly runs opus in parallel he alone is burning through 1k worth of credits a day.
He is also one of our worst performers.