Post author here.
Yes, it works really well.
1) The latest models are radically better at this. We noticed a massive improvement in quality starting with Sonnet 4.5.
2) The context issue is real. We solve this by using subagents that read through the logs and return only the relevant bits to the parent agent’s context.
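For anyone curious what that shape looks like, here is a rough sketch of the subagent pattern from point 2, not our actual implementation. All names (`run_subagent`, `ParentAgent`, `keyword_filter`) are hypothetical, and `keyword_filter` is a plain string match standing in for the model call a real subagent would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a model call: given a chunk of log lines,
# return the lines judged relevant. A real subagent would ask an LLM here.
Judge = Callable[[List[str]], List[str]]


def keyword_filter(chunk: List[str]) -> List[str]:
    """Toy relevance judge (assumption): keep lines with obvious error markers."""
    markers = ("ERROR", "FATAL", "Traceback")
    return [line for line in chunk if any(m in line for m in markers)]


def run_subagent(
    log_lines: List[str],
    judge: Judge,
    chunk_size: int = 500,
    max_findings: int = 20,
) -> List[str]:
    """Read the logs chunk by chunk and return only a bounded list of findings."""
    findings: List[str] = []
    for start in range(0, len(log_lines), chunk_size):
        chunk = log_lines[start:start + chunk_size]
        findings.extend(judge(chunk))
        if len(findings) >= max_findings:
            break  # stop early once the budget is reached
    return findings[:max_findings]


@dataclass
class ParentAgent:
    """Holds the parent's context; raw logs never get appended to it."""
    context: List[str] = field(default_factory=list)

    def investigate(self, log_lines: List[str], judge: Judge) -> None:
        # Only the subagent's filtered findings reach the parent's context.
        self.context.extend(run_subagent(log_lines, judge))
```

The point of the design is the hard cap: however large the logs are, the parent's context grows by at most `max_findings` lines per investigation.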
I would be much more interested in reading about this kind of orchestration and filtering than about data acquisition, if you have the energy for another post :)
So you’re not getting alerts at 2 am from hallucinations?