We've been experimenting with combining durable execution with debugging tasks, and it's working incredibly well! Adding actual execution data as context, scoped to the functions the developer has marked as important (instead of every individual call), gives the LLM the data it needs.
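To make that concrete, here's a rough sketch of the idea, not our actual implementation. It assumes a hypothetical `step` decorator that the developer puts on the functions they care about, an in-memory `JOURNAL` standing in for whatever a real durable execution engine would persist, and a made-up two-step order workflow. All of these names are invented for illustration.

```python
import functools
import json
import time

# Hypothetical in-memory journal. A real durable execution engine would
# persist these records; a plain list is used here only for illustration.
JOURNAL = []


def step(fn):
    """Mark a function as important: record its inputs, outcome, and duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"step": fn.__name__, "args": repr(args), "kwargs": repr(kwargs)}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            record["status"] = "ok"
            record["result"] = repr(result)
            return result
        except Exception as exc:
            record["status"] = "error"
            record["error"] = f"{type(exc).__name__}: {exc}"
            raise
        finally:
            # Runs on both success and failure, so every step is journaled.
            record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
            JOURNAL.append(record)
    return wrapper


@step
def load_order(order_id):
    # Made-up step: returns an order with a bad total to trigger a failure.
    return {"id": order_id, "total": 0}


@step
def charge(order):
    if order["total"] <= 0:
        raise ValueError("order total must be positive")
    return "charged"


def build_debug_context(journal):
    """Serialize the journaled steps into structured text for an LLM prompt."""
    return json.dumps(journal, indent=2)


def run_workflow(order_id):
    try:
        charge(load_order(order_id))
    except ValueError:
        pass  # The failure is already captured in the journal.
    return build_debug_context(JOURNAL)
```

Calling `run_workflow(42)` returns a JSON string with one record per marked function, including the error from `charge`. The point is that the LLM sees a small, developer-curated trace rather than a dump of every call.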
I know there are AI SRE companies that have discovered the same thing: you can't just throw a bunch of data at a regular LLM and have it "do SRE things". It needs more structured context, and their value-add is knowing what context and what structure are necessary.