SRE agents are the worst agents. I totally get why business and management demand them and love them. After all, they are the n+1 of the customer support chatbot you get frustrated talking to before you find the magic way to reach a person.
We have been using a few different SRE agents and they all fucking suck. The way they are promoted and run always makes them eager to “please” by inventing processes, services, and workarounds that don’t exist or make no sense. Giving examples will always sound petty or “dumb”. Every time I have to explain to management where the SRE agent failed, they just hand-wave it and assume it’s a small problem. And the problem is, I totally get it. When the SRE agent says “DNS propagation issues are common. I recommend flushing DNS cache or trying again later” or “The edge proxy held a bad cache entry. Cache will eventually get purged and the issue should be solved eventually”, it sounds so reasonable and “smart”. The issue was in DNS or in the proxy configuration. How smart was the SRE agent to get there? They think it’s phenomenal, and it may be. But I know that the “DNS issue” isn’t gonna resolve itself, because we have a bug in how we update DNS. I know the edge proxy cache issue is always gonna cause a particular use case to fail, because the way cache invalidation is implemented has a bug. Everyone loves deflection (including me) and “self-correcting” systems. But it just means that a certain class of bugs will forever be “fine”, and maybe that’s fine. I don’t know anymore.
They suck in all your data. Then charge you minibar prices to access it.
Wow, BitsAI in Datadog isn’t even good. I didn’t realize Datadog considered it a genuine product offering rather than a mere gimmick.
If you're looking for somewhere to pipe your logs, Axiom's been great and very cheap.
It seems a bit too perfect that the AI SRE gets unfairly blocked.
Edit: a couple of comments pointed out that the blog does mention paying Datadog. Leaving my comment as is below, because I still find the whole interaction weird. It makes me wonder if the story is fabricated.
> we lost visibility into production systems that depend fundamentally on continuous observability signals to operate safely.
The Datadog message implies that Deductive wasn't paying for any service from Datadog: "We've noticed you're actively evaluating Datadog" and "our Master Subscription Agreement that you accepted by using our service".
And Deductive apparently did this from Feb to Dec 2025. Quite a long time for a free evaluation, but perhaps they were just using the very limited free tier?
It's a little strange to be relying on a free tier or evaluation for "production systems that depend fundamentally on continuous observability". Presumably it couldn't have been that important to Deductive; otherwise they would have paid for the service they were "depending fundamentally" on.
This was clearly written by a bot heavily trained on LinkedIn posts, or by someone horrifically addicted to LinkedIn. It's nauseating to read.
Logging, tracing, observability, and control plane (flags, etc.) should be open.
We built 100% in-house pieces for all of this at a major fintech a decade ago. Everything worked and single teams could manage these systems.
Someone in leadership said we had to get rid of all "weirdware". Open solutions weren't robust, so we went commercial.
SignalFx got acquired and immediately 10x'd our prices, so we had to put all hands on deck to migrate. Unscheduled, stressful bullshit. We missed the migration date and had to pay anyway.
LaunchDarkly promised us the moon to replace the system my team built. It didn't work with Ruby or Go, and the Java client sucked. It couldn't sync online changes at runtime like our five-nines, distributed, fault-tolerant system could. We had to upstream a ton of code. And their system still sucked by the time I left the project.
These systems need to be open and owned by us. Managed is okay, but they shouldn't be proprietary offerings.
I could extend that one step further to the cloud itself, but that's an argument for another day.
Big fan of SigNoz: OTel, self-hosted, Prometheus-based, works with Grafana, scales.
https://www.signoz.io
Datadog is good, and so is Sentry, but after running a cloud practice for a major global business, I prefer to keep my sensitive system logs and traces in-house.