> The engineer is forced into manual correlation: jumping between dashboards, aligning timelines by eye, [and] inferring causality from coincidence
I just generate a random UUID in the application and make sure to log it everywhere across the entire stack along with a timestamp.
Any old log aggregator can give me an accurate timeline grouped by request UUID across every backend component all in one dashboard.
It's the very first thing that I have the application do when handling a request. It's injected at the log handler level. There's nothing to break and nothing to think about.
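For anyone who wants to see the shape of this, here is a minimal sketch in Python using only the standard library. It is one possible way to do it, not necessarily the setup described above: the names `request_id_var`, `RequestIdFilter`, `build_logger`, and `handle_request` are all made up for illustration, and a real service would set the ID in its framework's middleware rather than in a plain function.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled (hypothetical name).
# The default "-" marks log lines emitted outside of any request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Stamp every record that passes through the handler with the current request ID."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True  # never drop the record, only annotate it


def build_logger(stream):
    """Create a logger whose handler adds the timestamp and request ID to every line."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(request_id)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger("app")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, payload):
    """Assumed entry point: generate the UUID first, then do the actual work."""
    request_id = str(uuid.uuid4())
    token = request_id_var.set(request_id)
    try:
        logger.info("request started")
        logger.info("processed payload of %d bytes", len(payload))
        return request_id
    finally:
        # Restore the previous value so the ID does not leak into later work.
        request_id_var.reset(token)
```

Because the ID is attached in the handler's filter, the call sites just log normally; grouping by the second field in the aggregator then gives the per-request timeline.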
So, I have no problem knowing precise cause and effect with regard to all logs for a given isolated request, but I agree that there may be blips that affect multiple requests (outages, etc.). We have synthetic tests for outages though.
I too am struggling to understand what this tool does beyond grouping all logs by a unique request identifier.
If you use OpenTelemetry, it basically does exactly that and you can send traces to some self-hosted FOSS visualizer, like Jaeger. You can also easily get the UUID of the spans/traces and have your logger automatically put them in every log message.
Founder at base14 here, the company that is building Scout. Thanks for the feedback. We do something similar for tracing as well, but pgX does a bit more than that: engineers should be able to trace (like you mention) and also see and analyse the condition of the DB, e.g. correlate a query slowdown with locks, vacuums, etc., all on one screen or within a couple of clicks. We are building specialised explorers like pgX for Postgres. Essentially, we are building telemetry readers for components that send relevant metrics and logs to a telemetry data lake. For each component/domain, we find out from experts what they look at for analysis and during incidents, and bring that into full-stack "unified" dashboards/MCP.
Scout is our OTel-native observability product (data lake, UI, alerts, analytics, MCP, the works). What we call pgX in the blog is an add-on to Scout.