
mamidon • 06/16/2025 • 2 replies

How would you handle the case where you want to trace 100% of errors? Presumably you don't know a trace is an error until after you've executed the thing and paid the price.
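The cost mamidon describes shows up even in a toy tail-based sampler. Every span of every trace has to be recorded and held until the trace completes, because the keep/drop decision can't be made any earlier. Below is a minimal Python sketch. It assumes, hypothetically, that the root span always ends last and that each span carries an `error` flag. `Span` and `TailSampler` are illustrative names, not any tracing SDK's API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical minimal span record; real tracing SDKs carry many more fields.
    trace_id: str
    name: str
    is_root: bool = False
    error: bool = False


@dataclass
class TailSampler:
    # Fraction of error-free traces to keep; traces with any error are always kept.
    baseline_rate: float = 0.01
    rng: random.Random = field(default_factory=random.Random)
    # Every span is buffered here until its trace's root span ends (assumed to end last).
    pending: dict = field(default_factory=dict)
    exported: list = field(default_factory=list)

    def on_span_end(self, span: Span) -> None:
        self.pending.setdefault(span.trace_id, []).append(span)
        if not span.is_root:
            return  # too early to decide: an error could still show up in this trace
        spans = self.pending.pop(span.trace_id)
        if any(s.error for s in spans) or self.rng.random() < self.baseline_rate:
            self.exported.extend(spans)


# baseline_rate=0.0 keeps only error traces, which makes the outcome deterministic.
sampler = TailSampler(baseline_rate=0.0)
sampler.on_span_end(Span("t1", "db.query", error=True))
sampler.on_span_end(Span("t1", "GET /orders", is_root=True))
sampler.on_span_end(Span("t2", "db.query"))
sampler.on_span_end(Span("t2", "GET /orders", is_root=True))
print([s.trace_id for s in sampler.exported])  # ['t1', 't1']
```

Trace `t2` was fully recorded and buffered before it was dropped. That buffering is the price you pay before you know whether a trace errored: the memory, plus shipping every span to wherever the decision gets made.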

Replies

phillipcarter • 06/16/2025

This is correct. It's a seemingly simple desire -- "always capture whenever there's a request with an error!" -- but the setup needed to support it gets complex. And then you start heading down the path of "well THESE business conditions are more important than THOSE business conditions!" and before you know it, you've assembled a nice little tower of sampling cards. It's still worth it, just a hefty tax at times, and often the right solution is to just pay for more compute and data so that your engineers spend less time on these meta-level concerns.
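One way to picture the "tower of sampling cards" is an ordered list of rules in which the first matching rule sets the keep probability. The sketch below is hypothetical: the rule names, attribute keys, and rates are made up for illustration. It does show why the tower gets fragile. Every new business condition has to be slotted in at the right position, and reordering silently changes what gets kept.

```python
import random
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

Attrs = Mapping[str, object]


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule: a predicate over trace attributes plus a keep probability.
    name: str
    matches: Callable[[Attrs], bool]
    rate: float


def decide(rules: Sequence[Rule], attrs: Attrs, default_rate: float,
           rng: random.Random) -> tuple[str, bool]:
    """Return (name of the deciding rule, keep?) using first-match-wins ordering."""
    for rule in rules:
        if rule.matches(attrs):
            return rule.name, rng.random() < rule.rate
    return "default", rng.random() < default_rate


RULES = [
    Rule("errors", lambda a: a.get("error") is True, 1.0),
    Rule("checkout", lambda a: a.get("route") == "/checkout", 0.5),
    Rule("health", lambda a: a.get("route") == "/healthz", 0.0),
]

# A failing health check: the "errors" rule sits above "health", so it wins.
print(decide(RULES, {"route": "/healthz", "error": True}, 0.01, random.Random(0)))
# -> ('errors', True)
```

If you swap the order of the "errors" and "health" rules, the same failing health check is dropped instead. Nothing in the rule definitions themselves tells you that.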

jeffbee • 06/16/2025

I wouldn't. "Trace contains an error" is a hideously bad criterion for sampling. If you have some storage subsystem where you always hedge/race reads to two replicas and then cancel the losing replica's request, then every one of your traces will contain an error. It is a genuinely terrible feature.
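Here is a small asyncio sketch that makes the hedged-read example concrete. It assumes, as the comment describes, that the cancelled losing read is recorded with a non-OK status. The replica names, delays, and the `naive_keep` rule are hypothetical. Because every read is hedged, every trace carries a cancelled span, so an "any error" rule keeps all of them.

```python
import asyncio


async def read_replica(name: str, delay: float, spans: list) -> str:
    # Hypothetical replica read that records a (name, status) "span" when it ends.
    try:
        await asyncio.sleep(delay)
        spans.append((name, "OK"))
        return f"data from {name}"
    except asyncio.CancelledError:
        # The losing read is cancelled; assume the tracer records this as a non-OK status.
        spans.append((name, "CANCELLED"))
        raise


async def hedged_read(spans: list) -> str:
    # Race the same read against two replicas and keep whichever answers first.
    tasks = [
        asyncio.create_task(read_replica("replica-a", 0.01, spans)),
        asyncio.create_task(read_replica("replica-b", 1.0, spans)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancelled task to unwind without re-raising its CancelledError.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


def naive_keep(spans: list) -> bool:
    # The "trace contains an error" rule: any non-OK span keeps the whole trace.
    return any(status != "OK" for _, status in spans)


async def main() -> list:
    traces = []
    for _ in range(5):
        spans = []
        await hedged_read(spans)
        traces.append(spans)
    return traces


traces = asyncio.run(main())
print(sum(naive_keep(t) for t in traces), "of", len(traces), "traces kept")  # 5 of 5 traces kept
```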

Local logging of error conditions is the way to go. And I mean local, not to a central, indexed log search engine; that's also way too expensive.
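Here is a minimal sketch of what "local" could mean in Python. It uses the standard library's `logging.handlers.RotatingFileHandler` to write error records to a size-capped file on the machine itself, rather than shipping them to a central index. The logger name, path, size limits, and the `trace_id` field in the message are hypothetical choices for illustration.

```python
import logging
import logging.handlers
import os
import tempfile

# Hypothetical location; in practice this would be a path on the host's local disk.
LOG_DIR = tempfile.mkdtemp()
LOG_PATH = os.path.join(LOG_DIR, "errors.log")

logger = logging.getLogger("storage.errors")
logger.setLevel(logging.WARNING)
logger.propagate = False  # don't pass records up to root handlers (which might ship them centrally)

# Size-capped local file: at most 10 MB per file, with 5 rotated files kept.
handler = logging.handlers.RotatingFileHandler(
    LOG_PATH, maxBytes=10 * 1024 * 1024, backupCount=5
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# Hypothetical error record; the trace ID lets you find the trace if one was sampled.
logger.error("read failed: replica=%s trace_id=%s", "replica-b", "abc123")
handler.flush()
```

The size cap bounds the local cost. Old records simply roll off rather than accumulating in an indexed store.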

