Mostly the latter, but a lot of tools are so slow the former actually becomes a problem, too. Valgrind is a great example. For realtime applications, Valgrind and friends are pretty much a non-starter.
To your point about profile results, any profiler that adds more than a couple percentage points of runtime overhead basically destroys them (less than one percent is the acceptable margin, for me). Adding 50% is just laughable, and at the time I looked, that was the best option available.
At the time, I was trying to profile ML models and some tooling surrounding them. There are several reasons you want your profiler to be low overhead:
1. If the profiler does a ton of useless shit (read: is slow), it has many deleterious effects on the program being profiled. It can evict entries from the CPU's I-cache, D-cache, and TLB, to name a few, all of which can cause huge stalls and make something (such as a scan over tensor memory, for example) many times slower than it would be otherwise. You become unable to reason about whether something is taking a long time because it's doing something stupid, or because the profiler is just screwing up your day. Introducing this kind of noise into your profile makes it nearly impossible to do a good job at analysis, let alone optimize anything. (There's a sketch of what a minimal hook can look like after this list.)
2. Somewhat unrelated to performance, but you really want to know more than "this function takes up a lot of time", which is basically all sampling profilers tell you. If you look at a flame graph and it says "fooFunc()" takes up 80% of the time, you have no idea whether that's because one call to "fooFunc()" took 79% and the rest were negligible, or if they're all slow, or just a handful. That is key information and, in my mind, it basically makes sampling profilers unsuitable for anything but approximating. Which can be useful, and is often good enough, but if you need to optimize something for real, a sampling profiler exhausts its usefulness pretty quickly. (Second sketch below.)
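
On the overhead point: this isn't the tooling I was actually using, just a minimal sketch (all names made up) of the kind of hook I mean. The hot path only grabs a timestamp and writes it into preallocated storage; all the aggregation and I/O happens after the run, so the hook itself touches as little cache as possible:

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Hypothetical low-overhead event record: the hot path only writes
    // a tag and a timestamp into preallocated storage. Formatting,
    // aggregation, and I/O all happen after the measured region is done.
    struct Event {
        uint32_t tag;  // which probe fired
        uint64_t ns;   // steady_clock timestamp in nanoseconds
    };

    static std::vector<Event> g_events;  // reserve() before profiling starts

    inline void probe(uint32_t tag) {
        auto now = std::chrono::steady_clock::now().time_since_epoch();
        g_events.push_back({tag,
            (uint64_t)std::chrono::duration_cast<std::chrono::nanoseconds>(now).count()});
    }

    int main() {
        g_events.reserve(1 << 20);  // allocate up front, not mid-run

        probe(0);
        // ... region being measured, e.g. a scan over tensor memory ...
        probe(1);

        // Post-processing happens outside the measured region.
        std::printf("region took %llu ns\n",
            (unsigned long long)(g_events[1].ns - g_events[0].ns));
    }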
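And on the "one slow call vs. many slow calls" point, the same kind of per-call record gives you a distribution instead of just an aggregate. Again a sketch with made-up names, not anything I'm claiming a real profiler does for you:

    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Hypothetical per-call instrumentation: record each call's duration
    // individually, so "fooFunc() is 80% of runtime" can be broken down
    // into "one call took 79%" vs. "every call is uniformly slow".
    static std::vector<uint64_t> g_foo_durations_ns;

    template <typename F>
    void timed_call(F&& f) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        g_foo_durations_ns.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
    }

    int main() {
        g_foo_durations_ns.reserve(1024);

        for (int i = 0; i < 100; ++i) {
            timed_call([] { /* stand-in for fooFunc() */ });
        }

        // Total, worst call, and median -- enough to tell one pathological
        // call apart from calls that are all uniformly slow.
        std::sort(g_foo_durations_ns.begin(), g_foo_durations_ns.end());
        uint64_t total = 0;
        for (uint64_t d : g_foo_durations_ns) total += d;
        std::printf("calls=%zu total=%llu ns max=%llu ns median=%llu ns\n",
            g_foo_durations_ns.size(),
            (unsigned long long)total,
            (unsigned long long)g_foo_durations_ns.back(),
            (unsigned long long)g_foo_durations_ns[g_foo_durations_ns.size() / 2]);
    }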
Anyways .. there are some random thoughts for you :)