I prefer a different encoding: cycles/op
Both ops/sec and sec/op vary on clock rate, and clock rate varies across machines, and along the execution time of your program.
AFAIK, Cycles (a la _rdtsc) is as close as you can get to a stable performance measurement for an operation. You can compare it on chips with different clock rates and architectures, and derive meaningful insight. The same cannot be said for op/sec or sec/op.
Unfortunately, what you'll find if you dig into this is that cycles/op isn't as meaningful as you might imagine.
Most modern CPUs are out of order executors. That means that while a floating point operation might take 4 cycles to complete, if you put a bunch of other instructions around it like adds, divides, and multiplies, those will all finish at roughly the same time.
That makes it somewhat hard to reason about exactly how long any given set of operations will be. A FloatMul could take 4 cycles on it's own, and if you have
That can also take 4 cycles to finish. It's simply not as simple as saying "Let's add up the cycles for these 4 ops to get the total cycle count".Realistically, what you'll actually be waiting on is cache and main memory. This fact is so reliable that it underpins SMT. It's why most modern CPUs will do that in some form.