> This is not the first time we can see Nvidia taking shortcuts to achieve maximum performance of their GPUs
Why is implementing it correctly not performant? For context, I have no idea how rounding is typically implemented anyway.