Interesting article. I tend to use
- i = min(floor(f * 256), 255) (from float to uint8)
- f = i / 255 (from uint8 to float)
Basically, it's a mix of the two approaches in the article: the encode step of one and the decode step of the other.
For every integer in [0, 255], a uint8 -> float -> uint8 round trip gives back the original value.
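A minimal Python sketch of the two formulas above, with a brute-force check of the round-trip claim (the function names are just for illustration):

```python
import math

def encode_mixed(f: float) -> int:
    # float -> uint8: split [0, 1] into 256 equal bins; clamp so f == 1.0 lands in bin 255
    return min(math.floor(f * 256), 255)

def decode_mixed(i: int) -> float:
    # uint8 -> float: map 0..255 onto [0.0, 1.0] so that 0 -> 0.0 and 255 -> 1.0
    return i / 255

# Check the round trip for every uint8 value
round_trip_ok = all(encode_mixed(decode_mixed(i)) == i for i in range(256))
print(round_trip_ok)  # True
```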
--
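edit: I wondered how much jitter I can add to the float and still get the same uint8 value back, while also requiring that 0 -> 0.0 and 255 -> 1.0 map exactly.
With my approach at the top, the jitter margin is 1/65280 = 1/(255 * 256). The decoded value i/255 sits i/65280 above its bin's lower edge i/256 and (255 - i)/65280 below its upper edge (i+1)/256, so i = 1 and i = 254 are the tightest cases.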
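To double-check the 1/65280 figure, here's a small exact-arithmetic sketch using Python's `fractions` module (`mixed_margin` is a made-up name). It assumes jittered inputs get clamped to [0, 1] before encoding, so pushing past 0.0 or 1.0 never counts as a failure.

```python
from fractions import Fraction

def mixed_margin(i: int) -> Fraction:
    """Exact distance from the decoded value i/255 to the nearest edge of its
    encode bin [i/256, (i+1)/256) under min(floor(f * 256), 255).

    Assumption: jittered inputs are clamped to [0, 1] before encoding, so the
    outer edges at 0.0 and 1.0 can't cause a wrong code and are skipped.
    """
    f = Fraction(i, 255)
    margins = []
    if i > 0:
        margins.append(f - Fraction(i, 256))      # room below, down to the bin's lower edge
    if i < 255:
        margins.append(Fraction(i + 1, 256) - f)  # room above; the edge itself already encodes to i + 1
    return min(margins)

worst = min(mixed_margin(i) for i in range(256))
tightest = [i for i in range(256) if mixed_margin(i) == worst]
print(worst)     # 1/65280
print(tightest)  # [1, 254]
```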
But with the article's approach
- i = floor(f * 255 + 0.5)
- f = i / 255
the maximum jitter margin is 1/510 (half of one 1/255 step), which is 128 times larger.
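A floating-point spot check of that comparison, again only a sketch: shift every decoded value i / 255 by a jitter just under each margin, clamp to [0, 1] (my assumption about the pipeline), and see whether it still encodes to i. `survives_jitter` is a hypothetical helper.

```python
import math

def encode_round(f: float) -> int:
    # Article's encode step: scale to [0, 255] and round half up
    return math.floor(f * 255 + 0.5)

def encode_mixed(f: float) -> int:
    # Mixed encode step from the top comment: 256 bins, clamp the top
    return min(math.floor(f * 256), 255)

def survives_jitter(encode, jitter: float) -> bool:
    """True if every i in 0..255, decoded as i / 255, shifted by +/- jitter
    and clamped to [0, 1], encodes back to i."""
    for i in range(256):
        f = i / 255
        for g in (f - jitter, f + jitter):
            g = min(max(g, 0.0), 1.0)  # clamp, as a real pipeline would
            if encode(g) != i:
                return False
    return True

print(survives_jitter(encode_round, 0.99 / 510))    # True
print(survives_jitter(encode_mixed, 0.99 / 510))    # False
print(survives_jitter(encode_mixed, 0.99 / 65280))  # True
```

A jitter of 0.99/510 passes for the rounding quantizer but already breaks the mixed one at i = 1.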
This is what I do for the former:
floor( nextafter( 256, 255 ) * value )
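A Python sketch of that nextafter variant (the original language isn't stated; Python's `math.nextafter` needs 3.9+). Scaling by the largest double below 256 keeps 1.0 just under 256, so the `min()` clamp becomes unnecessary, assuming the input is already in [0, 1].

```python
import math

# Largest double below 256.0 (256 - 2**-45)
SCALE = math.nextafter(256.0, 255.0)

def encode_nextafter(f: float) -> int:
    # Assumes f is already in [0, 1]; f == 1.0 gives 255.999... -> 255
    return math.floor(SCALE * f)

def encode_mixed(f: float) -> int:
    # The explicit-clamp version, for comparison
    return min(math.floor(f * 256), 255)

print(encode_nextafter(1.0))                                     # 255
print(all(encode_nextafter(i / 255) == i for i in range(256)))   # True
print(encode_nextafter(0.5), encode_mixed(0.5))                  # 127 128
```

One difference worth knowing: inputs sitting exactly on a bin edge k/256 (such as 0.5) land one bin lower than with the `min()` version. Decoded values i/255 are far from those edges, so the round trip is unaffected.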
It's worth pointing out that the article explicitly calls out the mixed technique at the top of your comment:
> Finally, one should never mix the encode and decode steps of the two quantizers. That’s just broken code. It’s an easy mistake to make, though.