Only in micro-benchmarks.
For real usage, today's CPUs are limited by memory bandwidth.
What are you talking about? In a hot loop in my software renderer, this is like 10x faster:
    // color4_t result = {
    //     .r = (src.r * src.a + dst.r * inv_alpha) * INV_255,
    //     .g = (src.g * src.a + dst.g * inv_alpha) * INV_255,
    //     .b = (src.b * src.a + dst.b * inv_alpha) * INV_255,
    //     .a = src.a + (dst.a * inv_alpha) * INV_255
    // };

    // 1/256 but much faster
    color4_t result = {
        .r = (src.r * src.a + dst.r * inv_alpha) >> 8,
        .g = (src.g * src.a + dst.g * inv_alpha) >> 8,
        .b = (src.b * src.a + dst.b * inv_alpha) >> 8,
        .a = src.a + ((dst.a * inv_alpha) >> 8)
    };
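For anyone wanting to see what the `>> 8` shortcut costs in accuracy, here is a rough single-channel sketch. It assumes 8-bit channels and `inv_alpha = 255 - alpha`, which the snippet above does not actually show, and the helper names (`div255_shift`, `div255_round`, `blend_shift`, `blend_round`) are made up for illustration. The rounded variant uses the well-known shift-and-add trick for dividing by 255, so it still avoids a real division.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from the renderer above. */

/* Approximates x / 255 as x / 256 with a single shift (slightly too dark). */
static inline uint8_t div255_shift(uint32_t x) {
    return (uint8_t)(x >> 8);
}

/* Rounded x / 255 without a division, valid for x in [0, 255 * 255]. */
static inline uint8_t div255_round(uint32_t x) {
    x += 128;
    return (uint8_t)((x + (x >> 8)) >> 8);
}

/* Blend one channel; assumes inv_alpha = 255 - alpha. */
static inline uint8_t blend_shift(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint32_t inv_alpha = 255u - alpha;
    return div255_shift((uint32_t)src * alpha + (uint32_t)dst * inv_alpha);
}

static inline uint8_t blend_round(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint32_t inv_alpha = 255u - alpha;
    return div255_round((uint32_t)src * alpha + (uint32_t)dst * inv_alpha);
}
```

With the plain shift, blending white over white comes out as 254 rather than 255, so repeated blends can drift darker. Whether that matters depends on the renderer.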