What are you talking about? In a hot loop in my software renderer, this is like 10x faster:

lacedeconstruct • yesterday at 7:51 PM

    // color4_t result = {
    //     .r = (src.r * src.a + dst.r * inv_alpha) * INV_255,
    //     .g = (src.g * src.a + dst.g * inv_alpha) * INV_255,
    //     .b = (src.b * src.a + dst.b * inv_alpha) * INV_255,
    //     .a = src.a + (dst.a * inv_alpha) * INV_255
    // };

    // >> 8 divides by 256 rather than 255: slightly inexact, but much faster
    color4_t result = {
        .r = (src.r * src.a + dst.r * inv_alpha) >> 8,
        .g = (src.g * src.a + dst.g * inv_alpha) >> 8,
        .b = (src.b * src.a + dst.b * inv_alpha) >> 8,
        .a = src.a + ((dst.a * inv_alpha) >> 8)
    };
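
The `>> 8` shortcut is not quite the same operation as the commented-out version: it divides by 256 and truncates, so, assuming 8-bit channels, a fully opaque white source gives `(255 * 255) >> 8 = 254` rather than 255. A minimal sketch of an exact alternative that still uses only adds and shifts is below. The helper names `div255_round`, `div256_floor`, and `blend_channel` are hypothetical and not from the post, and the channel type is assumed to be `uint8_t`.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): x / 255 rounded to
 * nearest, valid for 0 <= x <= 255 * 255, using only adds and shifts. */
static inline uint8_t div255_round(uint32_t x)
{
    uint32_t t = x + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* The shortcut used in the post: floor(x / 256). */
static inline uint8_t div256_floor(uint32_t x)
{
    return (uint8_t)(x >> 8);
}

/* Blend one 8-bit channel, src over dst, with alpha in 0..255.
 * The sum never exceeds 255 * 255, so div255_round stays in range. */
static inline uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    uint32_t inv_alpha = 255u - alpha;
    return div255_round(src * alpha + dst * inv_alpha);
}
```

Whether the extra add and shift per channel is worth it depends on the renderer. If an off-by-one in brightness is invisible in practice, the plain `>> 8` is the cheaper choice.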

Replies

Tuna-Fish • yesterday at 8:53 PM

If the latter is 10x faster, the issue is some kind of weird compilation failure for the commented-out version. For one, the shift version only cuts about a third of the multiplies (11 down to 7).

dist-epoch • yesterday at 7:51 PM

Because you are working in the cache.

Also, you should use SIMD.
