ack_complete • 01/18/2025

CPUs are surprisingly good at dealing with this in their store queues. I see this write-all-and-increment-some technique used a lot in optimized code, like branchless left-pack routines or overcopying in the copy handler of an LZ/Deflate decompressor.
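For anyone who hasn't seen the write-all-and-increment-some pattern, here is a minimal sketch of a scalar branchless left-pack in C. The function name `left_pack_ge` and the "keep values at or above a threshold" predicate are made up for illustration; real left-pack routines are usually SIMD and look quite different, but the store pattern is the same idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the elements of src that are >= threshold to
 * the front of dst and return how many were kept.
 * Every element is stored unconditionally ("write all"); the output index
 * advances only when the element passes the test ("increment some").
 * dst must have room for n elements and must not overlap src. */
size_t left_pack_ge(const int32_t *src, size_t n, int32_t threshold, int32_t *dst)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        dst[out] = src[i];             /* always store, even if rejected */
        out += (src[i] >= threshold);  /* adds 0 or 1, no branch needed  */
    }
    return out;
}
```

A rejected element is simply overwritten by the next store to the same slot, so the loop issues repeated stores to one address. Note that entries of `dst` at or past the returned count may hold leftover rejected values.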

Replies

atq2119 • 01/18/2025

Yep, same with overlapping unaligned loads. It's just fairly cheap to make that stuff pipelined and run fast. It's only when you mix loads and stores in the same memory region that there are conflicts that can slow you down (and then quite horribly actually, depending on the exact processor).
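A rough sketch of what overlapping unaligned loads can look like in practice: comparing two buffers of 8 to 16 bytes with one load from the start and one from the end of each, instead of a byte loop. The function name `equal_8_to_16` is hypothetical, and `memcpy` is used here only as the portable way to express an unaligned load, which compilers typically lower to a single load instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1 if the first n bytes of a and b are equal,
 * for 8 <= n <= 16. The head load covers bytes [0, 8) and the tail load
 * covers bytes [n - 8, n); for n < 16 the two ranges overlap, which is
 * harmless because both are loads only. */
int equal_8_to_16(const void *a, const void *b, size_t n)
{
    assert(n >= 8 && n <= 16);
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    uint64_t a_head, a_tail, b_head, b_tail;

    memcpy(&a_head, pa, 8);          /* unaligned load of first 8 bytes */
    memcpy(&a_tail, pa + n - 8, 8);  /* unaligned load of last 8 bytes  */
    memcpy(&b_head, pb, 8);
    memcpy(&b_tail, pb + n - 8, 8);

    /* Any differing bit in either window makes the OR nonzero. */
    return ((a_head ^ b_head) | (a_tail ^ b_tail)) == 0;
}
```

Nothing here stores to the memory being read, so it stays clear of the mixed load/store case described above.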
