Hacker News

garaetjjte, yesterday at 8:12 PM

You usually hope that TTD points to the culprit in such situations. But I once encountered a single-byte corruption that didn't make any sense in the TTD trace: the value was good at the write, and the next read returned garbage. I never discovered whether it was a CPU bug, corruption by GPU shaders, stray kernel writes, or something else. (I think it's unlikely that a CPU bug would manifest in both native and TTD-instrumented runs. The corrupted byte was inside heap-allocated memory, so it shouldn't have been in the GPU page tables at all. Kernel writes wouldn't appear in a TTD trace, so I think that was the most likely cause, but how do you debug that...)


Replies

nianderwallace, today at 7:27 PM

For specific cases, you'd convert your memory allocator (hopefully you can narrow this down to just certain allocations) and write-protect those allocations, i.e. make them read-only, except where your code is purposely writing to those areas of memory.

Yes, it'll be slow and use up lots of memory pages, unless you can narrow it down to a small set of allocations. And you'll need code to write-enable and write-disable those allocations. But if it catches the culprit writing bytes where it shouldn't, it'll be worth it.
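
To make that concrete, here is a rough sketch of what such a write-protected allocation could look like. It assumes a POSIX system and uses mmap/mprotect; on Windows, where TTD runs, the analogous calls would be VirtualAlloc and VirtualProtect. The names guarded_buf, guarded_alloc, guarded_unlock, guarded_lock, guarded_free and write_faults are made up for illustration, and each buffer gets its own whole pages, which is where the memory cost comes from.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: one buffer on its own page-aligned mapping. */
typedef struct {
    void  *base;  /* start of the mapping (page-aligned) */
    size_t size;  /* length, rounded up to whole pages */
} guarded_buf;

/* Allocate a zero-filled buffer that starts out read-only. */
static int guarded_alloc(guarded_buf *g, size_t nbytes) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    g->size = (nbytes + page - 1) / page * page;
    g->base = mmap(NULL, g->size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return g->base == MAP_FAILED ? -1 : 0;
}

/* Write-enable: call right before code that is supposed to write. */
static int guarded_unlock(guarded_buf *g) {
    return mprotect(g->base, g->size, PROT_READ | PROT_WRITE);
}

/* Write-disable: call right after the intended write is done. */
static int guarded_lock(guarded_buf *g) {
    return mprotect(g->base, g->size, PROT_READ);
}

static void guarded_free(guarded_buf *g) {
    munmap(g->base, g->size);
}

/* Self-check only: returns 1 if a write to p kills a forked child with
 * a signal, meaning the page is currently write-protected. */
static int write_faults(void *p) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        *(volatile unsigned char *)p = 0xAA;  /* the "stray" write */
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}
```

You'd wrap every legitimate write in guarded_unlock/guarded_lock. Any other write from the process's own code then faults at the offending instruction, so a debugger or core dump shows who did it. The write_faults helper is only there to confirm the protection is active; you wouldn't ship it.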

The one time I did this for a buffer passed to a HW device, I could prove that the hardware was doing DMA writes where it shouldn't. I had to bring in a HW logic analyzer to verify.