Hacker News

dooglius · yesterday at 9:35 PM · 3 replies

I'm not seeing the case for adding this to general-purpose CPUs/software. Only a small portion of software is going to be able to be properly annotated to take advantage of this, so it'd be a pointless cost for everyone else. Nominally short-term access can easily become long-term in the tail if the process gets preempted by something higher priority or spends a lot of time on an I/O operation. It's also not clear why, if you had an efficient solution for the short-term case, you wouldn't just add a refresh cycle and use it in place of normal SRAM as a generic cache. These make a lot more sense in a dedicated hardware context -- like neural nets -- which I think is the authors' main target here.


Replies

laserbeam · today at 4:04 AM

I imagine it would be straightforward to support this in codebases that already define multiple allocators (game engines, programs written in Zig, a bunch of other examples I'm less familiar with). If you're already in manual memory management land, you already have multiple implementations of malloc and free, and adding more of them is trivial.
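A minimal sketch of what I mean, in C. The rh_alloc/rh_free names are made up (here they just fall back to malloc/free so the snippet compiles); the point is only that a codebase already routing allocations through its own interfaces needs nothing more than one extra pool:

    #include <stdlib.h>
    #include <string.h>

    /* Stand-ins for a hypothetical allocator backed by the read-optimized
     * memory pool; they fall back to malloc/free here so the sketch compiles.
     * In an engine that already abstracts allocation (arenas, pools, frame
     * allocators), this is just one more implementation of the same interface. */
    static void *rh_alloc(size_t size) { return malloc(size); }
    static void  rh_free(void *ptr)    { free(ptr); }

    typedef struct {
        const unsigned char *pixels; /* texture data: loaded once, read every frame */
        float *scratch;              /* working buffer: rewritten constantly */
        size_t size;
    } asset_t;

    void asset_load(asset_t *a, const unsigned char *src, size_t size) {
        unsigned char *p = rh_alloc(size);              /* read-heavy pool */
        memcpy(p, src, size);
        a->pixels = p;
        a->scratch = malloc(size * sizeof *a->scratch); /* ordinary RAM */
        a->size = size;
    }

    void asset_unload(asset_t *a) {
        rh_free((void *)a->pixels);
        free(a->scratch);
    }

Which allocation goes to which pool is a one-line decision at each call site, exactly the kind of choice these codebases already make between arenas.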

If you’re not in manual memory management land, then you probably don’t care about this optimization just like you barely think of stack vs heap. Maybe the compiler could guess something for you, but I wouldn’t be worrying about it in that problem space.

gizmo686 · today at 1:54 AM

A bunch of applications should be able to annotate data as read-heavy. Even without any change to application code, operating systems can treat pages mapped read-only as read-heavy. That immediately covers the majority of executable code and all data files that are mmapped read-only.
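For instance, nothing in the application has to change for an ordinary read-only file mapping (plain POSIX calls, no new flags); the kernel sees PROT_READ at map time and could heuristically choose to back those pages with the read-optimized memory:

    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Read-only, file-backed mapping: the kernel already knows at map time
     * that these pages will never be written through this mapping, so it
     * could place them in read-optimized memory with no new API surface. */
    void *map_readonly(const char *path, size_t *len_out) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return NULL;
        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return NULL; }
        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                     /* the mapping survives the close */
        if (p == MAP_FAILED) return NULL;
        *len_out = (size_t)st.st_size;
        return p;
    }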

I'm not sure how good applications are at properly annotating this, but for most applications, assets are also effectively read-only.

You don't even need most of the RAM usage to be able to take advantage of this. If you can reasonably predict what portion of RAM usage will be read-heavy, then you can allocate your RAM budget accordingly, and probably eke out a measurable performance improvement. In a world with Moore's law, this type of heterogeneous architecture has proven to not really be worth it. However, that calculus changes once we lose the ability to throw more transistors at the problem.
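Back-of-the-envelope version of the budgeting (the 16 GiB total and the 60% read-heavy fraction are invented purely to illustrate the split):

    #include <stdio.h>

    /* Toy budget split: size the two pools from a predicted read-heavy
     * fraction of the working set (e.g. estimated from page write rates).
     * All numbers are illustrative only. */
    int main(void) {
        double total_gib = 16.0;
        double read_heavy_frac = 0.6;   /* say, code plus read-only assets */

        double read_pool_gib  = total_gib * read_heavy_frac;
        double write_pool_gib = total_gib - read_pool_gib;

        printf("read-optimized pool: %.1f GiB\n", read_pool_gib);
        printf("conventional pool:   %.1f GiB\n", write_pool_gib);
        return 0;
    }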

gary_0 · yesterday at 10:17 PM

> Only a small portion of software is going to be able to be properly annotated to take advantage of this

The same could be said for, say, SIMD/vectorization, which 99% of ordinary application code has no use for, but which quietly provides big performance benefits whenever you resample an image, use a media codec, display 3D graphics, run a small AI model on the CPU, etc. There are lots of performance microfeatures like this that may or may not be worth including in a system, but the fact that they are only useful in very specific cases doesn't mean they should be dismissed out of hand. Sometimes the juice is worth the squeeze (and sometimes not, but you can't know for sure unless you put it out into the world and see whether people use it).
