> Only a small portion of software is going to be able to be properly annotated to take advantage of this
The same could be said for, say, SIMD/vectorization, which 99% of ordinary application code has no use for, but it quietly provides big performance benefits whenever you resample an image, or use a media codec, or display 3D graphics, or run a small AI model on the CPU, etc. There are lots of performance microfeatures like this that may or may not be worth it to include in a system, but just because they are only useful in certain very specific cases does not mean they should be dismissed out of hand. Sometimes the juice is worth the squeeze (and sometimes not, but you can't know for sure unless you put it out into the world and see if people use it).
That's fair, I'm implicitly assuming the area cost for this dedicated memory would be much larger than that of e.g. SIMD vector banks.