That's fair; I'm implicitly assuming the area cost for this dedicated memory would be much larger than that of e.g. SIMD vector banks.
The existence of SIMD has knock-on effects on the design of the execution unit and the FPUs, though, since it's usually the only way to fully utilize them for floating-point workloads. And newer SIMD extensions like AVX/AVX2 have a pretty big effect on the whole CPU design: it was widely reported that Intel and AMD went to a lot of trouble to make them viable, even though most software probably isn't even compiled with AVX support enabled.
Also, SIMD is just one example. Modern DMA controllers are probably another good one, though I know less about them (I did try some weird things with the one in the Raspberry Pi). Or niche OS features like shared memory: pipes are usually all you need, and they don't break the multitasking paradigm, but in the few cases where shared memory is needed it speeds things up tremendously.
Presumably, the cost would be roughly that of traditional memory. In most consumer devices, memory is bottlenecked by monetary cost, not by space or thermal constraints.
However, dedicated read-optimized memory would come instead of a comparable amount of general-purpose memory, since data stored in one need not be stored in the other. The only increase in total memory would be whatever is needed to absorb fragmentation overhead when your actual usage ratio differs from what the architect assumed. Even then, the OS could use the more plentiful form of memory as swap space for the more in-demand form (or simply place low-priority memory regions in the less optimal form). This would open up a new and exciting class of resource management problems for kernel developers to eke out a few extra percentage points of performance.