I’m sure there are plenty of technical reasons it’s impractical, but my dream is a big, unified L3 cache shared across their CCD chiplets. Maybe 256 MB in size for the x950 X3D chips.
They could bond multiple CCDs on top of a single large unified L3 die (similar to MI300C) if they wanted to. I've seen no rumors about that though.
My work is currently cache-limited, so I share your dream.
There are challenges with really big monolithic caches. IBM does something sort of like your idea in their Power and Telum chips, each taking a different approach: Power has a non-uniform cache within each die, while Telum has a way to stitch cache together even across sockets (!).
https://chipsandcheese.com/p/telum-ii-at-hot-chips-2024-main...
https://www.eecg.utoronto.ca/~moshovos/ACA07/projectsuggesti...
(If you do ML things, you might recognize Doug Burger's name in the author list of the second one.)