There are challenges with really big monolithic caches. IBM does something sort of like your idea in their Power and Telum chips, though the two take different approaches: Power has a non-uniform cache within each die, while Telum has a way to stitch together cache even across sockets (!).
https://chipsandcheese.com/p/telum-ii-at-hot-chips-2024-main...
https://www.eecg.utoronto.ca/~moshovos/ACA07/projectsuggesti...
(if you do ML things you might recognize Doug Burger's name on the authors line of the second one)