Reminds me of the time when a cheap Celeron with a small cache was beating an expensive Pentium with a large cache (if I remember correctly, the Celeron's cache ran at core frequency while the Pentium's sat on a separate die at half frequency, and the Celeron was very overclockable).
Pentium 4, the first release. AMD had the same gimmick with their Phenoms.
Less cache per core is actually a pretty natural outcome of the latest fabrication nodes shrinking logic while leaving the size of SRAM largely unchanged. We may also see eDRAM (a lot denser than SRAM) used for last-level caches.