It's the MMU width, not the ALU width, that matters.
Lots of machines are capable of running with 32-bit pointers and 64-bit integer registers ("Knuth mode", i.e. an ILP32 ABI on a 64-bit CPU). You get a huge improvement in memory density as long as no single process needs more than 4 GB of core.
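To put a rough number on the density claim, here is a minimal sketch using Python's ctypes to compare a linked-list node that stores native pointers with one that stores 32-bit indices. The names PointerNode, IndexNode and node_sizes are made up for illustration, and the layout follows whatever C ABI your Python build uses, so the pointer-based size will differ between platforms.

```python
import ctypes


class PointerNode(ctypes.Structure):
    # Hypothetical list node holding two native-width pointers.
    _fields_ = [
        ("prev", ctypes.c_void_p),
        ("next", ctypes.c_void_p),
        ("value", ctypes.c_int32),
    ]


class IndexNode(ctypes.Structure):
    # Same node, but the links are 32-bit values, as they would be
    # under a 32-bit-pointer ABI.
    _fields_ = [
        ("prev", ctypes.c_uint32),
        ("next", ctypes.c_uint32),
        ("value", ctypes.c_int32),
    ]


def node_sizes():
    """Return (pointer_node_bytes, index_node_bytes) for this platform."""
    return ctypes.sizeof(PointerNode), ctypes.sizeof(IndexNode)


if __name__ == "__main__":
    ptr_size, idx_size = node_sizes()
    print(f"native pointers: {ptr_size} bytes per node")
    print(f"32-bit links:    {idx_size} bytes per node")
```

On a typical 64-bit build the pointer version pads out to 24 bytes against 12 for the 32-bit version, so a pointer-heavy structure fits about twice as many nodes per cache line.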
I assume you mean "pointer width", à la the x32 ABI and similar, and that this is more about cache use than "Switching Transistor Count".
But really that's a software/OS level thing, and though the benefits have definitely been shown, they seem small enough to not be worth the upheaval.
Possibly related: larger pages have been shown to give significant speedups without changing the ABI (or at least not as much; mmap() and similar will have slightly different specifics). IMHO the only possible "benefit" of 4 KB page sizes is to (ab)use them to trap things like array overruns - though that is a poor substitute for /real/ bounds checking - a lot can go wrong within 4 KB, after all.
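As a back-of-the-envelope illustration of "a lot can go wrong within 4 KB", here is a small sketch that computes how many bytes past the end of a buffer can be written before a guard page would fault. It assumes a simplified model that is not from the thread: the buffer starts on a page boundary and the guard page sits right after the last page the buffer touches. The function name guard_page_slack is hypothetical.

```python
def guard_page_slack(alloc_size, page_size=4096):
    """Bytes of overrun that go unnoticed before hitting the guard page.

    Assumed model: the buffer starts at a page boundary and the first
    protected (guard) page follows the last page the buffer occupies.
    """
    if alloc_size <= 0 or page_size <= 0:
        raise ValueError("sizes must be positive")
    # Distance from the end of the buffer to the next page boundary.
    return (-alloc_size) % page_size


if __name__ == "__main__":
    for size in (100, 4000, 4096):
        slack = guard_page_slack(size)
        print(f"{size}-byte buffer: {slack} bytes of silent overrun")
```

Under that model a 100-byte array can be overrun by 3996 bytes without any trap, and the slack only gets bigger with larger pages, which is why page protection is at best a coarse safety net.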