Not just more power-efficient, but also a little more memory efficient because pointers are only half as big and so don't take up as much space in the cache. Lower-bit chips are also smaller (which could translate into faster clock and/or more functional units per superscaler core and/or more cores per die).
Part of the problem with these discussion is that often when people say "64-bit" vs "32-bit" they are also considering all the new useful instructions that were added to the new instruction set generation. But a true "apples-to-apples" comparison between "32-bit" and "64-bit" should be using almost identical whose only difference is the datapath and pointer size.
I feel that the programs and games I run shouldn't really need more than 4GB memory anyway, and the occasion instance that the extra precision of 64-bit math is useful could be handled by emulating the 64-bit math with the compiler adding a couple extra 32-bit instructions.
I wish x32 had taken off; better performance and lower memory use.
Applications don't get 4GB with a 32-bit address space. The practical split between application and kernel was usually 1-3 or 2-2 with 3-1 being experimental and mooted with the switch to 64-bit. Nowadays with VRAM being almost as large as main RAM, you need the larger address space just to map a useful chunk of it in.
When you factor in memory fragmentation, you really only had a solid 0.75-1.5GB of space that could be kept continuously in use. That was starting to become a problem even when 32-bit was the only practical option. A lot of games saw a benefit to just having the larger address space, such that they ran better in 64-bit with only 4GB of RAM despite the fatter 64-bit pointers.