It depends on the kernel architecture. 4G/4G kernels weren't the most common thing, but they weren't exactly rare in the grand scheme of things either. PowerPC macOS (and x86 macOS before Apple officially released Intel-based Mac hardware) was 4G/4G, for example. The way that works under x86 is that you just reserve a couple of kernel pages mapped into both address spaces to do the page table swap on interrupts and syscalls. A little expensive, but less than you'd think, and having the kernel and user space not fight for virtual address space provided its own efficiencies that partially made up the difference. We've been moving back to that anyway with Kernel Page Table Isolation (KPTI), which went in as the Meltdown mitigation alongside the Spectre fixes.
And 3-1 wasn't really experimental. It was essentially always that way under Linux, and had been supported under Windows since the late 90s.
Yeah, "experimental" may not be the right word, but actually getting to use the 3-1 split required all of the following: at least 3GB of physical RAM (obviously), the OS booted with the /3GB switch, and the application in question linked with the /LARGEADDRESSAWARE flag (and not mishandling the high bit of a pointer). Many video games towards the end of the 32-bit era were built this way tbf, though they still generally do better on 64-bit Windows/Wine anyway.
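To make the "high bit of a pointer" caveat concrete, here's a minimal sketch of the usual way it goes wrong: code that treats a 32-bit address as a signed integer. The helper names are made up for illustration, and `uint32_t` stands in for a 32-bit pointer so the example behaves the same on a 64-bit build.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from any real codebase. */

/* Buggy: orders two 32-bit addresses as signed values. Anything at or
   above 0x80000000 turns negative and sorts below every low address.
   (The unsigned-to-signed conversion is implementation-defined in C;
   mainstream compilers wrap to two's complement.) */
int addr_below_signed(uint32_t a, uint32_t b) {
    return (int32_t)a < (int32_t)b;
}

/* Correct: keep addresses unsigned so the top bit is just another
   address bit. */
int addr_below_unsigned(uint32_t a, uint32_t b) {
    return a < b;
}
```

With a 2GB user space no valid user address has the top bit set, so the signed version never gets caught. Once the process can see addresses above 0x80000000, a comparison between, say, 0x7FFF0000 and 0x80000010 flips, which is the kind of bug /LARGEADDRESSAWARE asks you to promise you don't have.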