It's cool because the registers are all in RAM, with a "workspace pointer" on the CPU pointing at where they are. This is slow, but a context switch is just changing that pointer.
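To make that concrete, here is a toy Python model of the idea. It is a sketch, not an emulation of the real chip: the class and method names are invented for illustration, and the only thing it takes from the hardware is that register n is the 16-bit word at WP + 2*n.

```python
class WorkspaceCPU:
    """Toy CPU whose 16 registers live in RAM at the workspace pointer."""

    def __init__(self, ram_size=1024):
        self.ram = bytearray(ram_size)
        self.wp = 0  # workspace pointer: address of R0

    def _addr(self, n):
        if not 0 <= n < 16:
            raise ValueError("register number must be 0..15")
        return self.wp + 2 * n  # each register is one 16-bit word

    def read_reg(self, n):
        a = self._addr(n)
        return (self.ram[a] << 8) | self.ram[a + 1]  # big-endian word

    def write_reg(self, n, value):
        a = self._addr(n)
        value &= 0xFFFF
        self.ram[a] = value >> 8
        self.ram[a + 1] = value & 0xFF

    def switch_context(self, new_wp):
        """Context switch: no registers are copied, only WP changes."""
        old_wp = self.wp
        self.wp = new_wp
        return old_wp


cpu = WorkspaceCPU()
cpu.write_reg(1, 0x1234)          # task A, workspace at address 0
task_a = cpu.switch_context(32)   # task B gets its own 16 words at address 32
cpu.write_reg(1, 0xBEEF)          # task B's R1, a different RAM location
cpu.switch_context(task_a)        # back to task A
print(hex(cpu.read_reg(1)))       # task A's R1 is untouched
```

On the real chip the BLWP instruction does a bit more than this: it also stores the old WP, PC and status register in R13 to R15 of the new workspace, so the callee can return with RTWP.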
As a concept it wasn't all that slow at a time when RAM was as fast as the CPU. I think it's just that TI's implementation of the concept in that particular cost-optimised home computer was pretty bad -- the actual registers were in 256 bytes of fast static RAM, but the rest of the system memory (both ROM and RAM) was accessed very inefficiently: not only 1 byte at a time on a 16-bit machine, but also with something like 4 wait states for every byte.
The 6502 is not very different: it has a very small number of registers, and Zero Page is used for most of what a modern machine would use registers for. For example (unlike the Z80) there is no register-to-register add or subtract or compare -- you can only add/sub/cmp/and/or/xor a memory location to the accumulator. Also, pointers can only be held in a pair of adjacent Zero Page locations.
As long as you were using data in those in-RAM registers, the TI-99/4 was around four times faster than a 1 MHz 6502 for 16-bit arithmetic -- and with a single 2-byte instruction doing what needed 7 instructions and 13 bytes of code on the 6502 -- and it was also twice as fast on 8-bit arithmetic.
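For anyone wondering where 7 instructions and 13 bytes come from, this is a sketch of the usual 6502 idiom for a 16-bit add with both operands in Zero Page, written as a Python table plus a function that mimics the byte-by-byte carry. The labels (a_lo, b_hi and so on) are made up; the sizes and cycle counts are the standard ones for implied and zero-page addressing.

```python
# (mnemonic, size in bytes, cycles) for zero-page operands on a 6502
SEQUENCE = [
    ("CLC",      1, 2),  # clear carry before the low-byte add
    ("LDA a_lo", 2, 3),
    ("ADC b_lo", 2, 3),
    ("STA a_lo", 2, 3),
    ("LDA a_hi", 2, 3),
    ("ADC b_hi", 2, 3),  # picks up the carry from the low byte
    ("STA a_hi", 2, 3),
]


def add16_6502_style(a, b):
    """Add two 16-bit values the way the sequence does: one byte at a time."""
    carry = 0                                            # CLC
    lo = (a & 0xFF) + (b & 0xFF) + carry                 # LDA a_lo / ADC b_lo
    carry = lo >> 8
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry   # LDA a_hi / ADC b_hi
    return ((hi & 0xFF) << 8) | (lo & 0xFF)              # STA a_lo / STA a_hi


instructions = len(SEQUENCE)
code_bytes = sum(size for _, size, _ in SEQUENCE)
cycles = sum(cyc for _, _, cyc in SEQUENCE)
print(instructions, code_bytes, cycles)
print(hex(add16_6502_style(0x12FF, 0x0001)))
```

At 1 MHz those cycles are microseconds, which is the baseline the "four times faster" figure is being compared against.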
It was just the cheap-ass main memory (and I/O) implementation that crippled it.
Yep, but it lacks an MMU, so memory protection and paging are going to require a lot of work. I think the only reason this is feasible at all is they're running the OS out of a ROM cartridge.
Well, it has 256 bytes of RAM, which is basically a really big register file, and everything else goes in the 16 KB of "video RAM", which you can only read and write by poking at I/O registers. So it is not easy to program.
It's arguably the only 8-bit computer which has a really different architecture from the others. You could otherwise imagine pulling the SID chip off a C-64 and putting it on a TRS-80 Color Computer etc.
Sharing the main RAM with video was a weak point in computers of that time period because the video system stole many of the memory access cycles. Some recent retrocomputers that revisit that period like
https://www.c64-wiki.com/wiki/Commander_X16
have a full-size main memory bank and a separate video RAM bank that is accessed through a port. That can be pretty efficient because the address register auto-increments, so you just write 1 byte to the port to write 1 byte to video RAM, and repeat.
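Roughly what that access pattern looks like, as a Python sketch. This is a generic model, not the register layout of any particular video chip: the VideoPort class, its method names and the 16 KB size are assumptions for illustration.

```python
class VideoPort:
    """Toy video RAM reachable only through an address register and a data port."""

    def __init__(self, vram_size=16 * 1024, step=1):
        self.vram = bytearray(vram_size)
        self.addr = 0      # address register, set once per burst
        self.step = step   # auto-increment applied after every data access

    def set_address(self, addr):
        self.addr = addr % len(self.vram)

    def write_data(self, value):
        # One port write stores one byte, then the address moves on by itself.
        self.vram[self.addr] = value & 0xFF
        self.addr = (self.addr + self.step) % len(self.vram)


port = VideoPort()
port.set_address(0x0400)   # pay the address setup cost once
for byte in b"HELLO":
    port.write_data(byte)  # then it is one port write per byte
print(bytes(port.vram[0x0400:0x0405]))
```

The point is that the address setup is paid once per burst, so copying a long run of bytes costs about one port write each, and the CPU's main memory is never contended by the video hardware.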