What do you need Matrix Cores for when you already have an NPU that can access the same memory, and even seems to include more flexible FPGA fabric? It's six of one, half a dozen of the other.
The NPU is generally pretty weak and not pipelined into the GPU's logic (which is already quite large on-die). It feels like the past 10 years have taught us that if you're going to create tensor-specific hardware, it makes the most sense to put it in your GPU and not in a dark-silicon coprocessor.
Can you do GPU -> NPU -> GPU for streaming workloads? The GPU can be more flexible than Tensor HW for preprocessing, light branching, etc. (Rough sketch of what that chain might look like below.)
Also, Strix Halo NPU is 50 TOPS. The desktop RDNA 4 chips are into the 100s.
As for consumer uses, I mentioned it's an open question. Blender? FFmpeg? Database queries? Audio?
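One way to actually try that chain today, purely as a sketch: run separate ONNX Runtime sessions pinned to different execution providers and hand tensors between them through host memory. The split model files and tensor names below are made up, and it assumes you have ONNX Runtime builds with both the ROCm (GPU) and VitisAI (NPU) providers installed; the point is just where the handoffs sit.

    # Hypothetical GPU -> NPU -> GPU chain via ONNX Runtime execution providers.
    # Model files and tensor names are placeholders, not a real pipeline.
    import numpy as np
    import onnxruntime as ort

    # Branchy preprocessing stays on the GPU (ROCm EP).
    pre = ort.InferenceSession(
        "preprocess.onnx",
        providers=["ROCMExecutionProvider", "CPUExecutionProvider"])

    # Dense, matmul-heavy core gets offloaded to the NPU (VitisAI EP).
    core = ort.InferenceSession(
        "core.onnx",
        providers=["VitisAIExecutionProvider", "CPUExecutionProvider"])

    # Postprocessing goes back to the GPU.
    post = ort.InferenceSession(
        "postprocess.onnx",
        providers=["ROCMExecutionProvider", "CPUExecutionProvider"])

    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    (h1,) = pre.run(None, {"input": x})     # GPU
    (h2,) = core.run(None, {"input": h1})   # NPU
    (out,) = post.run(None, {"input": h2})  # GPU
    print(out.shape)

Every session boundary is a synchronization point through host memory, though, so even with unified memory the open question is whether the NPU hop buys anything over just keeping the whole thing on the GPU's matrix units.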
I have the HP Zbook G1a with the same CPU and RAM, running HP's Ubuntu. I have not seen any OOTB way to use the NPU. I can get ROCm software to run, but it does not use it, and no system tool I can find shows any activity on it. It seems to be a marketing gimmick. Shame.
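For what it's worth, the NPU path on Linux is supposed to go through the amdxdna kernel driver (mainlined around 6.14, in the DRM accel subsystem) plus an XRT/ONNX Runtime userspace stack, not ROCm, so ROCm tools staying silent is expected. A quick sanity check you could run, assuming the driver exposes the usual /dev/accel node like other accel-subsystem drivers do (I have not verified this on the G1a specifically):

    # Check whether anything NPU-shaped is visible to the OS at all.
    import glob
    import subprocess

    # Assumption: amdxdna registers devices via the DRM accel subsystem
    # and shows up as /dev/accel/accelN like other accel drivers.
    print("accel nodes:", glob.glob("/dev/accel/accel*") or "none")

    # Is the amdxdna kernel module loaded?
    lsmod = subprocess.run(["lsmod"], capture_output=True, text=True).stdout
    print("amdxdna loaded:", any(line.startswith("amdxdna") for line in lsmod.splitlines()))

    # Does the installed onnxruntime build advertise any NPU-capable provider?
    try:
        import onnxruntime as ort
        print("ORT providers:", ort.get_available_providers())
    except ImportError:
        print("onnxruntime not installed")

Even if the device node shows up, as far as I can tell you still need the userspace pieces (XRT with the XDNA plugin and an NPU-aware ONNX Runtime build) before anything actually runs on it, which is probably why nothing works out of the box.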