That's exactly what Nvidia is doing with tensor cores.
Except the native width of Tensor Cores are about 8-32 (depending on scalar type), whereas the width of TPUs is up to 256. The difference in scale is massive.
Except the native width of Tensor Cores are about 8-32 (depending on scalar type), whereas the width of TPUs is up to 256. The difference in scale is massive.