I would love to be able to fit small matrices (4x4 or 16x16 depending on precision) in SIMD registers together with intrinsics for matrix arithmetic.