Is transposition a common enough operation that it might be better to avoid it altogether, by having versions of the matrix operations/functions that perform the necessary transpositions implicitly?
This is already done as much as possible by reordering and merging operations, but transposition (explicit or implicit) is unavoidable for some operations.
IIRC, libraries like NumPy and PyTorch can already do that, since they store matrices as flat 1D arrays plus metadata such as the shape and the strides (how far to advance in memory to reach the next row or column). That lets them implement operations like transposition by swapping the strides and other parameters, without touching the contents of the underlying array.
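
To make the stride idea concrete, here is a rough pure-Python sketch. The `StridedMatrix` class and its methods are made up for illustration and are not how NumPy or PyTorch are actually implemented; strides here are counted in elements, whereas NumPy reports them in bytes.

```python
class StridedMatrix:
    """Hypothetical 2-D view over a flat list (illustration only)."""

    def __init__(self, data, shape, strides):
        self.data = data        # flat 1D buffer, shared between views
        self.shape = shape      # (rows, cols)
        self.strides = strides  # elements to skip per step along each axis

    def __getitem__(self, index):
        i, j = index
        # Map the 2-D index onto the flat buffer using the strides.
        return self.data[i * self.strides[0] + j * self.strides[1]]

    def transpose(self):
        # Swap shape and strides; the buffer itself is not touched or copied.
        return StridedMatrix(self.data, self.shape[::-1], self.strides[::-1])

    def to_rows(self):
        rows, cols = self.shape
        return [[self[i, j] for j in range(cols)] for i in range(rows)]


buffer = [1, 2, 3, 4, 5, 6]
m = StridedMatrix(buffer, shape=(2, 3), strides=(3, 1))  # row-major 2x3
t = m.transpose()                                        # 3x2 view, strides (1, 3)

print(m.to_rows())       # [[1, 2, 3], [4, 5, 6]]
print(t.to_rows())       # [[1, 4], [2, 5], [3, 6]]
print(t.data is m.data)  # True: both views share the same buffer
```

The catch is that the transposed view no longer walks memory contiguously, so an operation that reads it row by row may be slower than on a real copy, which is presumably part of why transposition can't always be made free.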