This is already done as far as possible by reordering and merging operations, but transposition (explicit or implicit) is unavoidable for some operations.
A good example of this is A + A^T: you can fuse the two operations, but you cannot get around the access pattern of matrix transposition.
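
To make the A + A^T point concrete, here is a minimal sketch in plain Python, assuming a square matrix stored row-major in a flat list. The function name `add_transpose_fused` and the storage layout are illustrative assumptions, not taken from any particular library. The addition and the transposition are fused into a single loop, so no transposed copy is ever built, yet one of the two reads still strides through memory.

```python
def add_transpose_fused(a, n):
    """Compute A + A^T for an n x n matrix stored row-major in a flat list.

    Hypothetical helper for illustration only.
    """
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            # a[i * n + j] advances by 1 element per inner iteration (contiguous).
            # a[j * n + i] advances by n elements per inner iteration (strided):
            # this is the transposed access pattern that fusion cannot remove.
            out[i * n + j] = a[i * n + j] + a[j * n + i]
    return out


# 2 x 2 example: A = [[1, 2], [3, 4]]
a = [1.0, 2.0,
     3.0, 4.0]
result = add_transpose_fused(a, 2)
print(result)
```

Swapping the loop order only moves the strided access from one operand to the other. Techniques such as processing the matrix in small tiles can soften the cost, but the transposed read itself remains.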