I think any kind of innovation here will have to take advantage of some structure inherent to the problem, like eliminating attention in favour of geometric structures such as Grassmann flows [1].
[1] Attention Is Not What You Need, https://arxiv.org/abs/2512.19428
Right - e.g., if you're modeling a physical system, it makes sense to bake in some physics, like symmetry.