Great idea! We haven’t tried it, but we're definitely interested to see if that works as well.
When we started down this path, T5 was the standard (back in 2024).
It likely won’t be the text encoder for subsequent models, given its size (per your point) and age.