There are so many use cases for small, super fast models that already fit within this size capacity:
* Many top-quality TTS and STT models
* Image recognition, object tracking
* Speculative decoding, where the small model drafts tokens for a much bigger model to verify (big/small architecture?)
* An agentic loop that tries 20 different approaches/algorithms, then picks the best one
* Edited to add: put 50 such small models together to create a SOTA super fast model
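
For anyone unfamiliar with the speculative decoding item, here is a rough toy sketch of the greedy variant. Everything in it is an assumption made for illustration: `target_model` and `draft_model` are made-up deterministic functions standing in for a big and a small LLM, and the verification loop stands in for what a real system would do in a single batched forward pass of the big model.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]


def target_model(prefix: Sequence[int]) -> int:
    """Hypothetical 'big' model: a deterministic toy next-token function."""
    return (sum(prefix) * 31 + len(prefix)) % 10


def draft_model(prefix: Sequence[int]) -> int:
    """Hypothetical 'small' model: agrees with the target most of the time."""
    guess = target_model(prefix)
    # Deliberately wrong at every 4th position to simulate a weaker model.
    return (guess + 1) % 10 if len(prefix) % 4 == 3 else guess


def greedy_decode(model: NextToken, prompt: Sequence[int], n_new: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: Sequence[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(tokens) < goal:
        # 1. The small model cheaply drafts k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The big model checks the draft (one batched pass in a real system).
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            # Whole draft accepted: the same pass also yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], passes


if __name__ == "__main__":
    out, passes = speculative_decode(target_model, draft_model, [1, 2, 3], 20)
    print(out == greedy_decode(target_model, [1, 2, 3], 20), passes)
```

The output matches what the big model would have produced on its own, but the big model is consulted once per batch of drafted tokens rather than once per token. The better the small model's guesses, the bigger the win.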