Yea, in many usecases the tooling space is increasingly sophisticated context management such as fine tuning domain specific mappings into the model so that it is able to work directly with a compressed form of some data without needing to decompress into the context.
In larger models, these fine tuning techniques work more reliably/robustly. Because of this many usecases tend to prefer larger models. It is possible to work the same behaviour into the smaller model, but it requires more effort. But it's one-time. And smaller models are usually much cheaper. People make a tradeoff along this curve.
This is observed at few-B scale upto hundred-B scale. No way for us non-anthropic/openai to fine tune beyond that of course.