actually, here's a broader thought. since this approach only works for classification, why not make that the whole story and spin it as a positive? call your approach a "classification foundation model" (for example) and frame it as a special-purpose model distilled from a larger world model. the abstract's gestalt could read like "if you don't need to be generative, you can compress the representation way down" or "discriminative understanding takes far fewer parameters than language production." that would set the stage for the reader to understand the limitations and why the benchmarks are set up the way they are.
the kitschy paper titles could then follow from that framing, e.g. "Extreme Llama Compression: When Classification Is All You Need" or "Encoder-Only Models: A Lightweight Alternative to Decoder-Only GPT World Models", etc.
just spitballing