Man, I’ve been there. Tried throwing BERT at enzyme data once—looked fine in eval, totally flopped in the wild. Classic overfit-on-vibes scenario.
Honestly, for straight-up classification? I’d pick SVM or logistic any day. Transformers are cool, but unless your data’s super clean, they just hallucinate confidently. Like giving GPT a multiple-choice test on gibberish—it will pick something, and say it with its chest.
Lately, I just steal embeddings from big models and slap a dumb classifier on top. Works better, runs faster, less drama.
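Roughly this shape, in case it helps anyone (a minimal sketch -- the encoder name and toy data are just placeholders, not my actual pipeline):

    # Frozen embeddings from a pretrained encoder, plain classifier on top.
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    texts = ["great product", "terrible service", "works fine", "total junk"]  # placeholder data
    labels = [1, 0, 1, 0]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained encoder works here
    X = encoder.encode(texts)                          # (n_samples, embedding_dim); never fine-tuned

    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.5, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # or an SVM, same idea
    print(clf.score(X_te, y_te))

The big model only does feature extraction, so you pay one forward pass per example and the part you actually train fits in seconds.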
Appreciate this post. Needed that reality check before I fine-tune something stupid again.
> Lately, I just steal embeddings from big models and slap a dumb classifier on top. Works better, runs faster, less drama.
You may know this but many don't -- this is broadly known as "transfer learning".
Ironically, this comment reads like it was generated by a Transformer (ChatGPT, to be specific).
>Lately, I just steal embeddings from big models and slap a dumb classifier on top. Works better, runs faster, less drama.
Sure, but this is still indirectly using transformers.
I’m not sure anyone I know could make an em dash with their keyboard off the top of their head.
[meta] Here’s where I wish I could personally flag HN accounts.
What kind of data did you run this on?
> Like giving GPT a multiple-choice test on gibberish—it will pick something, and say it with its chest.
If I gave a classroom of undergrad students a multiple-choice test where no answers were correct, I can almost guarantee nearly all the tests would be filled out.
Should GPT and other LLMs refuse to take a test?
In my experience it will pick the closest answer, even if none of the options are remotely correct.
Transformers will ace your test set, then faceplant the second they meet reality. I've also done the "wow, 92% accuracy!" dance only to realize later I just built a very confident pattern-matcher for my dataset quirks.
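The thing that finally caught it for me was holding out whole sources instead of random rows. A quick sketch (sklearn; the features, labels, and group ids below are made up for illustration):

    # Hold out entire groups (e.g. lab / batch / collection) so the model
    # can't lean on quirks that leak across a random split.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GroupShuffleSplit

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))          # stand-in features
    y = rng.integers(0, 2, size=200)        # stand-in labels
    groups = rng.integers(0, 5, size=200)   # which source each row came from

    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    train_idx, test_idx = next(splitter.split(X, y, groups))

    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    print("held-out-source accuracy:", clf.score(X[test_idx], y[test_idx]))

If that number falls off a cliff compared to a random split, congrats, you built a dataset-quirk detector.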