Transformers will ace your test set, then faceplant the second they meet reality. I've also done the "wow, 92% accuracy!" dance only to realize later I just built a very confident pattern-matcher for my dataset quirks.
Honestly, if your accuracy/performance metrics are too good, that's almost a sure sign that something has gone wrong.
Source: bitter, bitter experience. I once predicted the placebo effect perfectly using a random forest (just got lucky with the train/test split). Although I'd left academia at that point, I often wonder if I'd have dug in deeper if I'd needed a high impact paper to keep my job.
Honestly, if your accuracy/performance metrics are too good, that's almost a sure sign that something has gone wrong.
Source: bitter, bitter experience. I once predicted the placebo effect perfectly using a random forest (just got lucky with the train/test split). Although I'd left academia at that point, I often wonder if I'd have dug in deeper if I'd needed a high impact paper to keep my job.