Yeah, burying this on page 8 is a bit suspect imo (the eval datasets are listed on page 3, so if you were familiar with them you would have a hint then).
The distillation of a student that predicts "anchor layers" and then acts as a backbone for classification is perfectly cool on its own; no need to stretch the title/abstract so much.
Yeah, burying this on page 8 is a bit suspect imo (the eval datasets are listed on page 3, so if you were familiar with them you would have a hint then).
The distillation of a student that predicts "anchor layers" and then acts as a backbone for classification is perfectly cool on its own; no need to stretch the title/abstract so much.