utopcell, today at 3:29 AM

Very strong statement in the title, given the following limitation:

> Generation tasks. Method applies to classification only. Preliminary decoder experiments show perplexity increases.


Replies

daemonologist, today at 3:34 AM

Yeah, burying this on page 8 is a bit suspect imo (the eval datasets are listed on page 3, so if you were familiar with them you would have had a hint by then).

The distillation of a student that predicts "anchor layers" and then acts as a backbone for classification is perfectly cool on its own; no need to stretch the title/abstract so much.
