Hacker News

pcwelder, yesterday at 6:26 AM

You've essentially just trained your own LM instead of using a pretrained large LM.

Speaking generically: anywhere in your workflow where the task isn't hard, you can use a smaller, cheaper LM.

Smaller LMs come with reduced accuracy, particularly on tail cases, so in the real world this doesn't work out.

Also, is the Gumbel-softmax usage intentional? It looks like a straightforward classifier that just needs a regular softmax.
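
To make the distinction concrete, here is a minimal sketch of the two in plain Python. It is not taken from the project being discussed: the function names, the `tau` temperature parameter, and the example logits are all assumptions for illustration.

```python
import math
import random


def softmax(logits, tau=1.0):
    """Deterministic softmax with temperature tau (illustrative helper)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / tau) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Softmax over logits perturbed by Gumbel(0, 1) noise (illustrative helper)."""
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)  # clamp away from 0 so log(u) is defined
        gumbel_noise = -math.log(-math.log(u))
        noisy.append(x + gumbel_noise)
    return softmax(noisy, tau)


logits = [2.0, 1.0, 0.1]  # hypothetical classifier outputs
rng = random.Random(0)

print(softmax(logits))              # same result on every call
print(gumbel_softmax(logits, rng=rng))  # changes from call to call
print(gumbel_softmax(logits, rng=rng))
```

The regular softmax is a fixed function of the logits, which is what a classifier trained with cross-entropy normally wants. The Gumbel version injects noise so that the output behaves like a relaxed sample from the categorical distribution, which is mainly useful when you need to sample a discrete choice and still backpropagate through it.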