pcwelder • yesterday at 6:26 AM

You've essentially just trained your own LM instead of using a pretrained large LM.

Speaking generically, anywhere in your workflow where you feel the task is not hard, you can use a smaller and cheaper LM.

But smaller LMs come with reduced accuracy, particularly on tail cases, so in the real world this doesn't work out.

Also, is the Gumbel-softmax usage intentional? It looks like a straightforward classifier that just needs a regular softmax.
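
To make the question concrete, here is a minimal sketch in plain Python of the difference. This is not the code from the post; the logits and function names are made up for illustration. A regular softmax is deterministic, while Gumbel-softmax adds Gumbel(0, 1) noise to the logits first, so each call gives a different relaxed sample. That noise is useful when you need to sample a discrete choice and still backpropagate through it, which a plain classifier head normally doesn't need.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Softmax over logits perturbed with Gumbel(0, 1) noise."""
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        noisy.append(x + gumbel)
    return softmax(noisy, temperature)


# Hypothetical classifier logits for three classes.
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)  # same result on every call

rng = random.Random(0)  # seeded only so the demo is repeatable
sample_a = gumbel_softmax(logits, rng=rng)
sample_b = gumbel_softmax(logits, rng=rng)  # differs from sample_a

print("softmax:       ", probs)
print("gumbel-softmax:", sample_a)
print("gumbel-softmax:", sample_b)
```

If the model is only trained with cross-entropy on class labels, the deterministic version should be enough, assuming nothing downstream depends on sampling.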
