thesz • yesterday at 7:44 PM

You took the simple path: embedding the smaller network into the larger one. What if you need to reduce the number of layers and/or the width of the hidden layers? How would you embed the larger into the smaller? As for the "addition of same layers": would the process of selecting which layers to add be considered training?

What if you still have to obtain the best possible result for a given coefficient/tokenization budget?

I think my comment expresses the general case, while yours provides some exceptions.
