+1 I’ve always had the feeling that training from randomly initialized weights, without seeding any substructure, unnecessarily slows LLM training.
Similarly, I’m always surprised that we don’t start by training a small stack of layers, duplicate and stack them, and then continue training the deeper model.
Better-than-random initialization is underexplored, but there is some work in that direction.
One of the main issues is that we don't know how to generate useful computational structure for LLMs, or how to transfer existing structure cleanly across architectural variations.
What you describe sounds more like a "progressive growing" approach, which isn't the same thing, but draws on some similar ideas.
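A minimal sketch of the depth-stacking idea from the comment above, under my own assumptions about what it means: train a shallow model, then initialize a deeper one by duplicating the trained layers instead of drawing fresh random weights. The layer representation and function names here are hypothetical, just to make the growth step concrete; a "layer" stands in for a full transformer block.

```python
import copy
import random

def make_layer(width, rng):
    # A "layer" here is just a weight matrix; a stand-in for a transformer block.
    return [[rng.gauss(0.0, 0.02) for _ in range(width)] for _ in range(width)]

def grow_by_stacking(layers, factor=2):
    # Duplicate each trained layer in order, so the deeper model starts
    # from learned structure rather than a random initialization.
    grown = []
    for layer in layers:
        for _ in range(factor):
            grown.append(copy.deepcopy(layer))
    return grown

rng = random.Random(0)
small = [make_layer(4, rng) for _ in range(3)]   # shallow 3-layer model
# ... train `small` for some steps here ...
big = grow_by_stacking(small, factor=2)          # deeper 6-layer model
assert len(big) == 2 * len(small)
assert big[0] == big[1] == small[0]              # adjacent copies share trained weights
```

Training then continues on `big`; whether to copy layers adjacently (as here) or interleave them differently is one of the design choices that varies across the growing approaches mentioned above.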