Hacker News

ismailmaj · 11/08/2024

How does it compare to partially fine-tuning the model by freezing most of the network besides the last few layers?
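For context, the partial fine-tuning the question describes is usually done by turning off gradients for most of the network and training only the final blocks. A minimal sketch in PyTorch, using a toy stand-in model (the architecture, sizes, and which layers to unfreeze are all illustrative assumptions, not anything from the thread):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained network: embedding, three
# transformer blocks, and an output head.
model = nn.Sequential(
    nn.Embedding(1000, 64),  # stand-in embedding layer
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    nn.Linear(64, 1000),  # stand-in output head
)

# Freeze everything first...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the last transformer block and the head,
# so the optimizer only updates those parameters.
for p in model[3].parameters():
    p.requires_grad = True
for p in model[4].parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

An optimizer built as `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)` would then touch only the unfrozen layers.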


Replies

K0balt · 11/09/2024

Idk, but if I were guessing, I would guess that process would be likely to create intruder dimensions in those layers… but it's hard to say how impactful that would be. Intuitively I would think it would tend to channel a lot of irrelevant outputs toward the semantic space of the new training data, but idk how well that intuition would hold up to reality.
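"Intruder dimensions" here are commonly measured as singular vectors of the fine-tuned weight matrix that have low cosine similarity to every singular vector of the pretrained matrix. A hedged NumPy sketch of that measurement, with random matrices standing in for real weights and an arbitrary similarity threshold (both are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
W_pre = rng.standard_normal((64, 64))           # stand-in pretrained weights
W_ft = W_pre + 0.5 * rng.standard_normal((64, 64))  # stand-in fine-tuned weights

# Left singular vectors of each matrix (columns of U).
U_pre, _, _ = np.linalg.svd(W_pre)
U_ft, _, _ = np.linalg.svd(W_ft)

# For each fine-tuned singular vector, its best cosine match among
# the pretrained singular vectors; values near 0 suggest a new
# ("intruder") direction absent from the pretrained weights.
sims = np.abs(U_pre.T @ U_ft).max(axis=0)
intruders = int((sims < 0.6).sum())  # 0.6 is an illustrative threshold
print(f"{intruders} of {sims.size} directions below similarity 0.6")
```

Running the same comparison on actual pretrained vs. fine-tuned checkpoints (layer by layer) would be one way to test the intuition in the comment.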