
hackpert, last Thursday at 12:30 PM

These database analogies bug me and, from the look of it, a lot of other people in the comments too! So far, some of the most reasonable explanations I have found that take training dynamics into account are from Lenka Zdeborova's lab (albeit in toy, linear-attention settings, though it's easy to see why they generalize to practical ones). For instance, this is a lovely paper: https://arxiv.org/abs/2509.24914