minimaltom • yesterday at 10:28 PM • 0 replies

> That is, if the batch signal on a parameter exceeds its leave-one-out noise, update it; if not, skip it. This is a one-line change to Adam that accelerates grokking by 5x, suppresses memorization in PINNs, and improves DPO fine-tuning, eliminating the need for validation sets entirely.

Does anyone understand the formula they give just above this sentence? Is this just the classic "skip updating parameters whose gradient/loss has high variance across batches/samples"?
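
For concreteness, here is one way the quoted rule could be read. This is a guess, not the paper's formula, which isn't reproduced in this thread. It assumes per-sample gradients are available, takes the "batch signal" to be the absolute mean gradient, and takes the "leave-one-out noise" to be the jackknife standard error of that mean. The name `loo_gate` and the exact threshold are made up for illustration.

```python
import numpy as np

def loo_gate(per_sample_grads):
    """Hypothetical leave-one-out gate (an assumed reading of the quoted rule).

    per_sample_grads: array of shape (n_samples, n_params), n_samples >= 2.
    Returns (mean_grad, mask); mask[j] is True where parameter j would be updated.
    """
    g = np.asarray(per_sample_grads, dtype=float)
    n = g.shape[0]
    mean = g.mean(axis=0)                      # batch "signal" per parameter
    loo_means = (g.sum(axis=0) - g) / (n - 1)  # mean gradient with sample i left out
    # Jackknife standard error of the mean, used here as the "noise" (assumption).
    noise = np.sqrt((n - 1) / n * ((loo_means - mean) ** 2).sum(axis=0))
    mask = np.abs(mean) > noise                # update only where signal beats noise
    return mean, mask

# Toy batch: parameter 0 has consistent gradients, parameter 1 flips sign.
grads = np.array([[1.0,  1.0],
                  [1.1, -1.0],
                  [0.9,  1.0],
                  [1.0, -1.0]])
mean, mask = loo_gate(grads)
gated = np.where(mask, mean, 0.0)  # gradient that would be handed to the optimizer
```

Under this reading it is close to the "skip high-variance parameters" idea, except the test is per parameter and relative to the size of the mean gradient, not on variance alone. Whether "skip" also freezes Adam's moment estimates for masked parameters isn't clear from the quote.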