Scene_Cast2, yesterday at 6:47 PM

My baseline was non-HC "vanilla" residuals; I didn't do a meaningful HC run to compare.

My application has some particularities (important and easy to identify per-token signals) that result in values growing (about 3x to 10x) through layers even in the baseline.
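
For illustration only, here is a minimal sketch of how growth like this could be measured in a vanilla residual stack. It is not the setup described in the comment: the branch `f(x) = branch_scale * x`, the function names, the input values, and the layer count are all hypothetical, chosen so that the toy growth lands in the 3x to 10x range.

```python
import math


def rms(values):
    """Root-mean-square magnitude of a flat list of activations."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def residual_rms_per_layer(x, num_layers, branch_scale):
    """Run a toy vanilla residual stack and record the stream RMS per layer.

    The branch f(x) = branch_scale * x is a hypothetical stand-in for a
    block whose output is strongly correlated with its input.
    """
    history = [rms(x)]
    for _ in range(num_layers):
        branch = [branch_scale * v for v in x]
        x = [v + b for v, b in zip(x, branch)]  # x_{l+1} = x_l + f(x_l)
        history.append(rms(x))
    return history


# Assumed toy input and hyperparameters, not taken from the comment.
history = residual_rms_per_layer([1.0, -2.0, 0.5, 3.0], num_layers=8, branch_scale=0.25)
growth = history[-1] / history[0]
print(f"growth over 8 layers: {growth:.2f}x")
```

In a real model the same ratio could be taken from the residual stream recorded after each block; the toy branch here just makes the compounding explicit, since each layer multiplies the stream by `1 + branch_scale`.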