kgeist • yesterday at 9:19 PM

Interesting. My assumption used to be that models over-edit when they're run with optimizations in the attention blocks (quantization, Gated DeltaNet, sliding-window attention, etc.). That is, they can't always reconstruct the original code precisely, so they may end up reinventing some parts of it. Couldn't that be one of the reasons too?
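To make the sliding-window part of that guess concrete, here is a minimal sketch in plain Python with toy sizes. The names `sliding_window_mask` and `visible_positions` are made up for illustration and don't come from any library. It only shows which positions a single causal attention layer can look at directly. Stacked layers can still pass information further back, and the sketch says nothing about quantization or Gated DeltaNet.

```python
def sliding_window_mask(seq_len, window):
    """Build a causal sliding-window mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is not in the future and is at most `window - 1` tokens behind i.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_positions(mask, query):
    """List the key positions a given query position can attend to directly."""
    return [j for j, allowed in enumerate(mask[query]) if allowed]


if __name__ == "__main__":
    # Toy example: 8 tokens, window of 3.
    mask = sliding_window_mask(seq_len=8, window=3)
    # The last token sees only itself and the two tokens before it.
    print(visible_positions(mask, 7))  # [5, 6, 7]
```

If the original code sits outside that window, this one layer can't read it token by token, which is the kind of situation where a model might plausibly rewrite a bit instead of copying it.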
