Isn't this simply context pruning/optimization?
From the abstract, it looks like it's actually doing something deeper, updating weights in part of the model?
No, they're actually training weights on the context before it gets compacted. Context is just context; this splits the model into persistent weights and malleable ones that are periodically updated.
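To make the difference from context pruning concrete, here's a toy sketch. It is my own illustration under loose assumptions, not the paper's method. A linear "model" has its weights split into a frozen persistent part (`slow`) and a malleable part (`fast`). Before the context is dropped, a few gradient steps on squared error over the context update only `fast`, so what the context taught survives as weights rather than tokens. All names (`predict`, `absorb_context`, `compact`) and the learning rate and step count are hypothetical.

```python
# Hypothetical toy: weights split into frozen "slow" and trainable "fast" parts.
# Not the paper's architecture; just the persistent/malleable idea in miniature.

def predict(slow, fast, x):
    # Effective weight vector is slow + fast; output is its dot product with x.
    return sum((s + f) * xi for s, f, xi in zip(slow, fast, x))

def absorb_context(slow, fast, context, lr=0.1, steps=50):
    """Gradient descent on mean squared error over the context.
    Only `fast` is updated; `slow` is never modified."""
    fast = list(fast)
    n = len(context)
    for _ in range(steps):
        grad = [0.0] * len(fast)
        for x, y in context:
            err = predict(slow, fast, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / n
        fast = [f - lr * g for f, g in zip(fast, grad)]
    return fast

def compact(slow, fast, context):
    # Fold the context into the fast weights, then discard the raw context.
    new_fast = absorb_context(slow, fast, context)
    return new_fast, []

if __name__ == "__main__":
    slow = [1.0, 0.0]                                 # persistent weights
    context = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]  # (input, target) pairs
    fast, context = compact(slow, [0.0, 0.0], context)
    print(fast, context)
```

After compaction the context list is empty, but `fast[1]` has moved close to 2.0, so the model still reproduces the second context example. Pruning would have simply thrown that information away.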