The gap between how this is described in the paper and in the blog post is pretty wide. Would be nice to see more accessible writing from research teams — not everyone reading is an ML engineer.
Agreed. The practical implications are often more interesting than the math anyway — smaller models running locally means you can afford to run multiple models in parallel for cross-validation, which changes how you approach tasks like code analysis or bug detection.
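To make the cross-validation idea concrete, here's a minimal sketch: fan the same snippet out to several models in parallel and keep only the findings a majority agree on. The lambdas standing in for models are placeholders — in practice each would call a local inference server.

```python
from concurrent.futures import ThreadPoolExecutor

def cross_validate(models, snippet, threshold=0.5):
    """Run every model on the same snippet in parallel and keep only
    findings that more than `threshold` of the models report."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda m: m(snippet), models))
    # Count how many models reported each finding.
    counts = {}
    for findings in results:
        for f in set(findings):
            counts[f] = counts.get(f, 0) + 1
    needed = threshold * len(models)
    return sorted(f for f, n in counts.items() if n > needed)

# Stand-in "models" — hypothetical, purely for illustration.
model_a = lambda code: ["off-by-one in loop", "unused import"]
model_b = lambda code: ["off-by-one in loop"]
model_c = lambda code: ["off-by-one in loop", "shadowed variable"]

print(cross_validate([model_a, model_b, model_c], "for i in range(n+1): ..."))
# → ['off-by-one in loop']
```

Only the finding that 2 of 3 models agree on survives; the single-model findings are treated as noise, which is the payoff of running several small models instead of trusting one.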
These are very different media types with very different goals.