I just wanted to express gratitude to you guys, you do great work. However, it is a little annoying ...

evilduck • yesterday at 4:36 PM • 2 replies • view on HN

I just wanted to express gratitude to you guys, you do great work. However, it is a little annoying to have to redownload big models though and keeping up with the AI news and community sentiment is a full time job. I wish there was some mechanism somewhere (on your site or Huggingface or something) for displaying feedback or confidence in a model being "ready for general use" before kicking off 100+ GB model downloads.

Replies

danielhanchen • yesterday at 4:55 PM

Hey thanks - yes agreed - for now we do:

1. Split metadata into shard 0 for huge models so 10B is for chat template fixes - however sometimes fixes cause a recalculation of the imatrix, which means all quants have to be re-made

2. Add HF discussion posts on each model talking about what changed, and on our Reddit and Twitter

3. Hugging Face XET now has de-duplication downloading of shards, so generally redownloading 100GB models again should be much faster - it chunks 100GB into small chunks and hashes them, and only downloads the shards which have changed

➕ show 2 replies

CamperBob2 • yesterday at 5:42 PM

Best policy is to just wait a couple of weeks after a major model is released. It's frustrating to have to re-download tens or hundreds of GB every few days, but the quant producers have no choice but to release early and often if they want to maintain their reputation.

Ideally the labs releasing the open models would work with Unsloth and the llama.cpp maintainers in advance to work out the bugs up front. That does sometimes happen, but not always.

➕ show 1 reply

alt Hacker News

Replies