everlier • today at 2:44 AM

Owning the GGUF conversion step is good in some circumstances, but running in fp16 is suboptimal for this hardware because of its low-ish memory bandwidth.

It looks like context is set to 32k, which is the bare minimum needed for OpenCode with its ~10k initial system prompt. So overall, something like Unsloth's UD q8 XL or q6 XL quants would free up a lot of memory and bandwidth, moving the setup into the next tier of usefulness.
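To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes decoding is memory-bound and that every weight is read once per generated token, which is roughly true for a dense model and overly pessimistic for MoE. The parameter count and bandwidth are hypothetical placeholders, not the specs of the hardware in question, and the bits-per-weight figures are approximate values for plain Q8_0 and Q6_K; the Unsloth UD variants will differ somewhat.

```python
# Rough upper bound on decode speed for a memory-bound, dense model.
# All concrete numbers below are illustrative assumptions, not measurements.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes."""
    return n_params * bits_per_weight / 8


def decode_tokens_per_sec_upper_bound(
    n_params: float, bits_per_weight: float, bandwidth_bytes_per_sec: float
) -> float:
    """Each generated token needs one full pass over the weights, so
    tokens/s cannot exceed bandwidth divided by weight size."""
    return bandwidth_bytes_per_sec / weight_bytes(n_params, bits_per_weight)


# Approximate bits per weight (assumed; real quant files vary by tensor mix).
BITS_PER_WEIGHT = {"fp16": 16.0, "q8": 8.5, "q6": 6.5625}

if __name__ == "__main__":
    n_params = 30e9    # hypothetical dense model size
    bandwidth = 250e9  # hypothetical memory bandwidth in bytes/s

    for name, bpw in BITS_PER_WEIGHT.items():
        size_gb = weight_bytes(n_params, bpw) / 1e9
        tps = decode_tokens_per_sec_upper_bound(n_params, bpw, bandwidth)
        print(f"{name}: ~{size_gb:.1f} GB weights, <= {tps:.1f} tok/s")
```

The absolute numbers don't matter much; the ratio does. Going from fp16 to a ~q8 quant roughly halves the bytes read per token, and the memory saved can go toward a larger context than 32k.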
