Hacker News

antirez, yesterday at 6:14 PM

May I ask what you used for the DS4F inference? It is a model with a very low hallucination rate in my tests.


Replies

antirez, yesterday at 7:12 PM

Btw, a few data points:

1. DS4F can run on a 128GB MacBook. M2.7 is larger (8-bit weights for the routed experts). It remains to be seen how it holds up at 4 bits. At 2 bits it may not work well at all.

2. Just the KV cache of M2.7 would take ~50GB for 200k tokens AFAIK. It does not have the compressed KV cache that DS4F features.

3. The models are very similar in performance, despite all that. And DS4F is likely getting an update soon.

So it is basically a quasi-frontier model that can run on a 96/128GB MacBook with large context windows. That's nontrivial. A coding version may well be released in the future.
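
To make the KV cache arithmetic in point 2 concrete, here is a rough sketch of how an uncompressed KV cache scales with context length. The layer count, KV head count, and head dimension below are illustrative assumptions, not a confirmed M2.7 configuration. They were picked only because they land near the ~50GB figure, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_elem=2):
    """Size of an uncompressed KV cache in bytes.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head, each of head_dim elements.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * num_tokens


# Assumed, illustrative configuration (NOT confirmed for M2.7):
# 62 layers, 8 KV heads, head_dim 128, 16-bit cache entries.
total = kv_cache_bytes(num_layers=62, num_kv_heads=8, head_dim=128, num_tokens=200_000)
print(f"{total / 1e9:.1f} GB")  # prints "50.8 GB"
```

The cost is linear in the number of tokens, so under these assumptions halving the context halves the cache. A compressed KV cache like the one DS4F is said to have shrinks the per-token term instead.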

anonym29, yesterday at 6:17 PM

Per AA's Omniscience Index benchmark, the "non-hallucination rate" subcomponent (1 - hallucination rate) is 4% for DS4F vs 66% for M2.7.

https://artificialanalysis.ai/leaderboards/models?weights=op...
