Hacker News

haellsig today at 8:10 AM

FYI, I believe `--flash-attn on` doesn't do anything; you should use `--flash-attn 1` instead. I'm getting ~150 t/s on an RTX 3080 10GB as well, with f16 cache type.
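A minimal sketch of the invocation described above. It assumes the flags belong to llama.cpp's `llama-server` and that "f16 cache type" means the `--cache-type-k`/`--cache-type-v` options. The model path and the `build_command` helper are hypothetical placeholders, not from the thread. It only builds and prints the command line; it does not run anything.

```python
import shlex

# Assumption: these flags are llama.cpp's llama-server options.
# MODEL_PATH is a hypothetical placeholder, not taken from the thread.
MODEL_PATH = "models/your-model.gguf"


def build_command(model_path: str) -> list[str]:
    """Return an argv list with flash attention enabled and an f16 KV cache."""
    return [
        "llama-server",
        "-m", model_path,
        "--flash-attn", "1",      # numeric value, as the comment recommends
        "--cache-type-k", "f16",  # K cache stored as f16
        "--cache-type-v", "f16",  # V cache stored as f16
    ]


if __name__ == "__main__":
    # Print a shell-safe version of the command for copy/paste.
    print(shlex.join(build_command(MODEL_PATH)))
```

The accepted values for `--flash-attn` may differ between llama.cpp builds, so checking `llama-server --help` on your own build is the safer way to confirm which form it takes.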


Replies

freakynit today at 9:33 AM

Thanks, updated my local docs :)