Hacker News

alfonsodev yesterday at 3:17 PM

Good to hear! Do you mind sharing your setup and tokens/second performance?


Replies

lreeves yesterday at 4:17 PM

I'm running the unquantized base model on 2x A6000s (Ampere gen, 48GB each). It runs at about 25 tokens/second.
