Hacker News

cyanydeez today at 7:01 PM | 1 reply

Not at the VRAM sizes that determine how much context you can load; also, GPUs aren't as efficient as direct inference.


Replies

wmf today at 9:15 PM

OK, B70.