I think it depends on the use case; slow isn’t that bad if you’re asking questions infrequently. I downloaded a model a few weeks ago that was roughly 80 GB in size and ran it on my 3090 just to see how it was… and it was okay. Fast? Nope. But it did it. If the answers were materially better I’d be happy to wait a minute for the output, but they weren’t. I’d like to try one of the really large ones, just to see how slow it is, but I need to clear some space to even download it.