re-thc • today at 7:57 AM

> The big question for local LLMs is whether there is a 100 tok/s model which requires less than 16 GB of memory and is competitive on most tasks with the cloud models.

Benchmarks maybe? Real world, no.

You just need the context otherwise. There's no way around it.

Replies

lumost • today at 11:34 AM

Context is more available locally. You can have the LLM operate for arbitrarily long periods, use your credentials to access services (if desired), store memory locally, etc.

Whether such a model exists or not is a different question.
