> The big question for local LLMs is whether there is a 100 tok/s model which requires less than 16 GB of memory and is competitive on most tasks with the cloud models.
Benchmarks maybe? Real world, no.
Otherwise, you just need to give it the context. There's no way around it.
Context is more available locally. You can have the LLM operate for arbitrarily long periods, use your credentials to access services (if desired), store memory locally, etc.
Whether such a model exists or not is a different question.