Hacker News

ajaksalad, 10/11/2024

> I was a bit surprised Meta didn't publish an example way to simply invoke one of these LLM's with only torch (or some minimal set of dependencies)

Seems like torchchat is exactly what the author was looking for.

> And the 8B model typically gets killed by the OS for using too much memory.

Torchchat also provides some quantization options so you can reduce the model size to fit into memory.
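
For a rough sense of why quantization helps with the out-of-memory kill, here is a back-of-the-envelope sketch. It is plain arithmetic, not torchchat's API: the function name `weight_memory_gib` is made up for illustration, "8B" is assumed to mean exactly 8 billion parameters, and only the weights are counted (activations, KV cache, and quantization scale overhead are ignored).

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: "8B" is taken as exactly 8 billion parameters.
N_PARAMS = 8e9

# Common storage widths; which of these a given tool supports is not assumed here.
for label, bits in [("fp32", 32), ("bf16/fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label:>9}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

At 16 bits per weight the weights alone come to roughly 15 GiB, which is already more than many laptops can spare, while 4-bit weights come to under 4 GiB. Real usage will be somewhat higher than these figures.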