otabdeveloper4 • yesterday at 8:50 PM • 0 replies • view on HN

> meager hardware

Qwen was made on a cluster about that size.

And this is before anybody ever thought about optimizing the training process. (Currently it's just PyTorch analyst-as-coder slop, with extremely overprovisioned quantizations, etc.)
