Hacker News

small_model · yesterday at 12:07 PM

Poverty of imagination here. There are plenty of uses for this, and it's a prototype at this stage.


Replies

ACCount37 · yesterday at 12:18 PM

What uses, exactly?

The prototype is: silicon with a Llama 3.1 8B etched into it. Today's 4B models already outperform it.

A five-digit token rate is a major technical flex, but does anyone really need to run a very dumb model at this speed?

The only things that come to mind that could reap a benefit are asymmetric exotics like VLA action policies and the voice stages of V2V models. Both of those are "small fast low-latency model backed by a large smart model" setups, and both depend on model-to-model communication, which this doesn't demonstrate.
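To illustrate what that "small fast model backed by a large smart model" pattern looks like, here's a minimal speculative-decoding-style sketch in Python. The model functions are hypothetical stand-ins (random stubs, not real inference): `draft_next` plays the fast etched chip, `verify` plays the large model that accepts or rejects a batch of drafted tokens.

```python
import random

random.seed(0)

VOCAB = list("abcde")  # toy vocabulary for the stub models

def draft_next(ctx):
    """Hypothetical fast, dumb model: produces one cheap draft token."""
    return random.choice(VOCAB)

def verify(ctx, drafts):
    """Hypothetical large, smart model: accepts a prefix of the drafts.

    In real speculative decoding this is one batched forward pass that
    checks all k draft tokens at once; here it's a random stand-in.
    """
    accepted = []
    for t in drafts:
        if random.random() < 0.7:  # stand-in for an agreement check
            accepted.append(t)
        else:
            break
    return accepted

def generate(prompt, n_tokens, k=4):
    """Draft k tokens with the fast model, verify with the slow one."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        ctx, drafts = out[:], []
        for _ in range(k):
            t = draft_next(ctx)
            drafts.append(t)
            ctx.append(t)
        accepted = verify(out, drafts)
        # Always advance by at least one token so the loop terminates.
        out.extend(accepted if accepted else drafts[:1])
    return "".join(out[len(prompt):])[:n_tokens]

print(generate("ab", 12))
```

The point of the sketch is the coupling: the fast model's throughput only pays off if the verify step accepts most drafts, which is exactly the model-to-model communication path the prototype doesn't demonstrate.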

In a way, it's an I/O accelerator rather than an inference engine. At best.
