In a chatbot, 17k tok/s is a neat but nearly useless showcase. In a coding agent it is a meaningful improvement. In robotics, it could be an absolute revolution.
8B models aren't useful in general, but for specific use cases they can provide an enormous amount of intelligence. Nvidia's Tesla/Waymo competitor is a 7B LLM with a 2B diffusion model, and running that at those speeds could be an order of magnitude cheaper than existing solutions.
Bumping the speed of these things would be more than meaningful. It would be a massive game changer.
I assert that something like 80% of this “multi agent parallel workflow” business is simply a workaround for models being so slow. As the dude driving these things, you kick it off and twiddle your thumbs, sometimes waiting minutes to hours for all the inference and token generation to finish. So you dispatch multiple workstreams in parallel to be more efficient.
I assert that if the model were even 10x faster we’d be using these things radically differently. You’d be doing things that are currently time prohibitive. At 100x, holy shit will software dev get crazy. You’d be kicking off hundreds of parallel workers attacking a problem from every angle. Who even knows!
And the thing is, 10x will absolutely come, and probably even 100x. And it will be sold like a video game cartridge or something, depending on how the actual model gets “baked” into the hardware. No remote inference at all.
Could you give me an example of how it could be an absolute revolution in robotics?
My understanding is that robotics doesn't really rely much on LLMs in the first place, but rather on other things.
Are you suggesting that it would ingest all the real-time data, reason through it at an incredibly fast speed, act on it, and then repeat? I can imagine some problems with this, though I am not a robotics engineer. Perhaps someone who deeply understands this topic can give more information.
17K tok/s is approaching realtime motor cortex needs for a robot with ~12 actuators (bipedal humanoid) and an IMU. I don't know how many parameters a motor cortex would need but 8B feels like it is within 2 orders of magnitude.
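To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes (my assumption, not something stated above) that every control step emits a fixed number of tokens per actuator and that generation speed is the only bottleneck, ignoring the cost of reading IMU and sensor input. The function name and the tokens-per-command values are hypothetical.

```python
def control_rate_hz(tokens_per_second: float, actuators: int,
                    tokens_per_command: int = 1) -> float:
    """Control-loop rate if each step emits tokens_per_command tokens per actuator.

    Assumption: output token generation is the only cost per step.
    """
    tokens_per_step = actuators * tokens_per_command
    return tokens_per_second / tokens_per_step


TOKENS_PER_SECOND = 17_000  # figure quoted in the thread
ACTUATORS = 12              # bipedal humanoid, per the comment above

# Hypothetical encodings: 1, 4, or 16 tokens to express one actuator command.
for k in (1, 4, 16):
    rate = control_rate_hz(TOKENS_PER_SECOND, ACTUATORS, k)
    print(f"{k:>2} token(s) per command: {rate:7.1f} Hz")
```

Under those assumptions, one token per actuator gives a loop rate of roughly 1.4 kHz, and even 16 tokens per command stays just under 90 Hz. Whether that counts as "realtime" depends on what loop rate the robot's controller actually needs, which this sketch does not address.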