We need custom inference chips at scale for this, imho. Every computer (whatever the form factor or board) should have an inference unit so inference is efficient and fast and can be offloaded while the CPU is doing something else.
Look at the specs of this Orange Pi 6+ board - a dedicated 30 TOPS NPU.
Almost all of them have one already. Microsoft's "Copilot+" branding requires an NPU with a minimum TOPS rating (40+).
It's just that practically nothing uses those NPUs.
At this point in the timeline, compute is cheap; it's RAM that's basically unavailable.
I can't believe this was downvoted. It makes a lot of sense that mass-produced custom inference chips would be highly useful.
The bottleneck in common PC hardware is mostly memory bandwidth. Offloading the computation to a separate chip doesn't help when memory access is the limiting factor.
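A rough back-of-envelope sketch of why (illustrative numbers, not benchmarks, assuming decode is memory-bound and every generated token streams all active weights through memory once):

    # For memory-bound LLM decoding, each generated token reads all
    # model weights once, so tokens/sec is capped by
    # bandwidth / model size, no matter how many TOPS the NPU has.

    def max_decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
        """Upper bound on decode speed when weight reads dominate."""
        return bandwidth_gb_s / model_gb

    # Assumed figures: dual-channel DDR5 desktop ~80 GB/s;
    # an 8B-parameter model at 4-bit ~4.5 GB of weights.
    print(max_decode_tokens_per_sec(80, 4.5))    # ~17.8 tok/s ceiling
    # Same model with ~1000 GB/s of GPU VRAM bandwidth:
    print(max_decode_tokens_per_sec(1000, 4.5))  # ~222 tok/s ceiling

Under those assumptions, a faster compute chip on the same DDR5 bus barely moves the ceiling; more bandwidth does.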
There have been boards and chips with dedicated compute hardware for years, but they're only so useful for LLMs, which need huge memory bandwidth.