Hacker News

anonzzzies · last Wednesday at 2:01 AM · 5 replies

We need custom inference chips at scale for this, imho. Every computer (whatever the form factor or board) should have an inference unit on it, so that at least inference is efficient and fast and can be offloaded while the CPU is doing something else.


Replies

Aurornis · last Wednesday at 12:36 PM

The bottleneck in common PC hardware is mostly memory bandwidth. Offloading the computation part to a different chip wouldn’t help if memory access is the bottleneck.

Boards and chips with dedicated compute hardware have been around for years, but they're only so useful for LLMs, which require huge memory bandwidth.
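To make that concrete, here is a rough back-of-envelope sketch (all bandwidth and model-size numbers below are illustrative assumptions, not measurements): at batch size 1, generating each token has to stream every weight from memory once, so the decode rate is capped by bandwidth divided by model size, no matter how fast the compute unit is.

    # Back-of-envelope bound on LLM decode speed when weight streaming dominates.
    # At batch size 1, every generated token reads the full set of weights once,
    # so tokens/s <= memory bandwidth / model size. All numbers are assumptions.

    def max_tokens_per_second(model_size_gb: float, bandwidth_gb_s: float) -> float:
        return bandwidth_gb_s / model_size_gb

    MODEL_GB = 7.0  # e.g. a 7B-parameter model quantized to ~8 bits per weight

    print(f"dual-channel DDR5 (~60 GB/s):  ~{max_tokens_per_second(MODEL_GB, 60.0):.0f} tok/s")
    print(f"unified memory   (~400 GB/s):  ~{max_tokens_per_second(MODEL_GB, 400.0):.0f} tok/s")
    print(f"GPU VRAM        (~1000 GB/s): ~{max_tokens_per_second(MODEL_GB, 1000.0):.0f} tok/s")

The same 7 GB of weights cap the token rate at roughly 9, 57, and 143 tok/s respectively, which is why bolting a faster NPU onto the same slow DRAM doesn't move the needle.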

chvid · last Wednesday at 5:16 AM

Look at the specs of this Orange Pi 6+ board: a dedicated 30 TOPS NPU.

https://boilingsteam.com/orange-pi-6-plus-review/

sofixa · last Wednesday at 1:16 PM

Almost all of them already have one. Microsoft's "Copilot+" branding requires an NPU with a minimum number of TOPS.

It's just that practically nothing uses those NPUs.
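For the curious, the usual way software reaches those NPUs today is through a runtime with a vendor-specific execution provider. Below is a minimal sketch using ONNX Runtime's QNN provider; the model file, input shape, and installed provider package are assumptions about the local setup, and on machines without the NPU path it simply falls back to CPU.

    # Minimal sketch: run an ONNX model on a Qualcomm NPU via ONNX Runtime's
    # QNN execution provider, falling back to the CPU provider if unavailable.
    # "model.onnx" and the 1x3x224x224 input are placeholders, not a real model.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "model.onnx",
        providers=["QNNExecutionProvider", "CPUExecutionProvider"],
    )

    input_name = session.get_inputs()[0].name
    dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
    outputs = session.run(None, {input_name: dummy})
    print("ran on:", session.get_providers()[0], "output shape:", outputs[0].shape)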

baq · last Wednesday at 8:36 AM

At this point in the timeline compute is cheap; it's RAM that's basically unavailable.

fouc · last Wednesday at 3:38 AM

I can't believe this was downvoted. It makes a lot of sense that mass-produced custom inference chips would be highly useful.
