delta_p_delta_x • today at 3:10 AM • 3 replies

Very cool, but can I suggest the `add` CPU instruction instead? It supports 64-bit numbers, it's implemented in hardware, and there's no need to cross a PCIe interface into a beefy, power-hungry GPU and back again. Chances are it's cross-platform too, because basically every ISA since the very first has had `add`.
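
For readers who want the semantics spelled out, here is a minimal Python sketch of what an unsigned 64-bit `add` typically does: the sum wraps modulo 2^64 and a carry-out is reported. The names `MASK64` and `add_u64` are made up for illustration, and real ISAs differ in how (or whether) they expose the carry flag.

```python
MASK64 = (1 << 64) - 1  # all ones across a 64-bit register


def add_u64(a: int, b: int) -> tuple[int, bool]:
    """Emulate an unsigned 64-bit add (illustrative helper, not a real API).

    Returns the sum wrapped to 64 bits and whether a carry-out occurred.
    """
    total = (a & MASK64) + (b & MASK64)
    return total & MASK64, total > MASK64


print(add_u64(2, 3))       # (5, False)
print(add_u64(MASK64, 1))  # (0, True): the result wraps around to zero
```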

Replies

ACCount37 • today at 6:09 AM

No. You cannot. It's the wrong tool for the problem.

That little "add" of yours has the overhead of: having an LLM emit it as a tool call, having to pause the LLM inference while waiting for it to resolve, then having to encode the result as a token to feed it back.

A "transformer-native" addition circuit, by contrast, can be executed within a single forward pass at trivial cost, generates transformer-native representations, works in both prefill and autoregressive generation, and more. It's cheaper.
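
The tool-call round trip described above can be sketched roughly as follows. This is a toy illustration only: `fake_model` is a hypothetical stand-in for an LLM, the JSON call format and the `TOOL_RESULT` prefix are invented, and no real inference API is being modeled. The point is just that one addition costs two model invocations plus a pause in between.

```python
import json


def fake_model(context: list[str]) -> str:
    """Hypothetical stand-in for an LLM: requests the tool once, then answers."""
    if not any(msg.startswith("TOOL_RESULT") for msg in context):
        # Step 1: the model emits a tool call instead of an answer.
        return json.dumps({"tool": "add", "args": [2, 3]})
    # Step 3: the tool result, fed back as text, is used to produce the answer.
    result = context[-1].split(" ", 1)[1]
    return f"The sum is {result}."


def run_with_tool(prompt: str) -> tuple[str, int]:
    """Drive the loop and count how many times the model had to be invoked."""
    context = [prompt]
    model_calls = 0
    while True:
        output = fake_model(context)
        model_calls += 1
        try:
            call = json.loads(output)
        except json.JSONDecodeError:
            # Plain text means the model produced its final answer.
            return output, model_calls
        # Step 2: generation is paused here while the tool runs on the CPU.
        result = sum(call["args"])
        context.append(f"TOOL_RESULT {result}")


answer, calls = run_with_tool("What is 2 + 3?")
print(answer, calls)  # The sum is 5. 2
```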

nurettin • today at 3:43 AM

I mean, yeah, no need to put a bunch of high powered cars in a circular track to watch them race really close to each other at incredible speeds, causing various hazards, either. Especially since city buses have been around for ages.

mcdeltat • today at 3:38 AM

"smallest supercomputing cluster that can add two numbers"
