
1-Bit AI Infrastructure

146 points by galeos last Friday at 2:28 PM | 29 comments

Comments

dailykoder yesterday at 8:12 AM

I first read about this a few weeks ago and found it very interesting.

Now that I have done more than enough CPU design inside FPGAs, I want to try something new: some computation-heavy workload that could benefit from an FPGA. Does anyone here know how feasible it would be to implement something like this on an FPGA? I only have rather small chips (an Artix-7 35T and a PolarFire SoC with 95k logic slices), so I know I won't be able to fit a full LLM into them, but something should be possible.

Maybe I should refresh the fundamentals, though, and start with MNIST. But the question is rather: what is a realistic goal I could reach with these small FPGAs? Performance is secondary; I'm more interested in what's possible in terms of complexity/features on a small device.

Also, has anyone here compiled OpenCL (or OpenGL?) kernels for FPGAs and can give me a starting point? I was wondering whether it's possible to build a working backend for something like tinygrad [1]. I think it would be a good way to learn the different layers of how such frameworks actually work.

- [1] https://github.com/tinygrad/tinygrad

sva_ yesterday at 11:02 AM

It seems like arXiv replaced 'bitnet.cpp' with a 'this http url' link, even though '.cpp' is clearly not a TLD. Poor regex?
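For illustration only (this is a hypothetical pattern, not arXiv's actual code): an autolinker that treats any dotted token as a URL, without checking the suffix against a list of real TLDs, would happily turn 'bitnet.cpp' into a link.

```python
import re

# Hypothetical naive autolinker pattern: any word characters separated by dots,
# with no validation of the final component against known TLDs.
naive_url = re.compile(r'\b[\w-]+(?:\.[\w-]+)+\b')

print(naive_url.search("bitnet.cpp is our inference framework").group(0))  # 'bitnet.cpp'
```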

ttyprintk last Friday at 2:32 PM

Later work on a4.8 quantization by some of the same team:

https://news.ycombinator.com/item?id=42092724

https://arxiv.org/abs/2411.04965

js8 yesterday at 12:19 PM

It's technically not 1-bit, but 2-bit.

Anyway, I wonder if there is HW support in modern CPUs/GPUs for linear algebra (like matrix multiplication) over Z_2^n? I think it would be useful for SAT solving.
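For what it's worth, a matrix-vector product over Z_2 reduces to AND plus popcount parity, so it maps reasonably well onto ordinary bitwise instructions even without dedicated support. A minimal sketch (my own illustration; the function name and encoding are mine):

```python
def gf2_matvec(rows, x):
    """Matrix-vector product over Z_2.

    rows: list of ints, each bitmask encoding one matrix row
    x:    int bitmask encoding the vector
    Each output bit is popcount(row & x) mod 2.
    """
    return [bin(r & x).count("1") & 1 for r in rows]

# Example: the 2x3 matrix [[1,0,1],[1,1,0]] times the vector [1,1,1] over Z_2
print(gf2_matvec([0b101, 0b110], 0b111))  # [0, 0]
```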

WiSaGaN yesterday at 8:44 AM

I would expect research along these lines to pick up quite a bit if it's confirmed that the pretraining stage is not scaling as previously expected; model scale and architecture would then be more stable in the near future, especially if the focus shifts to inference-time scaling.

yalok yesterday at 5:54 PM

So basically the idea is to pack 3 ternary weights (-1, 0, 1) into 5 bits instead of 6, but they compare the results with an fp16 model, which would use 48 bits for those 3 weights…

And the speedup comes from reduced memory I/O, offset a bit by the need to unpack these weights before using them…

Did I get this right?
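That packing works because three ternary digits have 3^3 = 27 combinations, which fits in 2^5 = 32, so three weights need 5 bits instead of 2 bits each. A minimal sketch of such a base-3 packing (my own illustration under that assumption, not the paper's actual kernel code):

```python
def pack3(w0, w1, w2):
    """Pack three ternary weights (-1, 0, 1) into one 5-bit value via base-3 encoding."""
    d0, d1, d2 = (w + 1 for w in (w0, w1, w2))   # map {-1,0,1} -> {0,1,2}
    return d0 + 3 * d1 + 9 * d2                  # range 0..26, fits in 5 bits

def unpack3(v):
    """Recover the three ternary weights from the packed 5-bit value."""
    return tuple((v // 3**i) % 3 - 1 for i in range(3))

assert all(unpack3(pack3(*w)) == w
           for w in [(-1, 0, 1), (1, 1, 1), (-1, -1, -1), (0, 0, 0)])
```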

hidelooktropic yesterday at 5:59 PM

Does anyone have the actual "this http url"?
