
1-Bit AI Infrastructure

146 points by galeos last Friday at 2:28 PM | 29 comments

Comments

dailykoder yesterday at 8:12 AM

I first read about this a few weeks ago and found it very interesting.

Now that I have done more than enough CPU design inside FPGAs, I want to try something new: some computation-heavy workload that could benefit from an FPGA. Does anyone here know how feasible it would be to implement something like this on an FPGA? I only have rather small chips (an Artix-7 35T and a PolarFire SoC with 95k logic slices), so I know I won't be able to fit a full LLM into them, but something should be possible.

Maybe I should refresh the fundamentals, though, and start with MNIST. But the question is rather: what is a realistic goal I could reach with these small FPGAs? Performance is secondary; I'm more interested in what's possible in terms of complexity/features on a small device.

Also, has anyone here compiled OpenCL (or OpenGL?) kernels for FPGAs and can give me a starting point? I was wondering whether it's possible to build a working backend for something like tinygrad [1]. I think it would be a good way to learn the different layers of how such frameworks actually work.

- [1] https://github.com/tinygrad/tinygrad

sva_ yesterday at 11:02 AM

It seems like arXiv replaced 'bitnet.cpp' with a 'this http url' link, even though '.cpp' is clearly not a TLD. Poor regex?
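For illustration only (this is a hypothetical pattern, not arXiv's actual code): an autolinker that treats any dotted token as a URL, without checking the suffix against a list of real TLDs, would happily turn 'bitnet.cpp' into a link.

```python
import re

# Hypothetical naive autolinker pattern: any word characters separated by dots,
# with no validation of the final component against known TLDs.
naive_url = re.compile(r'\b[\w-]+(?:\.[\w-]+)+\b')

print(naive_url.search("bitnet.cpp is our inference framework").group(0))  # 'bitnet.cpp'
```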

ttyprintk last Friday at 2:32 PM

Later work on a4.8 quantization by some of the same team:

https://news.ycombinator.com/item?id=42092724

https://arxiv.org/abs/2411.04965

js8 yesterday at 12:19 PM

It's technically not 1-bit, but 2-bit.

Anyway, I wonder if there is HW support in modern CPUs/GPUs for linear algebra (like matrix multiplication) over Z_2^n? I think it would be useful for SAT solving.
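For what it's worth, a matrix-vector product over Z_2 reduces to AND plus popcount parity, so it maps reasonably well onto ordinary bitwise instructions even without dedicated support. A minimal sketch (my own illustration; the function name and encoding are mine):

```python
def gf2_matvec(rows, x):
    """Matrix-vector product over Z_2.

    rows: list of ints, each bitmask encoding one matrix row
    x:    int bitmask encoding the vector
    Each output bit is popcount(row & x) mod 2.
    """
    return [bin(r & x).count("1") & 1 for r in rows]

# Example: the 2x3 matrix [[1,0,1],[1,1,0]] times the vector [1,1,1] over Z_2
print(gf2_matvec([0b101, 0b110], 0b111))  # [0, 0]
```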

WiSaGaN yesterday at 8:44 AM

I would expect research along these lines to pick up quite a bit if it's confirmed that the pretraining stage is not scaling as previously expected; model scale and architecture would then be more stable in the near future, especially if the focus shifts to inference-time scaling.

yalok yesterday at 5:54 PM

So basically the idea is to pack 3 ternary weights (-1, 0, 1) into 5 bits instead of 6, but they compare the results with an fp16 model, which would use 48 bits for those 3 weights…

And the speedup comes from reduced memory I/O, offset a bit by the need to unpack these weights before using them…

Did I get this right?
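That packing works because three ternary digits have 3^3 = 27 combinations, which fits in 2^5 = 32, so three weights need 5 bits instead of 2 bits each. A minimal sketch of such a base-3 packing (my own illustration under that assumption, not the paper's actual kernel code):

```python
def pack3(w0, w1, w2):
    """Pack three ternary weights (-1, 0, 1) into one 5-bit value via base-3 encoding."""
    d0, d1, d2 = (w + 1 for w in (w0, w1, w2))   # map {-1,0,1} -> {0,1,2}
    return d0 + 3 * d1 + 9 * d2                  # range 0..26, fits in 5 bits

def unpack3(v):
    """Recover the three ternary weights from the packed 5-bit value."""
    return tuple((v // 3**i) % 3 - 1 for i in range(3))

assert all(unpack3(pack3(*w)) == w
           for w in [(-1, 0, 1), (1, 1, 1), (-1, -1, -1), (0, 0, 0)])
```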

hidelooktropic yesterday at 5:59 PM

Does anyone have the actual "this http url"?
