Right. But ... this would limit you to either extremely small models or extremely large FPGAs, yes? If there's a simple machine learning task that requires sub-microsecond latency I can see the point, but otherwise?
Happy to hear that KANs continue to find solid footing.
This guy will be hired by a high-frequency trading firm, and the next time we hear about him, he will have a net worth in 9 figures.
Archive link, as it looks like the original post was taken down: https://web.archive.org/web/20260609200156/https://aarushgup...
So for people wondering if it can be used to accelerate LLM inference, sadly not.
I've been trying to hit 100,000 tokens/s with a dumb 3.28M-parameter model, and even this is an order of magnitude too large to benefit.
It appears to be focused more on latency than on throughput. Happy to be corrected.
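For a rough sense of scale, here is a back-of-envelope sketch of what that target implies. It assumes "3.28m" means 3.28 million parameters, that the model is dense (each parameter takes part in one multiply-accumulate per token), and that tokens are produced one after another in a single stream. None of that is stated in the thread, and the function names are made up for illustration.

```python
# Back-of-envelope budget for the 100,000 tokens/s target.
# Assumptions (not from the thread): dense model, one multiply-accumulate
# (MAC) per parameter per token, single sequential token stream.

def per_token_budget_us(tokens_per_second: float) -> float:
    """Time available per token, in microseconds."""
    return 1e6 / tokens_per_second


def macs_per_second(params: float, tokens_per_second: float) -> float:
    """Sustained MAC rate needed if every parameter is used once per token."""
    return params * tokens_per_second


TARGET_TOKENS_PER_S = 100_000
PARAMS = 3.28e6  # "3.28m", read here as 3.28 million parameters

budget_us = per_token_budget_us(TARGET_TOKENS_PER_S)
required_macs = macs_per_second(PARAMS, TARGET_TOKENS_PER_S)

print(f"{budget_us:.1f} us per token")
print(f"{required_macs:.3g} MAC/s sustained")
```

Under those assumptions the per-token budget is about 10 microseconds, which is a throughput figure rather than the sub-microsecond single-inference latency discussed upthread.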