Hacker News

lelandbatey, yesterday at 7:50 PM

https://chatjimmy.ai is a demo of the "burn the model to an ASIC" approach sold by Taalas[0], which they use to run Llama 3.1 8B at ~17,000 tokens per second.

[0] - https://taalas.com/products/
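To put the ~17,000 tokens/s figure in more intuitive terms, here is some back-of-the-envelope arithmetic converting it to time per token and time per 1,000-token response. This is a minimal sketch that assumes the figure is a steady single-stream generation rate. The comment doesn't say how it was measured (batching, prompt vs. output tokens), and the helper names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the quoted throughput figure.
# Assumption (not stated in the comment): ~17,000 tokens/s is a constant,
# single-stream output rate.
TOKENS_PER_SECOND = 17_000


def per_token_latency_us(tokens_per_second: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1_000_000 / tokens_per_second


def time_for_tokens_ms(n_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to emit n_tokens at a constant rate, in milliseconds."""
    return 1_000 * n_tokens / tokens_per_second


if __name__ == "__main__":
    print(f"{per_token_latency_us(TOKENS_PER_SECOND):.1f} us/token")
    print(f"{time_for_tokens_ms(1_000, TOKENS_PER_SECOND):.1f} ms per 1,000 tokens")
```

Under that assumption, this works out to roughly 59 µs per token, so a 1,000-token answer would take about 59 ms.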