There was a startup posted here which built custom hardware that let the AI respond instantly. Thousands of tokens per second.
Cerebras.
They built a wafer-scale ASIC: the entire wafer is one huge active chip. It takes a lot of clever engineering and serious cooling to make it work, and it is very cool.
Taalas. A sibling comment of yours posted the chat demo URL:
https://chatjimmy.ai/