Who's going to pay for custom chips when they shit out new models every two weeks and their deluded CEOs keep promising AGI in two release cycles?
New GPUs come out all the time. New phones come out (if you count all the manufacturers) all the time. We do not need to always buy the new one.
Current open weight models < 20B are already capable of being useful. With even 1K tokens/second, they would change what it means to interact with them or for models to interact with the computer.
To run Llama 3.1 8B locally, you would need a GPU with a minimum of 16 GB of VRAM, such as an NVIDIA RTX 3090.
Talas promises a 10x higher throughtput, being 10x cheaper and using 10x less electricity.
Looks like a good value proposition.
You obviously don't believe that AGI is coming in two release cycles, and you also don't seem to have much faith in the new models containing massive improvements over the last ones. So the answer to who is going to pay for these custom chips seems to be you.