I have read in the past that ASICs for LLMs are not as straightforward a solution as they are for cryptocurrency. To design and build an ASIC you need to commit to a specific architecture: a cryptocurrency's hashing algorithm is fixed, but LLM architectures keep changing.
Am I misunderstanding "TPU" in the context of the article?
LLMs require memory and interconnect bandwidth, so they need a whole package capable of feeding data to the compute. Crypto mining is 100% compute bound: a trivially parallelized application that runs the same calculation over N inputs.
"Application-specific" doesn't necessarily mean unprogrammable. Bitcoin miners aren't programmable because they don't need to be. TPUs are ASICs for ML and need to be programmable so they can run different models. In theory, you could make an ASIC hardcoded for a specific model, but given how fast models evolve, it probably wouldn't make much economic sense.
Cryptocurrency architectures also change - Bitcoin is just about the lone holdout that never evolves. The hashing algorithm for Monero is designed so that a Monero hashing ASIC is literally just a CPU, and it doesn't even matter what the instruction set is.
It’s true that architectures change, but they are built from common components. The most important of those is matrix multiplication, using a relatively small set of floating point data types. A device that accelerates those operations is, effectively, an ASIC for LLMs.
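To make this concrete, here is a minimal sketch (NumPy, illustration only) of the core operation an LLM accelerator spends most of its time on: a dense matrix multiply with low-precision inputs and higher-precision accumulation. NumPy has no bfloat16, so float16 stands in for the reduced-precision input format here.

```python
import numpy as np

# The dominant LLM workload: dense matmul on reduced-precision inputs.
# Accelerators typically take low-precision operands (e.g. bfloat16)
# and accumulate products in float32 to preserve accuracy.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128)).astype(np.float16)  # low-precision weights
B = rng.standard_normal((128, 32)).astype(np.float16)  # low-precision activations

# Upcast for accumulation, as a matmul unit would do internally.
C = A.astype(np.float32) @ B.astype(np.float32)
print(C.shape, C.dtype)  # (64, 32) float32
```

A chip that does little else than this one operation, at scale and across a small set of float formats, covers most of what any current LLM architecture needs.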
Regardless of architecture (which is basically the same across LLMs anyway), the computational needs of modern neural networks are fairly generic, centered on operations like matrix multiplication, which is exactly what the TPU provides. There is even TPU support for PyTorch via the PyTorch/XLA backend - it is not just a proprietary interface that Google uses itself.
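As an illustration of how generic those needs are: even the attention mechanism at the heart of a transformer reduces to a couple of matmuls plus a softmax. This is a plain NumPy sketch, not any particular framework's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: two matrix multiplies and a softmax.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(1)
Q = rng.standard_normal((8, 16))
K = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 16))
print(attention(Q, K, V).shape)  # (8, 16)
```

Everything here is either a matmul or a cheap elementwise operation, which is why a matmul-centric ASIC like the TPU serves such a wide range of models.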