> It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well – Gemini 3 the prime example).
Does anyone have a sense of why CUDA is more important for training than inference?
A question I don't see addressed in all these articles: what prevents Nvidia from doing the same thing and iterating on their more general-purpose GPU towards a more focused, TPU-like chip, if that turns out to be what the market really wants?
5 days ago: https://news.ycombinator.com/item?id=45926371
Sparse models deliver the same quality of results but have fewer coefficients to process; in the case described in the link above, sixteen (16) times fewer.
This means these models need 8 times less data to store, can be 16 or more times faster, and can use 16+ times less energy.
TPUs are not all that good with sparse matrices. They can be used to train dense versions, but inference efficiency with sparse matrices may not be all that great.
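For anyone who wants to poke at this, here is a minimal JAX sketch (my own illustration, not from the linked post; the toy shapes and the 1-in-16 pattern are made up) showing how a pruned weight matrix stores far fewer coefficients in BCOO form, while the matmul itself still has to map onto hardware that is happiest with dense blocks:

    import jax.numpy as jnp
    from jax.experimental import sparse

    # Hypothetical toy weight matrix pruned so only 1 entry in 16 is nonzero.
    dense_w = jnp.zeros((64, 64)).at[::4, ::4].set(1.0)

    w_sp = sparse.BCOO.fromdense(dense_w)
    print(dense_w.size, w_sp.nse)   # 4096 stored coefficients vs 256 nonzeros

    x = jnp.ones((64,))
    y_dense = dense_w @ x   # dense matmul: the shape of work a TPU's MXU is built for
    y_sparse = w_sp @ x     # sparse matmul: fewer FLOPs on paper, but irregular
                            # memory access that maps poorly onto a systolic array
    assert jnp.allclose(y_dense, y_sparse)

The storage win (16x fewer coefficients here) is real; the last two lines are the crux, though, because the dense path is exactly what the hardware is optimized for, and the sparse path only pays off if the compiler and hardware can actually exploit the structure.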
I have read in the past that ASICs for LLMs are not as simple a solution as they are for cryptocurrency. To design and build an ASIC you need to commit to a specific architecture: a cryptocurrency's hashing algorithm is fixed, but LLM architectures keep changing.
Am I misunderstanding "TPU" in the context of the article?
This feels a lot like the RISC/CISC debate. More academic than it seems. Nvidia is designing their GPUs primarily to do exactly the same tasks TPUs are doing right now. Even within Google it's probably hard to tell whether or not it matters on a 5-year timeframe. It certainly gives Google an edge on some things, but in the fullness of time "GPUs" like the H100 are primarily used for running tensor models and they're going to have hardware that is ruthlessly optimized for that purpose.
And outside of Google this is a very academic debate. Any efficiency gains over GPUs will primarily turn into profit for Google rather than benefit for me as a developer or user of AI systems. Since Google doesn't sell TPUs, they are extremely well-positioned to ensure no one else can profit from any advantages created by TPUs.
This is highly relevant:
"Meta in talks to spend billions on Google's chips, The Information reports"
https://www.reuters.com/business/meta-talks-spend-billions-g...
With its AI offerings, can Google suck the oxygen out of AWS? AWS grew big because of compute. The AI spend will be far larger than compute. Can Google launch AI/Cloud offerings with free compute bundled? Use our AI, and we'll throw in compute for free.
It's a cool subject and article, and something I only have a general understanding of (considering the place of posting).
What I'm sure about is that a processing unit purpose-built for a task is more efficient than a general-purpose unit designed to accommodate every programming task.
More and more, the economics of computing boils down to energy usage, and ultimately to physical limits; a more efficient process has the benefit of consuming less energy.
As a layman, it makes general sense to me. Maybe a future where productivity is tied more closely to energy efficiency than to monetary gain pushes the economy in better directions.
Cryptocurrency and LLMs seem like they'll play out that story over the next 10 years.
Given the importance of scale for this particular product, any company placing itself on "just" one layer of the whole stack is at a heavy disadvantage, I guess. I'd rather have a winning Google than OpenAI or Meta anyway.
How much of current GPU and TPU design is based around attention's bandwidth-hungry design? The article makes it seem like TPUs aren't very flexible, so big model architecture changes, like new architectures that don't use attention, may lead to useless chips. That being said, I think it's great that we have some major competing architectures out there. GPUs, TPUs, and UMA CPUs are all attacking the ecosystem in different ways, which is what we need right now. Diversity in all things is always the right answer.
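To make the "bandwidth-hungry" part concrete, here is a bare-bones single-head attention in JAX (my own sketch, not tied to any specific chip or paper): the two dense matmuls are exactly what MXUs and tensor cores are optimized for, while the seq-by-seq score matrix is what makes memory traffic blow up with context length.

    import jax
    import jax.numpy as jnp

    def attention(q, k, v):
        # q, k, v: (seq_len, head_dim) for a single head, no masking.
        d = q.shape[-1]
        # (seq, d) @ (d, seq) -> (seq, seq): grows quadratically with context
        # length and is the main source of memory-bandwidth pressure.
        scores = q @ k.T / jnp.sqrt(d)
        weights = jax.nn.softmax(scores, axis=-1)
        # (seq, seq) @ (seq, d) -> (seq, d): another large dense matmul.
        return weights @ v

    q = k = v = jnp.ones((128, 64))   # hypothetical seq_len=128, head_dim=64
    out = attention(q, k, v)          # shape (128, 64)

If a post-attention architecture replaced these contractions with something less matmul-shaped, the MXU-centric bet would look worse; as long as the bulk of the FLOPs stay in dense matmuls, both camps keep their optimization target.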
> The GPUs were designed for graphics [...] However, because they are designed to handle everything from video game textures to scientific simulations, they carry “architectural baggage.” [...] A TPU, on the other hand, strips away all that baggage. It has no hardware for rasterization or texture mapping.
With simulations becoming key to training models doesn't this seem like a huge problem for Google?
At this stage, it is somewhat clear that it doesn't really matter who's ahead in the race, because everyone else is super close behind...
You can't really buy a TPU; you have to buy into the entire data center that includes the TPU, plus the services and support. In Google Colab I often don't prefer the TPU either, because the AI documentation and tooling isn't written for it. While all of this could change in the long term, I also don't see these changes in Google's long-term strategy. There's also the problem of Google's product graveyard, which the original article doesn't mention in its long-term outlook. Putting these factors together, I'm still skeptical about Google's lead on AI.
Google has always had great tech - their problem is the product or the perseverance, conviction, and taste needed to make things people want.
All this assumes that LLMs are the sole mechanism for AI and will remain so forever: no novel architectures (neither hardware nor software), no progress in AI theory, nothing better than LLMs, simply brute force LLM computation ad infinitum.
Perhaps the assumptions are true. The mere presence of LLMs seems to have lowered the IQ of the Internet drastically, sopping up financial investors and resources that might otherwise be put to better use.
Will Google sell TPUs that can be plugged into stock hardware, or custom hardware with lots of TPUs? Our customers want all their video processing to happen on site, and don't want their video or other data to touch the cloud, so they're not happy about renting cloud TPUs or GPUs. Also it would be nice to have smart cameras with built-in TPUs.
In my 20+ years of following NVIDIA, I have learned to never bet against them long-term. I actually do not know exactly why they continually win, but they do. The main issue is that they have a 3-4 year gap between deciding on a new design pivot and realizing it (silicon has a long "pipeline"), so when it seems they may be missing a new trend or a swerve in the market's demands, it is often simply because of this delay.
Any chance of a bit of support for jax-metal, or incorporating apple silicon support into Jax?
That and the fact they can self-fund the whole AI venture and don't require outside investment.
This is the “Microsoft will dominate the Internet” stage.
The truth is the LLM boom has opened the first major crack in Google as the front page of the web (the biggest since Facebook), in the same way the web in the long run made Windows so irrelevant that Microsoft seemingly doesn't care about it at all.
Right, because people would love to get locked into another, even more expensive platform.
How high are the chances that as soon as China produces their own competitive TPU/GPU, they'll invade Taiwan in order to starve the West in regards to processing power, while at the same time getting an exclusive grip on the Taiwanese Fabs?
Google's real moat isn't the TPU silicon itself—it's not about cooling, individual performance, or hyper-specialization—but rather the massive parallel scale enabled by their OCS interconnects.
To quote The Next Platform: "An Ironwood cluster linked with Google’s absolutely unique optical circuit switch interconnect can bring to bear 9,216 Ironwood TPUs with a combined 1.77 PB of HBM memory... This makes a rackscale Nvidia system based on 144 “Blackwell” GPU chiplets with an aggregate of 20.7 TB of HBM memory look like a joke."
Nvidia may have the superior architecture at the single-chip level, but for large-scale distributed training (and inference) they currently have nothing that rivals Google's optical switching scalability.
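A quick back-of-the-envelope check on the quoted figures (my arithmetic, using only the numbers in the quote above) makes the same point: per chip the HBM is in the same ballpark, so the roughly 64x gap is almost entirely about how many chips the interconnect can stitch into one domain.

    # Arithmetic only, using the figures quoted from The Next Platform above.
    ironwood_chips, ironwood_pod_hbm_pb = 9_216, 1.77   # chips, PB of HBM per pod
    blackwell_chips, blackwell_rack_hbm_tb = 144, 20.7  # chiplets, TB of HBM per rack

    per_tpu_gb = ironwood_pod_hbm_pb * 1e6 / ironwood_chips     # ~192 GB per TPU
    per_gpu_gb = blackwell_rack_hbm_tb * 1e3 / blackwell_chips  # ~144 GB per chiplet

    print(round(per_tpu_gb), round(per_gpu_gb), ironwood_chips // blackwell_chips)
    # -> 192 144 64: similar memory per chip, 64x more chips in one domain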