Hacker News

AWS Trainium3 Deep Dive – A Potential Challenger Approaching

68 points by Symmetry last Thursday at 7:19 PM | 22 comments

Comments

klysm yesterday at 4:11 PM

This won't materialize into a legitimate threat to the NVIDIA/TPU landscape without enormous software investment. That's why NVIDIA won in the first place. It requires executives to see past the hardware and make riskier investments, and we will see whether that actually happens under AWS management.

thecopy yesterday at 5:12 PM

I have seen links to SemiAnalysis before; I'm just scared of the length of this content. Is anyone reading these start to finish? Why?

artur44 yesterday at 4:54 PM

The hardware story is interesting, but I’m curious how much of the real-world adoption will depend on the maturity of the compiler stack. Trainium2 already showed that good silicon isn’t enough if the software layer lags behind.

If AWS really delivers on open-sourcing more of the toolchain, that could be a much bigger signal for adoption than raw specs alone.

t1234s yesterday at 9:15 PM

What does this mean for a company like Coreweave?

jauntywundrkind yesterday at 4:52 PM

> they will go with three different scale-up switch solutions over the lifecycle of Trainium3, starting with a 160-lane, 20-port PCIe switch for fast time to market due to the limited availability today of high lane & port count PCIe switches, later switching to 320-lane PCIe switches and ultimately a larger UALink to pivot towards best performance.

It doesn't have a lot of ports, and certainly not enough NTB (non-transparent bridging) to be useful as a switch, but man, it's wild to me that a single AMD Epyc CPU has 128 lanes of PCIe and that switch chips are struggling to match even a basic server's worth of net bandwidth.
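For a rough sense of scale, here is a back-of-the-envelope sketch comparing raw PCIe bandwidth by lane count. Assumptions: the thread never says which PCIe generation is involved, so the 64 GT/s per-lane rate (PCIe 6.0 signalling) is a hypothetical parameter; the figures are raw, one-direction numbers that ignore encoding/FLIT overhead and the fact that a switch splits its lanes between upstream and downstream ports. The helper names are made up for illustration.

```python
# Back-of-the-envelope PCIe bandwidth comparison (illustrative only).
# ASSUMPTION: PCIe generation is not stated in the thread; RATE_GT_S is a
# hypothetical parameter. Raw rate only: encoding/FLIT overhead is ignored.

def raw_gbytes_per_s(lanes: int, gt_per_s: float) -> float:
    """Raw one-direction bandwidth in GB/s (one bit per lane per transfer)."""
    return lanes * gt_per_s / 8

def lanes_per_port(total_lanes: int, ports: int) -> float:
    """Average lanes available to each port of a switch."""
    return total_lanes / ports

devices = {
    "AMD Epyc CPU (128 lanes)": 128,
    "160-lane switch": 160,
    "320-lane switch": 320,
}

RATE_GT_S = 64.0  # hypothetical: PCIe 6.0 signalling rate per lane

for name, lanes in devices.items():
    print(f"{name}: ~{raw_gbytes_per_s(lanes, RATE_GT_S):.0f} GB/s per direction")

# The 160-lane, 20-port switch leaves 8 lanes per port on average.
print("lanes per port:", lanes_per_port(160, 20))
```

Under these assumptions the first-generation 160-lane switch has only about 25% more raw lane bandwidth than one Epyc socket, which is the gap the comment is reacting to.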

cmiles8 yesterday at 6:15 PM

Chips without an ecosystem and software (CUDA) do not a serious challenger make. That's where Amazon has struggled, and continues to struggle.