
brigade, today at 5:42 PM

ARM favored wider ILP (more pipelines) and mostly symmetric ALUs, while x86 favored wider vectors and asymmetric ALUs.

Most high-end ARM cores were 4x128b FMA, and Cortex-X925 goes to 6x128b FMA. Contrast that with Intel, which was 2x256b FMA for the longest time, then 2x512b FMA, with another 1-2 pipelines that can't do FMA.

But ultimately, 4x128b ≈ 2x256b, and 2x256b < 6x128b < 2x512b in throughput. Permute is a different factor though, if your algorithm cares about it.
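
To make the arithmetic concrete, here's a rough Python sketch that just multiplies pipe count by vector width. The function names are made up for illustration, and it assumes every FMA pipe issues one full-width FMA per cycle; it ignores clock speed, the non-FMA pipelines, permute throughput, and memory bandwidth, so treat it as back-of-the-envelope only.

```python
# Peak FMA width per cycle = (number of FMA pipes) * (vector width in bits).
# Labels match the configurations named in the comment above.
configs = {
    "4x128b": (4, 128),
    "2x256b": (2, 256),
    "6x128b": (6, 128),
    "2x512b": (2, 512),
}

def fma_bits_per_cycle(pipes, width_bits):
    """Total vector bits that can go through FMA units each cycle."""
    return pipes * width_bits

def fp32_flops_per_cycle(pipes, width_bits):
    """Peak fp32 FLOPs per cycle: 32-bit lanes, one FMA counted as 2 flops."""
    lanes = width_bits // 32
    return pipes * lanes * 2

peak = {name: fma_bits_per_cycle(*cfg) for name, cfg in configs.items()}

# Print from lowest to highest peak FMA width.
for name, bits in sorted(peak.items(), key=lambda kv: kv[1]):
    flops = fp32_flops_per_cycle(*configs[name])
    print(f"{name}: {bits} bits/cycle, {flops} fp32 flops/cycle")
```

Under those assumptions 4x128b and 2x256b both come out to 512 bits per cycle, 6x128b to 768, and 2x512b to 1024, which is the ordering above.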