Hacker News

bdash · today at 1:26 AM

There are SME / SME2 instructions that use the ZA tiles as vector registers / vector groups. These can take advantage of the higher throughput of the SME processing grid compared with streaming SVE (SSVE) instructions that operate on Z registers. See the `FMLA (SME2)` case under Peak Performance at https://scalable.uni-jena.de/opt/sme/micro.html#peak-perform....
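A minimal Python model of where the accumulators live in each case. It assumes a simplified view of ZA as a plain list of accumulator vectors. The function names `fmla_za_vgx4` and `fmla_z` and the example row indices are hypothetical. Predication, fused rounding, and the real mapping from the vector-select register and offset to ZA rows are not modeled. The model says nothing about throughput. It only shows that the SME2 form accumulates a group of vectors in place in ZA, while the SSVE form accumulates into a single Z register.

```python
from typing import List

Vector = List[float]


def fmla_za_vgx4(za: List[Vector], group: List[int], zn: List[Vector], zm: List[Vector]) -> None:
    """Hypothetical model of a 4-vector-group FMLA into ZA.

    Each of the four (zn[i], zm[i]) pairs is multiplied element-wise and
    added in place to the ZA row group[i]. On real hardware the rows are
    derived from a vector-select register plus offset; here the caller
    simply passes them in. Rounding is unfused (plain Python floats).
    """
    assert len(group) == len(zn) == len(zm) == 4
    for row, a, b in zip(group, zn, zm):
        za[row] = [acc + x * y for acc, x, y in zip(za[row], a, b)]


def fmla_z(zda: Vector, zn: Vector, zm: Vector) -> Vector:
    """Hypothetical model of an unpredicated SSVE FMLA: the accumulator is one Z register."""
    return [acc + x * y for acc, x, y in zip(zda, zn, zm)]
```

Usage, with an illustrative vector length of 4 floats and 8 ZA rows: after `za = [[0.0] * 4 for _ in range(8)]` and `fmla_za_vgx4(za, [0, 2, 4, 6], [[1.0] * 4] * 4, [[2.0] * 4] * 4)`, rows 0, 2, 4 and 6 hold `[2.0] * 4`, and the other rows are untouched.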


Replies

anematode · today at 7:51 AM

Are there any such instructions with 16-bit output? I'm looking for fast addition and subtraction of 16-bit integer vectors.