anematode • today at 12:10 AM • 2 replies

That makes a ton of sense and aligns with my observations. Thanks for the resource :)

If SSVE is slow, I was hoping that SME instructions could be used in a vector-like fashion (e.g. to add two matrices with high throughput, or to compute a Hadamard/element-wise product), but it seems most matrix accelerator ISAs don't have that.
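
For concreteness, here is a plain C scalar sketch of the two element-wise operations meant above. The function names `mat_add` and `mat_hadamard` are hypothetical, and nothing here uses SME or SSVE. It only spells out the access pattern: each output element depends on exactly one pair of inputs, with no reduction across rows or columns.

```c
#include <stddef.h>

/* Element-wise sum of two row-major rows x cols matrices:
 * out[i] = a[i] + b[i]. Hypothetical helper, scalar reference only. */
void mat_add(const float *a, const float *b, float *out,
             size_t rows, size_t cols) {
    size_t n = rows * cols;
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* Hadamard (element-wise) product of two row-major rows x cols matrices:
 * out[i] = a[i] * b[i]. Hypothetical helper, scalar reference only. */
void mat_hadamard(const float *a, const float *b, float *out,
                  size_t rows, size_t cols) {
    size_t n = rows * cols;
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

Both loops are pure streaming work, which is the shape one would hope a matrix unit could run at full width.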

Replies

bdash • today at 1:26 AM

There are SME / SME2 instructions that use the ZA tiles as vector registers / vector groups. These can take advantage of the higher throughput of the SME processing grid vs SSVE instructions that operate on Z registers. See the `FMLA (SME2)` case under Peak Performance at https://scalable.uni-jena.de/opt/sme/micro.html#peak-perform....
