Much better to burn the area for multiple smaller units, its a bit more area for frontend handling, but worth it for the flexibility (see Apple's M-series chips vs intel avx*).
Yes and no. I think neon is undersized for today at 128bit registers -- if you're working with doubles for example, that's only two values per register, which is pretty anemic. Things like shuffles and other tricky bitops benefit from wider widths as well (see my other reply)
Yes and no. I think neon is undersized for today at 128bit registers -- if you're working with doubles for example, that's only two values per register, which is pretty anemic. Things like shuffles and other tricky bitops benefit from wider widths as well (see my other reply)