Are there any such instructions that produce 16-bit output? I'm looking for fast addition and subtraction of vectors of 16-bit integers.