SIMD performance on modern Intel and AMD CPUs is so bad that it is useless outside very specific circumstances.
This is mainly because vector instructions are implemented by sharing resources with other parts of the CPU, which more or less stalls pipelines, significantly reduces IPC, and makes out-of-order execution ineffective.
The shared resources often involve the floating-point registers and compute units, so it's a double whammy.
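
How much this costs will vary by chip and workload, so it is worth measuring on your own machine. Below is a minimal sketch of such a measurement, assuming an x86 target with SSE2 (it falls back to the scalar loop otherwise). The names `sum_scalar`, `sum_simd`, and `time_sum` are made up for illustration, and a simple integer sum is only one data point, not a general verdict.

```c
#include <stddef.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Plain scalar sum of n 32-bit integers (wraps modulo 2^32). */
unsigned sum_scalar(const unsigned *a, size_t n) {
    unsigned s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Same sum, four lanes per instruction with SSE2.
   Assumption: without SSE2 we simply reuse the scalar loop. */
unsigned sum_simd(const unsigned *a, size_t n) {
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Unaligned load of four consecutive integers. */
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        acc = _mm_add_epi32(acc, v);
    }
    /* Horizontal reduction of the four lane totals. */
    unsigned lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    unsigned s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) s += a[i];
    return s;
#else
    return sum_scalar(a, n);
#endif
}

/* Hypothetical timing helper: CPU seconds spent on `reps` calls of f.
   The accumulated result is written to *out so the work is not discarded. */
double time_sum(unsigned (*f)(const unsigned *, size_t),
                const unsigned *a, size_t n, int reps, unsigned *out) {
    clock_t t0 = clock();
    unsigned s = 0;
    for (int r = 0; r < reps; r++) s += f(a, n);
    *out = s;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

To compare, call `time_sum(sum_scalar, ...)` and `time_sum(sum_simd, ...)` on the same large array from your own `main`. Compilers may auto-vectorize the scalar loop, so with GCC or Clang something like `-O2 -fno-tree-vectorize` keeps the comparison honest.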