I think people working on these abstractions would claim that an appropriate abstraction does involve the exact features you mention. Cache hierarchies and vectorization are common to CPUs and GPUs - in some sense, the numbers parametrizing them are all that differs. With a good abstraction, this machine-tuning can be automated.