Smaller data is where it’s at when optimizing nowadays: it needs less memory bandwidth and gets a higher cache hit rate.
Compute is cheap relative to memory traffic, so on both CPUs and GPUs you can do a ton of computation per bit transferred from DRAM.
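
To put rough numbers on that, here is a minimal Python sketch. The helper names (`footprint_bytes`, `ops_per_byte_budget`) and the throughput figures are made up for illustration, not measurements of any real chip. It shows how element size drives the bytes that have to come out of DRAM, and how many operations fit in the time one byte takes to arrive.

```python
from array import array


def footprint_bytes(typecode: str, count: int) -> int:
    """Bytes needed to store `count` elements of the given array typecode."""
    return array(typecode).itemsize * count


def ops_per_byte_budget(ops_per_second: float, bytes_per_second: float) -> float:
    """Operations the processor can do in the time one byte streams in from DRAM."""
    return ops_per_second / bytes_per_second


N = 1_000_000  # elements in a hypothetical working set

f64 = footprint_bytes("d", N)  # C double, 8 bytes per element
f32 = footprint_bytes("f", N)  # C float, 4 bytes per element
i8 = footprint_bytes("b", N)   # signed char, 1 byte per element

for name, size in [("float64", f64), ("float32", f32), ("int8", i8)]:
    print(f"{name}: {size / 1e6:.1f} MB, {f64 / size:.0f}x less traffic than float64")

# ASSUMPTION: round, invented figures (1e12 ops/s, 1e11 bytes/s of DRAM bandwidth).
# Plug in your own hardware's numbers.
budget = ops_per_byte_budget(1e12, 1e11)
print(f"ops available per byte loaded: {budget:.0f}")
```

With those invented figures, anything that does fewer operations per byte than the budget is waiting on memory, so shrinking the data speeds it up more or less directly.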