data bandwidth limits distributed training under current architectures. really interesting implications if we can make progress on that