>There is no such thing as parallel speech data
Which is why translation models such as the one from the article are no longer trained that way.