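I don't know if data leakage is the right word, but maybe overfitting, if they took a 1-hour clip from the same place and used 90 percent for training and 10 percent for eval/test? Frames from one clip look almost the same, so the eval score would probably look better than it really is.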
It is still a decent way to start, I think, but after that it needs more varied data, and the eval and test sets should come from different geographical locations than the training set.
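Something like this is what I mean by splitting on location instead of splitting one clip 90/10. It is only a rough sketch: the `location` and `frame` fields, the `split_by_location` helper and the `site_N` names are all made up for illustration, not anything from the actual project.

```python
import random
from collections import defaultdict


def split_by_location(samples, eval_fraction=0.2, seed=0):
    """Split samples so that no location ends up in both train and eval."""
    # Group samples by their (hypothetical) "location" field.
    by_location = defaultdict(list)
    for sample in samples:
        by_location[sample["location"]].append(sample)

    if len(by_location) < 2:
        raise ValueError("need at least two locations to hold one out")

    # Shuffle whole locations, not individual frames.
    locations = sorted(by_location)
    random.Random(seed).shuffle(locations)

    # Hold out at least one location, but always leave one for training.
    n_eval = max(1, round(len(locations) * eval_fraction))
    n_eval = min(n_eval, len(locations) - 1)
    held_out = set(locations[:n_eval])

    train = [s for loc in locations if loc not in held_out
             for s in by_location[loc]]
    eval_set = [s for loc in locations if loc in held_out
                for s in by_location[loc]]
    return train, eval_set


# Made-up data: 100 frames spread evenly over 5 recording sites.
samples = [{"location": f"site_{i % 5}", "frame": i} for i in range(100)]
train, eval_set = split_by_location(samples)

train_locations = {s["location"] for s in train}
eval_locations = {s["location"] for s in eval_set}
```

With a split like this the model never sees the eval location during training, so the eval number says more about how it handles a new place. If scikit-learn is available, `GroupShuffleSplit` does the same kind of grouped split.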