Looks like a model-size issue, but the behavior already seems to be shaped largely by the data distribution.