"that suggests agents aren't capable of reliably generalizing beyond their training data."
Yes? If they could, we would have strong general intelligence by now, and only a few people are claiming that we do.