Given the conduct we've seen to date, I'd trust them to follow the letter - but not the spirit - of IP law.
There may very well be clever techniques that don't require training directly on users' data. For example, they could generate a parallel, paraphrased corpus as they serve user queries - one which they CAN legally train on.
The value unlocked by stealing practically everyone's lunch is large enough that I wouldn't put it past anyone capable of implementing such a technology.