Ah, you're worried about convergence issues? My (possibly bad) understanding was that the self-driving car problems come more from inadequacies in the simulation models used for training and data collection than from the convergence of the algorithms, but I could be wrong. Anyway, my statement was just meant to say that I think you can get RL to converge close to the optimum, which I agree is a bit of a stretch, since RL is famously finicky. But I don't see why we shouldn't expect this to happen as the algorithms get tuned.