Well, what you said is:
"On the contrary, I believe in every verifiable domain RL must drive the agent to be the most intelligent (relative to RL reward) it can be under the constraints--and often it must become more intelligent than humans in that environment."
And I said it's not that simple: it's in no way demonstrated, unlikely with current technology, and, basically, nope.
Ah, you're worried about convergence issues? My (bad) understanding was that the self-driving car stuff is more about inadequacies of the models in which you simulate training and data collection than about convergence of the algorithms, but I could be wrong. I mean, that statement was just saying that I think you can get RL to converge to close to optimal--which I agree is a bit of a stretch, as RL is famously finicky. But I don't see why one shouldn't expect this to happen as we tune the algorithms.
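Just to make "converge to close to optimum" concrete: here's a minimal sketch of the textbook case where the convergence story actually holds--tabular Q-learning on a toy chain MDP I made up for illustration (the environment, function name, and hyperparameters are all my own, not anything from the argument above). With a sensibly tuned step size, the greedy policy reliably recovers the optimal "always go right" behavior. To be clear, this says nothing about deep RL at self-driving scale, where the finickiness actually bites.

```python
import random

def q_learning(episodes, alpha, epsilon=0.1, gamma=0.9, n=5, seed=0):
    """Tabular Q-learning on a 5-state chain.

    Action 1 moves right, action 0 moves left; reaching the last
    state ends the episode with reward 1, all other steps give 0.
    The optimal policy is to go right from every non-terminal state.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n - 1 else 0.0
            # standard Q-learning update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    # greedy policy over the non-terminal states
    return [max((0, 1), key=lambda a: Q[s][a]) for s in range(n - 1)]

policy = q_learning(episodes=500, alpha=0.5)
print(policy)  # optimal policy: go right everywhere -> [1, 1, 1, 1]
```

In this tiny deterministic setting almost any reasonable (alpha, epsilon) works, which is exactly the point of contention: convergence guarantees like this are for the tabular case, and whether tuning gets you the same thing with function approximation in messy environments is the open question.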