They are going to do the same thing they do with code.
They are going to hire armies of developing-world workers to massage those models in post-training until they show some acceptable behaviors, and they will build the appropriate agents with the appropriate tools to get something that simulates the real thing as plausibly as possible.
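Problem is, RLVR (reinforcement learning with verifiable rewards) is cheap with code, where a test suite can check an answer automatically, but it can get very expensive with human physiology, where verifying an answer means running actual experiments.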