This is a great example of how basic napkin math eliminates many classes of errors.
And we will most certainly see alien code. Speedrunning already shows a pattern of “ahuman” behavior, where a learner optimized against a well-defined system loses the implicit beauty that draws humans to that system. Before RL models became feasible for speedrunning, top runs were apexes of human performance: the tightest possible lines and the utmost precision. RL speedruns instead rely on strategies such as nonstop tricks that cannot be performed by hand, and in doing so they lose a great deal of that beauty, at least to some people.
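Perhaps the greatest lesson we can derive from this example is that the improvements are still marginal compared to top players. A casual player might beat a track in 1:10, the world record might be 1:00, and the RL record might be 0:50. With those illustrative numbers, the RL run cuts about 17% off the human record, roughly the same size as the 14% gap between the casual player and the record holder. So the improvement is real, but it is incremental rather than a change in kind.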
I suppose soon enough we will have experimental evidence for all these ideas!