Yes, you're right that interpreters also let users run code, and you could argue that Apple is simply applying its policies inconsistently. Realistically, I think part of the difference is how popular these newer vibe coding apps have become, and part is that the scope of what they can generate is much broader.
With Pythonista or a Lua-scripted game, the reviewer can assess what's possible: this app can do everything Python-with-this-API-surface can do, and nothing more.
With LLM-driven generation, the set of possible behaviors isn't fixed. The same Replit app can produce totally different behavior next month than it does today, without ever being resubmitted, just because the model or the system prompt was updated.
That's what I meant by "you can't review adaptive software".
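
To make the distinction concrete, here's a toy sketch in Python. Everything in it is made up for illustration: `FIXED_API`, `run_script`, `generate_app`, and the config dictionaries are hypothetical stand-ins, not how Pythonista, Replit, or App Review actually work. The point is only that in the first case the capability set ships inside the binary, while in the second it depends on server-side state that can change after review.

```python
# Toy illustration only; all names here are hypothetical.

# Case 1: a scripting app whose API surface is baked into the shipped binary.
FIXED_API = frozenset({"draw_sprite", "play_sound", "read_touch"})


def reviewable_capabilities() -> frozenset:
    """Everything a user script can ever do; enumerable at submission time."""
    return FIXED_API


def run_script(calls: list) -> list:
    """Allow only calls inside the fixed surface; reject anything else."""
    rejected = [c for c in calls if c not in FIXED_API]
    if rejected:
        raise PermissionError(f"not in API surface: {rejected}")
    return calls


# Case 2: generation driven by server-side state (model, system prompt).
def generate_app(prompt: str, server_config: dict) -> set:
    """Stand-in for an LLM call: the output depends on remote config,
    not on anything contained in the reviewed binary."""
    return set(server_config["capabilities"])


# Same binary, same user prompt, different server-side config (assumed values).
config_today = {"model": "v1", "capabilities": ["ui", "storage"]}
config_next_month = {"model": "v2", "capabilities": ["ui", "storage", "network"]}

today = generate_app("make me a todo app", config_today)
next_month = generate_app("make me a todo app", config_next_month)
new_behaviors = next_month - today  # appeared without any resubmission
```

In the first case a reviewer can read `FIXED_API` and be done. In the second, whatever ends up in `new_behaviors` never passed through review at all.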