try using codex-5.3-spark, it has much faster inference, might be able to keep up. and maybe a specialized different openrouter model for visual parsing.