Can you? I imagine e.g. Google is using material not available to the public to train their models (uncensored Google Books, etc.). Also, chatbots like Gemini are not just pure LLMs anymore; they also use other tools as part of their computation. I've asked Gemini computationally heavy questions and it successfully invokes Python scripts to answer them. I imagine it can also use tools other than Python, some of which might not even be publicly known.
I'm not sure what the situation is currently, but I can easily see private data and private resources leading to much better AI tools, which open source solutions cannot match.
While they will always have premier models that only run on data center hardware at first, the good news about the tooling is that tool calls are computationally very cheap and, at least in theory, no problem to sandbox and run locally; we would still need to do the plumbing for it ourselves, though.
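To make the "plumbing" point concrete, here's a minimal sketch of what a local tool-call dispatcher could look like. The shape of the tool call and the `python` tool name are assumptions for illustration, not any vendor's actual API; the heavy lifting is the sandboxing, not the compute.

```python
import subprocess
import sys

# Hypothetical shape of a tool call emitted by a model; the field names
# here are assumptions, not any particular vendor's API.
def run_tool_call(tool_call: dict) -> str:
    """Dispatch a model-requested tool call to a local runner."""
    if tool_call.get("name") == "python":
        code = tool_call.get("arguments", {}).get("code", "")
        # Run the snippet in a separate interpreter with a hard timeout.
        # Real sandboxing (containers, no network, resource limits) is the
        # actual plumbing work; this only shows how cheap the call itself is.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=10,
        )
        return result.stdout or result.stderr
    return f"unknown tool: {tool_call.get('name')}"

# Example: the kind of computationally heavy request a chat model
# might offload to Python instead of answering from its weights alone.
call = {"name": "python",
        "arguments": {"code": "print(sum(i*i for i in range(10**6)))"}}
print(run_tool_call(call))
```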
So I agree that open source solutions will likely lag behind, but that's fine. Gemini 2.5 wasn't unusable when Gemini 3 didn't exist, etc.