It's not even a little bit of a joke.
Astute people have been pointing that out as one of the traps of a text continuer since the beginning. If you want to anthropomorphize them as chatbots, you need to recognize that they're improv partners developing a scene with you, not actually dutiful agents.
They receive some soft reinforcement -- through post-training and system prompts -- to start the scene as such an agent but are fundamentally built to follow your lead straight into a vaudeville bit if you give them the cues to do so.
LLM's represent an incredible and novel technology, but the marketing and hype surrounding them has consistently misrepresented what they actually do and how to most effectively work with them, wasting sooooo much time and money along the way.
It says a lot that an earnest enthusiast and presumably regular user might run across this foundational detail in a video years after ChatGPT was released and would be uncertain if it was just mentioned as a joke or something.
I keep hearing this non sequitur argument a lot. It's like saying "humans just pick the next work to string together into a sentence, they're not actually dutiful agents". The non sequitur is in assuming that somehow the mechanism of operation dictates the output, which isn't necessarily true.
It's like saying "humans can't be thinking, their brains are just cells that transmit electric impulses". Maybe it's accidentally true that they can't think, but the premise doesn't necessarily logically lead to truth