That's because AI labs keep stamping out the widely known failures. I assume they do it without actually retraining the main model, using some small classifier that detects the known meme questions and injects the correct answer into the context.
But try asking your favorite LLM what happens if you're holding a pen with two hands (one at each end) and let go of one end.
https://chatgpt.com/s/t_69bcbeeaa2f081918113f42940803007
Seems fine to me?