I too suspect the A/B testing is the prime suspect: context window limits, system prompts, MAYBE some other questionable things that should be disclosed.
Either way, if true, given the cost I wish I could opt-out or it were more transparent.
Put out variants you can select and see which one people flock to. I and many others would probably test constantly and provide detailed feedback.
All speculation though
Whenever I see new behaviors and suspect I’m being tested on I’ll typically see a feedback form at some point in that session. Well, that and dropping four letter words.
I know it’s more random sampling than not. But they are definitely using our codebases (and in some respects our livelihoods) as their guinea pigs.