With Claude specifically, I've grown confident they have been sneakily experimenting with context compression to save money, and doing a very bad job at it. For the same reason, though, one-shot batch usage or one-off questions and answers that don't depend on larger context windows don't seem to see this degradation.
They added a "How is Claude doing?" rating a while back, which backs this up imo. Tons of A/B tests going on, I bet.