The quality variation from month to month has been my experience too. I've noticed the models seem to "forget" conventions they used to follow reliably - like proper error handling patterns or consistent variable naming.
What's strange is sometimes a fresh context window produces better results than one where you've been iterating. Like the conversation history is introducing noise rather than helpful context. Makes me wonder if there's an optimal prompt length beyond which you're actually degrading output quality.
Remember that the entire conversation is literally the query you’re making, so the longer it is, the more you’re relying on the model’s ability to follow all of it and work out what is most relevant.
> Like the conversation history is introducing noise rather than helpful context.
From https://docs.github.com/en/copilot/concepts/prompting/prompt...:
Copilot Chat uses the chat history to get context about your request. To give Copilot only the relevant history:
- Use threads to start a new conversation for a new task
- Delete requests that are no longer relevant or that didn’t give you the desired result
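
To make the pruning advice concrete, here's a rough sketch of the idea in Python. It is not how Copilot actually assembles its prompt; `Turn`, `build_query`, and the `relevant` flag are hypothetical names I made up for illustration. The point is only that every turn you keep becomes part of the text the model has to read, so dropping a dead end shrinks and cleans up the query.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One request/response pair in a chat (hypothetical structure)."""
    request: str
    response: str
    relevant: bool = True  # set to False for dead ends you would delete


def build_query(history: list[Turn], new_request: str) -> str:
    """Flatten the kept turns plus the new request into one block of text."""
    parts = []
    for turn in history:
        if not turn.relevant:
            continue  # same effect as deleting the request from the thread
        parts.append(f"User: {turn.request}")
        parts.append(f"Assistant: {turn.response}")
    parts.append(f"User: {new_request}")
    return "\n".join(parts)


history = [
    Turn("Write a CSV parser", "def parse(path): ..."),
    Turn("Try a regex approach instead", "import re ...", relevant=False),
    Turn("Add error handling", "try: ... except OSError: ..."),
]

# Everything kept: the abandoned regex detour is still part of the query.
full = build_query([Turn(t.request, t.response) for t in history], "Add type hints")

# Dead end dropped: the model only sees the turns that still matter.
pruned = build_query(history, "Add type hints")
```

Starting a new thread is just the extreme case: `build_query([], "Add type hints")` sends nothing but the new request.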