It was claimed to find that, but I don't think it did. It compared developers' beliefs about their average speedup across tasks, measured by asking them once at the end, against the comparative speedup measured per task and then averaged. Those are two different things, and all kinds of factors could mess up developers' fuzzy recollection of the gestalt of several tasks (such as recency bias and question/study framing) that wouldn't affect their answers if you asked them right after each task. Moreover, when results were broken down by task type, the speedup/slowdown numbers actually matched developers' qualitative comments.