The finding of the first study was that people cannot judge their own performance with these tools. So I don’t think the lack of individuals willing to work without them is indicative of productivity improvements. I think it’s indicative of them being enjoyable to use.
It was claimed to find that, but I don't think it did. It compared developers' beliefs about their average speedup across tasks, asked once at the end, with the comparative speed measured per task and then averaged. Those are two different measurements, and all kinds of things could mess up developers' fuzzy recollection of the gestalt of several tasks (such as recency bias and question/study framing) that wouldn't affect it if you asked them right after each task. Moreover, when tasks were broken down by task type, the speedup/slowdown results actually matched developers' qualitative comments.