The problem is that LLMs stop working after a certain point of complexity or specificity, which is very obvious once you try to use one in a field you understand deeply. At this point, your own skills should be able to carry you forward, but if you've been using an LLM to do things for you since the start, you won't have the necessary skills.
Once they have to solve a novel problem that was not already solved for all intents and purposes, Alice will be able to apply her skillset to that, whereas Bob will just run into a wall when the LLM starts producing garbage.
It seems to me that "high-skill human" > "LLM" > "low-skill human". The trap is that people with low skill levels will see a fast improvement in their output, at the hidden cost of the slow build-up of skills that has a way higher ceiling.
This whole argument can be made for why every programmer needs to deeply understand assembly language and computer hardware.
At a certain point, higher-level languages stop working: performance, low-level control of clocks and interrupts, etc.
I’m old enough to remember when dropping into assembly to be clever with the 8259 interrupt controller really was required. Programmers today? The vast majority don’t really understand how any of that works.
And honestly I still believe that hardware-up understanding is valuable. But is it necessary? Is it the most important thing for most programmers today?
When I step back this just reads like the same old “kids these days have it so easy, I had to walk to school uphill through the snow” thing.
Then test Bob on what you actually want him to produce, i.e. novel problems, instead of trivial things that won't tell you how good he is.
Why is it a problem of the LLM if your test is unrelated to the performance you want?