Can you help me understand where you are coming from? Is it that you think the benchmark is flawed or overly harsh? Or that you interpret the tone as blaming AI for failing a task that is inherently tricky or poorly specified?
My takeaway was more along the lines of "maybe AI coding assistants today aren’t yet good at this specific, realistic engineering task"...