It’d be interesting to see this compared against a human baseline — e.g., a competent engineer with a fixed time budget on the same tasks.