I felt like it was about 10X better at "pretty" but straightforward 1 shot'ish type tasks. Not so different for complex and specific tasks in real code-bases.
Why do you say it was a lot better? What type of tasks were you testing it on?
> I felt like it was about 10X better at "pretty" but straightforward 1 shot'ish type tasks. Not so different for complex and specific tasks in real code-bases.
What metric are you using for "better" here? If I've got a straightforward task, GPT 5.5 is going to one-shot it anyway.