AstroBen • yesterday at 6:49 PM • 4 replies

'feel' is no more accurate

not saying there's a better way but both suck

Replies

thethimble • yesterday at 7:16 PM

Speak for yourself. I've been insanely productive with Codex 5.2.

With the right scaffolding these models are able to perform serious work at high quality levels.

crorella • yesterday at 7:17 PM

The variety of tasks they can do, and will be asked to do, is too wide and dissimilar. It will be very hard to have a transversal measurement; at most we will have area-specific consensus that model X or Y is better. It is like saying one person is the best coder at everything, and that person does not exist.

tavavex • yesterday at 7:34 PM

The 'feel' of a single person is pretty meaningless, but when many users form a consensus over time after a model is released, it feels a lot more informative than a simple benchmark because it can shift over time as people individually discover the strong and weak points of what they're using and get better at it.

forrestthewoods • yesterday at 7:39 PM

At the end of the day “feel” is what people rely on to pick which tool they use.

Is “feel” unscientific and broken? Sure, maybe, why not.

But at the end of the day I’m going to choose what I see with my own two eyes over a number in a table.

Benchmarks are a sometimes-useful tool. But we are in prime Goodhart's Law territory.
