bigstrat2003 • yesterday at 8:38 PM • 3 replies

> What I never understand is the population of coders that don’t see any value in coding agents or are aggressively against them, or people that deride LLMs as failing to be able to do X (or hallucinate etc) and are therefore useless and everything is AI Slop, without recognizing that what we can do today is almost unrecognizable from the world of 3 years ago.

I don't recognize that because it isn't true. I try the LLMs every now and then, and they still make the same stupid hallucinations that ChatGPT did on day 1. AI hype proponents love to make claims that the tech has improved a ton, but based on my experience trying to use it those claims are completely baseless.

Replies

ben_w • yesterday at 9:11 PM

> I try the LLMs every now and then, and they still make the same stupid hallucinations that ChatGPT did on day 1.

One of the tests I sometimes do of LLMs is a geometry puzzle:

  You're on the equator facing south. You move forward 10,000 km along the surface of the Earth. You rotate 90° clockwise. You move another 10,000 km forward along the surface of the Earth. Rotate another 90° clockwise, then move another 10,000 km forward along the surface of the Earth.

  Where are you now, and what direction are you facing?

They all used to get this wrong all the time. Now the best ones sometimes don't. (That said, the only one to succeed just now, as I write this comment, was DeepSeek; the first I saw succeed was one of ChatGPT's models, but that's now back to the usual error they all used to make.)
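For anyone who wants to check a model's answer to the puzzle above, here is a minimal sketch that walks the route with unit vectors. It assumes an idealized spherical Earth with a circumference of exactly 40,000 km, so each 10,000 km leg is a quarter of a great circle; the real Earth only approximates this. The function names (`move`, `turn_clockwise_90`, `run_puzzle`) and the choice of longitude 0 as the starting point are illustrative, not part of the original puzzle.

```python
import math

# Assumption: idealized sphere, so 10,000 km is exactly a quarter circumference.
EARTH_CIRCUMFERENCE_KM = 40_000.0


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def move(pos, heading, distance_km):
    """Walk along the great circle defined by pos (unit vector from the
    centre) and heading (unit tangent vector); return the new pair."""
    theta = 2 * math.pi * distance_km / EARTH_CIRCUMFERENCE_KM
    c, s = math.cos(theta), math.sin(theta)
    new_pos = tuple(c * p + s * h for p, h in zip(pos, heading))
    new_heading = tuple(-s * p + c * h for p, h in zip(pos, heading))
    return new_pos, new_heading


def turn_clockwise_90(pos, heading):
    """Turn right by 90 degrees: with 'up' equal to pos, the walker's
    right-hand direction is heading x up."""
    return cross(heading, pos)


def run_puzzle():
    pos = (1.0, 0.0, 0.0)       # on the equator, longitude 0 (arbitrary choice)
    heading = (0.0, 0.0, -1.0)  # facing south; +z points to the North Pole
    pos, heading = move(pos, heading, 10_000)
    heading = turn_clockwise_90(pos, heading)
    pos, heading = move(pos, heading, 10_000)
    heading = turn_clockwise_90(pos, heading)
    pos, heading = move(pos, heading, 10_000)
    return pos, heading


if __name__ == "__main__":
    final_pos, final_heading = run_puzzle()
    print("position:", [round(x, 6) for x in final_pos])
    print("heading: ", [round(x, 6) for x in final_heading])
```

Under these assumptions the walk goes to the South Pole, back up to the equator a quarter turn away, and then along the equator to the starting point, ending with the heading vector (0, 1, 0), which is due east at longitude 0.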

Anecdotes are of course a bad way to study this kind of thing.

Unfortunately, so are the benchmarks, because the models have quickly saturated most of them, including traditional IQ tests (on the plus side, this has demonstrated that IQ tests are definitely a learnable skill, as LLMs lose 40-50 IQ points when going from public to private IQ tests) and stuff like the maths olympiad.

Right now, AFAICT the only open benchmarks are the METR time horizon metric, the ARC-AGI family of tests, and the "make me an SVG of ${…}" stuff inspired by Simon Willison's pelican on a bike.

hectdev • yesterday at 8:55 PM

This fascinates me. Just observing, but because it hasn't worked for you, everyone else must be lying? (I'm assuming that's what you mean by baseless.)

How does that bridge get built? I can provide tangible real-life examples, but I've found pushback from that in other online conversations.

shepherdjerred • yesterday at 10:00 PM

What have you tried? How much time have you spent? Using AI is its own skill set, separate from programming.