Hacker News

dmje today at 7:37 AM

Great piece, well written and succinctly sums up my thoughts.

The bit I still don’t understand is how we all put up with the hallucinations. I was questioning Gemini last night about whether it could analyse a Four Tet song and give me a breakdown of the structure from beginning to end. “Sure!” it said, with the endless enthusiasm you get from gen AI tools, and then proceeded to spit out an absolute sack of fabricated shit. I pushed back, it apologised, and then it generated more crap that had nothing to do with reality. I pushed back again, we looped again, and it was still just total fiction: “the drums don’t come in until bar 16” on a song that opens with a drum loop, that kind of crap.

We’re so so far away from tools here that are anywhere near being trustworthy and accurate. And yet we (including myself) are chunking out code after code. It’s so bizarre.

I’m guessing it’s that humans don’t have the capacity to deal with this kind of scenario - it’s like having a junior staff member who is utterly incredible 90% of the time, completely convincing in their certainty and skill level, and then 10% of the time you catch them doing a shit in their desk drawer because they couldn’t be arsed to walk to the toilet. AIs are basically sociopaths.


Replies

entropi today at 8:21 AM

> We’re so so far away from tools here that are anywhere near being trustworthy and accurate. And yet we (including myself) are chunking out code after code. It’s so bizarre.

I think one more thing this whole LLM charade of the last few years has revealed is that no-one really cares. As long as it "looks" like it works, it turns out, it's all fine.

Dansvidania today at 8:36 AM

Doesn’t the article make the argument that since you can write tests, this is not as much of a problem for code gen?

It’s arguable whether it is a foolproof solution (I don’t think so), but it definitely makes it look like you can build a harness around the stochastic machine that will validate the correctness of the generated randomness.

Monkeys and typewriters when you can quickly validate whether it’s Shakespeare or not is a costly but theoretically feasible scenario. No?
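
To make the harness idea concrete, here is a rough sketch of a generate-and-validate loop. Everything in it is a stand-in I made up for illustration: `generate_until_valid`, `make_monkey`, and the two-letter `TARGET` are hypothetical, the "monkey" is just a seeded random string generator, and a real setup would plug in a model call and an actual test suite instead.

```python
import random
from typing import Callable, Optional


def generate_until_valid(
    generate: Callable[[], str],
    is_valid: Callable[[str], bool],
    max_attempts: int,
) -> Optional[str]:
    """Return the first candidate that passes validation, or None if none does."""
    for _ in range(max_attempts):
        candidate = generate()
        if is_valid(candidate):
            return candidate
    return None


def make_monkey(rng: random.Random, alphabet: str, length: int) -> Callable[[], str]:
    """Hypothetical stochastic generator: random strings of a fixed length."""
    return lambda: "".join(rng.choice(alphabet) for _ in range(length))


# Toy "Shakespeare": a two-letter target, so random search succeeds quickly.
TARGET = "to"

rng = random.Random(0)  # seeded so the run is repeatable
monkey = make_monkey(rng, alphabet="ot", length=len(TARGET))

# The validator plays the role of the test suite: cheap to run, strict to pass.
result = generate_until_valid(monkey, lambda s: s == TARGET, max_attempts=1000)

# A validator that nothing can satisfy exhausts the budget and yields None.
exhausted = generate_until_valid(lambda: "x", lambda s: False, max_attempts=5)
```

The loop only guarantees that whatever comes out passed `is_valid`, so the harness is exactly as trustworthy as the check itself, and the cost grows with how rarely the generator happens to hit something valid.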
