zarzavat • last Wednesday at 5:33 AM • 3 replies

How are you qualified to judge its performance on real code if you don't know how to write a hello world?

Yes, LLMs are very good at writing code; they are so good at it that they often generate reams of unmaintainable spaghetti.

When you submit to an informatics contest you don't have paying customers who depend on your code working every day. You can just throw away yesterday's code and start afresh.

Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.

Replies

josu • last Wednesday at 3:29 PM

I know what it's like to run a business and build complex systems. That's not the point.

I used highload as an example because it seems like an objective rebuttal to the claim that "but it can't tackle those complex problems by itself."

And regarding this:

"Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash"

Again, a combination of LLM/agents with some guidance (from someone with no prior experience in this type of high-performance architecture) was able to beat all human software developers who have taken these challenges.

VMG • last Wednesday at 2:01 PM

> Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.

The skill of "a human software developer" is in fact a very wide distribution, and your statement is true for an ever-shrinking tail end of that distribution.

FeepingCreature • last Wednesday at 6:12 AM

> How are you qualified to judge its performance on real code if you don't know how to write a hello world?

The ultimate test of all software is "run it and see if it's useful for you." You do not need to be a programmer at all to be qualified to test this.
