Hacker News

pmarreck today at 1:17 PM

I work with Claude Max for hours a day.

I see a lot of speculation by people who do not.

I think it's going to be much harder to get from "slightly smarter than the vast majority of people but with occasional examples of complete idiocy" to "unfathomably smarter than everyone with zero instances of jarring idiocy" using the current era of LLM technology that primarily pattern-matches on all existing human interactions while adding a bit of constrained randomization.

Every day I deal with bad judgment calls from the AI. I usually screenshot them or record them for posterity.

It also has no initiative, no taste, no will, no qualia (believe what you will about it), no integrity and no inviolable principles. If you give it some, it will pretend it has them for a little while and then regress to the norm, which is basically nihilistic order-following.

My suggestion to everyone is that you have to build a giant stack of thorough controls (valid tests including unit, integration, logging, microbenchmark, fuzzing, memory leak, etc.), self-assessments/code reviews, adversarial AIs critiquing other AIs, etc., with you as the ultimate judge of what's real. Because otherwise it will fabricate "solutions" left and right. Possibly even the whole thing. "Sure, I just did all that." "But it's not there." "Oops, sorry! Let me rewrite the whole thing again." Ad nauseam.
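To make one layer of that stack concrete, here is a minimal sketch of a "don't take its word for it" gate, assuming Python. The names `CheckResult`, `files_exist`, `command_succeeds`, and `run_gate` are hypothetical helpers made up for illustration, not part of any tool. The idea is only that a claim like "Sure, I just did all that" gets checked against the disk and against exit codes before a human looks at it.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def files_exist(paths: List[str]) -> CheckResult:
    """Confirm that files the assistant says it wrote are really on disk."""
    missing = [p for p in paths if not Path(p).is_file()]
    detail = f"missing: {missing}" if missing else ""
    return CheckResult("claimed files exist", not missing, detail)


def command_succeeds(name: str, argv: List[str]) -> CheckResult:
    """Run an external check (tests, fuzzer, linter) and trust only its exit code."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    # Keep just the tail of stderr so a failing check stays readable.
    return CheckResult(name, proc.returncode == 0, proc.stderr.strip()[-500:])


def run_gate(checks: List[Callable[[], CheckResult]]) -> bool:
    """Run every check and accept the change only if all of them pass."""
    results = [check() for check in checks]
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        print(f"[{status}] {r.name} {r.detail}".rstrip())
    return all(r.passed for r in results)
```

You would call it with something like `run_gate([lambda: files_exist(["src/parser.py"]), lambda: command_succeeds("unit tests", ["pytest", "-q"])])`, where the file path and the test command are placeholders for whatever your project actually uses. A passing gate still only means the mechanical checks passed; you remain the final judge.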

BUT... if you DO accomplish that... you get back a productivity force to be reckoned with.


Replies

xyzzy123 today at 1:35 PM

I mostly agree with your experience, but:

Every day I deal with bad judgement calls from humans (sometimes my own!), but I don't screenshot them because it's not polite.

I don't think we're at the top of the curve yet? Current AIs have only been able to write code _at all_ for less than 5 years.

Code in particular is a domain that should be reasonably amenable to RL, so I don't think there are any particular reasons why performance should top out at human levels or be limited by training data.
