I like Claude Code a lot, I think it sets a dangerous precedent to put guardrails in that return a r...

Avicebron • today at 4:34 PM • 10 replies • view on HN

I like Claude Code a lot, I think it sets a dangerous precedent to put guardrails in that return a response from a prompt that was modified by the system in real time in order to subvert the original intent.

Fail cleanly. Anything else makes it too difficult to rely on.

edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.

Replies

bs7280 • today at 4:51 PM

I think the reasonable middle ground anthropic is trying to achieve is - let the organizations that make the most important and critical software get a head start on cybersecurity before they inevitably allow everyone else the same access.

Other commentors have made good points that these guardrails are counter productive for well intentioned cyber security, because I can't use it to test and harden my own software.

➕ show 4 replies

mapontosevenths • today at 4:40 PM

I agree 100%. Doing a worse job IS an error. It should be treated as such. Or at the very least make that behavior opt-in. The default should not be pretending like nothing happened and just quietly doing a worse job.

Imagine your healthcare provider just sometimes decided not to read your test results very carefully and you risked death? Now realize that healthcare providers use Claude now and that scenario wasn't hypothetical.

➕ show 1 reply

Paracompact • today at 6:27 PM

> Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word.

Only in the same sense that Standard Oil considered themselves the stewards of petroleum. There's benefit of the doubt and then there's just fanfiction. Do not forget that this most aggressive "guardrail" of theirs was not for any safety reason, but just to stop other labs from catching up to their product. They care less about hindering bioweapons, malware, and hate speech than they do free market competition.

jstummbillig • today at 5:05 PM

> paternalism isn't a good look.

In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of doubt.

Are we just concluding "their concerns were never real"? Because that probably runs counter the things that they have been observing and concluding.

➕ show 5 replies

hootz • today at 4:43 PM

What is "EA" in this context? I see a lot of people using this initialism.

➕ show 4 replies

tacone • today at 6:09 PM

That also means people are paying money to execute a prompt they've (partially) written.

joe_the_user • today at 5:09 PM

The problem is that Anthropic seems to be working up to the workflow one would naively want from AGI/some-god-like-entity.

The workflow would be; User asks for a thing. If it's a good thing, entity does the thing. If it's a naively bad idea, entity explains why you don't want that. If it's an actually evilly intended request, entity wags it's metaphorical finger or could even smite the user.

The problem is that flow isn't desirable if your entity isn't entirely god-like. It can bad even your entity is in ways rather far seeing.

➕ show 1 reply

cvadict • today at 4:51 PM

> Fail cleanly.

This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.

"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.

thinkingtoilet • today at 5:33 PM

Was it modifying the prompt? I thought it only kicked the request down to 4.8.

alt Hacker News

Replies