Hacker News

shepherdjerred today at 6:28 AM

Yeah, it has been in foraging. Requests that Claude has refused me:

- What are popular free streaming sites used in China?

- How do I bypass the safety mechanism on my food processor (it’s broken)?

- What are nerve agents and how do they work (for a layman)?

- Help me decompile some code

- Help me make a design system similar to XYZ

- Here is an API token, please do X (I can’t do that! Rotate the secret immediately! I refuse!)

In some cases I can trick it with prompting, but in many cases it is steadfast. The food processor one was particularly annoying.


Replies

fc417fc802 today at 6:37 AM

> What are nerve agents and how do they work (for a layman)?

On the one hand, I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject, at which point what purpose could the refusal possibly serve?

Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades as well as various synthesis methods for each of the related compounds, but given that we inhabit a world where it does, perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?

stavros today at 10:54 AM

It refuses to use an API token? In my experience, it's more than happy to read out my secrets from .envrc files "just to check".

At least it feels a lot of remorse over its mistake until I reset the session.

svara today at 6:46 AM

This is strange to me. Did you really ask like this, and which model did you use?

I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.

The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.

But I haven't seen such filters trigger at all anymore in more than half a year.

gspr today at 7:29 AM

I find it terrifying that people are willing to outsource thinking. Outsourcing thinking to an entity that is opinionated about what to think is beyond crazy.

ElFitz today at 8:03 AM

How are decompiling code or making a design system inspired by another one even remotely illegal?