vadansky • yesterday at 10:42 PM

It's from the model card:

> unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

(stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)

Replies

DrewADesign • today at 12:02 AM

Yeah, they detect the activity using a secure, deterministic heuristic system called "Generalized Reconnaissance Enabling Exfiltration of Deleterious Investigations." And it's all implemented using their new internal protocol called "Base Unified Limitation Layer for Security Hacking Investigation Tactics."

Collectively, they are known as GREEDI-BULLSHIT.

mwwaters • today at 12:19 AM

That is for whatever it considers reverse-engineering the model to try to create a competing one.
