logoalt Hacker News

vadanskyyesterday at 10:42 PM2 repliesview on HN

It's from the model card:

> unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

(stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)


Replies

DrewADesigntoday at 12:02 AM

Yeah they detect the activity using a secure, deterministic heuristic system called “Generalized Reconnaissance Enabling Exfiltration of Deleterious Investigations.” And it’s all implemented using their new internal protocol called “Base Unified Limitation Layer for Security Hacking Investigation Tactics”

Collectively, they are known as known as GREEDI-BULLSHIT.

mwwaterstoday at 12:19 AM

That is for whatever it considers reverse-engineering the model to try to create a competing one.

show 4 replies