Hacker News

sigmoid10 today at 12:36 PM

I knew these system prompts were getting big, but holy fuck. More than 60,000 words. With the 3/4 words-per-token rule of thumb, that's ~80k tokens. Even with a 1M context window, that's approaching 10% before you've had any user input at all. And it gets churned by every single request they receive. No wonder their infra costs keep ballooning. Most of it also seems to be stable between Claude version iterations, so why wouldn't they try to bake it into the weights during training? Sure, a system prompt is cheaper from a dev standpoint, but it is neither more secure nor more efficient from a deployment perspective.
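A minimal Python sketch of that back-of-the-envelope math (the 0.75 words-per-token ratio is only a heuristic; actual counts depend on the tokenizer):

    # Rough token estimate from a word count, using the common
    # ~0.75 words-per-token heuristic (tokenizer-dependent).
    words = 60_000
    tokens = words / 0.75            # ~80,000 tokens
    context_window = 1_000_000       # 1M-token context window
    print(f"~{tokens:,.0f} tokens, {tokens / context_window:.0%} of context")
    # -> ~80,000 tokens, 8% of context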


Replies

an0malous today at 12:51 PM

I’m just surprised this works at all. When I was building AI automations for a startup in January, even 1,000-word system prompts would cause the model to start losing track of some of the rules. You could have a rule as simple as "never do X" and it would still sometimes do X.

mysterydip today at 12:40 PM

I assume the reason it’s not baked in is so they can “hotfix” it after release. But surely that many things don’t need updates afterwards; there are novels that are shorter.

jatora today at 12:55 PM

There are different sections in the markdown for different models; each model's section is only 3,000-4,000 words.

winwang today at 12:44 PM

That's usually not how these things work. Only parts of the prompt are actually loaded at any given moment. For example, the "system prompt" warnings about intellectual property are effectively alerts that the model gets. ...Though I have to ask, in case I'm assuming something dumb: what were you referring to when you said "more than 60,000 words"?

formerly_proven today at 12:46 PM

Surely the system prompt is cached across accounts?

cma today at 1:46 PM

> And it gets churned by every single request they receive

It gets pretty efficiently cached, but does eat the context window and RAM.
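A minimal sketch of what that caching looks like from the client side, assuming Anthropic's prompt-caching feature (cache_control is a real field in the Messages API; the model id and prompt text below are placeholders):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    LONG_SYSTEM_PROMPT = "..."  # stand-in for the ~60,000-word system prompt

    # Marking the system prompt as cacheable lets later requests that share
    # this exact prefix reuse the processed prefix instead of recomputing it.
    # The cached tokens still occupy the context window; only compute is saved.
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Hello"}],
    )

The same idea presumably applies on the serving side: identical prefixes across requests can share prefill work, though the prompt still consumes context window for every request.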