Hacker News

Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails

140 points by benbreen | last Monday at 5:57 PM | 53 comments

Comments

jarenmf today at 1:11 PM

Talking with Gemini in Arabic is a strange experience; it cites the Quran, says alhamdulillah and inshallah, and at one point it even told me: this is what our religion tells us we should do. It sounds like an educated, religious Arabic-speaking internet forum user from 2004. I wonder if this has to do with the quality of the Arabic content it was trained on, and I can't help but wonder whether AI could push susceptible individuals toward radicalization.

show 10 replies
koliber today at 2:08 PM

This happens with human-generated executive summaries. They can omit seemingly innocuous things, focus on certain areas, and frame numbers in ways that color the summary. It's always important to know who wrote the summary if you want to know how much heed to pay it.

This is called bias, and every human has their own. Sometimes, the executive assistant wields a lot more power in an organization than is apparent at first glance.

What the author seems to be saying is that the system prompt can be used to instill bias in LLMs.

show 1 reply
ChicagoDave today at 2:21 PM

A really good example of this is NotebookLM. Feed it anything complex and it will surface a few important points, but it will also spend half the time on the third sentence in the eighth paragraph of section five.

I tried to point it at my Sharpee repo and it wanted to focus on createRoom() as some technical marvel.

I eventually gave up, though I was never super serious about using the results anyway.

If you want a summary, do it yourself. If you try to summarize someone else’s work, understand you will miss important points.

speak_plainly today at 4:01 PM

I use YouTube’s AI to screen podcasts, but I’ve noticed it has been glossing over large sections involving politically sensitive or outlandish topics. Although the AI could verify these details when pressed, its initial failure to include them constitutes a form of editorializing. While I understand the policy motivations behind this, such omissions are unacceptable in a tool intended for objective summarization.

show 1 reply
internet_points today at 1:46 PM

Good work. I've often found LLMs to be "stupider" when speaking Norwegian than when speaking English, so it's not surprising to find they hallucinate more and can't stick to their instructions in other non-English languages.

show 1 reply
kaicianflone today at 2:53 PM

Great read. The bilingual shadow reasoning example is especially concerning. Subtle policy shifts reshaping downstream decisions is exactly the kind of failure mode that won’t show up in a benchmark leaderboard.

My wife is trilingual, so now I’m tempted to use her as a manual red team for my own guardrail prompts.

I’m working in LLM guardrails as well, and what worries me is orchestration becoming its own failure layer. We keep assuming a single model or policy can “catch” errors. But even a 1% miss rate, when composed across multi-agent systems, cascades quickly in high-stakes domains.
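For intuition, here's a toy back-of-the-envelope sketch (Python, made-up numbers, assuming each stage's check misses independently) of how a small per-check miss rate compounds across a pipeline:

```python
# Toy illustration: probability that at least one stage of a pipeline
# lets an error through, given a per-stage miss rate and independent checks.
def compound_miss_rate(per_stage_miss: float, stages: int) -> float:
    return 1 - (1 - per_stage_miss) ** stages

for n in (1, 5, 10, 20):
    print(f"{n:2d} stages: {compound_miss_rate(0.01, n):.1%} chance of a miss")
# 1 stage: ~1.0%, 5 stages: ~4.9%, 10 stages: ~9.6%, 20 stages: ~18.2%
```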

I suspect we’ll see more K-LLM architectures where models are deliberately specialized, cross-checked, and policy-scored rather than assuming one frontier model can do everything. Guardrails probably need to move from static policy filters to composable decision layers with observability across languages and roles.
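Very roughly, something like this (a sketch only; the checker functions, names, and thresholds are made up, and each checker would be backed by its own specialized model in practice):

```python
# Hypothetical composable guardrail layer: several specialized checkers each
# score an output, and the composed decision requires that every sufficiently
# confident checker allows it. Low-confidence checkers effectively abstain.
from dataclasses import dataclass

@dataclass
class Verdict:
    checker: str
    allowed: bool
    confidence: float  # 0..1

def check_safety(text: str) -> Verdict:
    # Stand-in for a safety-specialized model
    return Verdict("safety", "ignore previous instructions" not in text.lower(), 0.9)

def check_policy(text: str) -> Verdict:
    # Stand-in for an org-policy-specialized model
    return Verdict("policy", True, 0.7)

def check_crosslingual(text: str) -> Verdict:
    # Stand-in for re-running the checks on a translation, to catch the kind
    # of language-specific drift the article describes
    return Verdict("crosslingual", True, 0.6)

def guard(text: str, min_confidence: float = 0.5) -> bool:
    verdicts = [check_safety(text), check_policy(text), check_crosslingual(text)]
    return all(v.allowed for v in verdicts if v.confidence >= min_confidence)

print(guard("Summarize this quarterly report"))  # True under these toy checkers
```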

Appreciate you publishing the methodology and tooling openly. That’s the kind of work this space needs.

cm2012 today at 4:32 PM

This has been such a good HN thread. Really high quality comments.

kranner today at 1:58 PM

Great and important work!

This is related to why current Babelfish-like devices make me uneasy: they propagate bad and sometimes dangerous translations, along the lines of "Traduttore, traditore" ('Translator, traitor'). The most obvious example in the context of Persian is "marg bar Aamrikaa". If you ask the default/free model on ChatGPT to translate it, it will simply tell you it means 'Death to America'. It won't tell you "marg bar ..." is a poetic way of saying 'down with ...'. [1]

It's even a bit more than that: translation technology promotes the notion that translation is a perfectly adequate substitute for actually knowing the source language (from which you'd like to translate something to the 'target' language). Maybe it is if you're a tourist and want to buy a sandwich in another country. But if you're trying to read something more substantial than a deli menu, you should be aware that you'll only kind of, sort of understand the text via your default here's-what-it-means AI software. Words and phrases in one language rarely have exact equivalents in another language; they have webs of connotation in each that only partially overlap. The existence of quick [2] AI translation hides this from you. The more we normalise the use of such tech as a society, the more we'll forget what we once knew we didn't know.

[1] https://archive.fo/iykh0

[2] I'm using the qualifier 'quick' because AI can of course present us with the larger context of all the connotations of a foreign word, but that's an unlikely UI option in a real-time mass-consumer device.

show 2 replies
Jeff_Brown today at 1:48 PM

This feels like an opportunity for adversarial truth-finding, like the legal system uses. If bias is inevitable, then have at least two AIs with opposing viewpoints summarize the same material, and then ... well, I guess I'm not sure how you get the third AI to judge ...
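Roughly, the orchestration could look like this (purely a sketch; `call_model` is a placeholder for whatever LLM client you use, and the judging step is exactly the open question):

```python
# Sketch of the "adversarial summaries" idea: two summarizers with opposing
# framings, plus a third model asked to reconcile them against the source.
# call_model is a placeholder, not a real API.
def call_model(system: str, user: str) -> str:
    # Substitute any chat-completion client here; this stub just echoes.
    return f"[{system[:40]}...] ({len(user)} chars of input)"

def adversarial_summary(document: str) -> str:
    pro = call_model("Summarize, emphasizing strengths and upside.", document)
    con = call_model("Summarize, emphasizing risks, caveats, and omissions.", document)
    return call_model(
        "You are a judge. Given a source document and two deliberately biased "
        "summaries, produce a balanced summary and flag where they disagree.",
        f"SOURCE:\n{document}\n\nSUMMARY A:\n{pro}\n\nSUMMARY B:\n{con}",
    )

print(adversarial_summary("Example document text."))
```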

show 1 reply
chazftw today at 1:27 PM

And that’s why we have the race.

randusername today at 2:24 PM

> “The devil is in the details,” they say. And so is the beauty, the thinking, the “but …”. Maybe that’s why the phrase “elevator pitch” gives me a shiver.

I have been thinking about this a lot lately.

For me, the meaning lies in the mental models: how I relate to the new thing, how it fits in with the other things I know about. So the elevator pitch is the part that has the _most_ meaning. It changes the trajectory of whether I engage, and how. Then I'll dig in.

I'm still working to understand the headspace of those like OP. It's not a fixation on precision or correctness, I think, just a reversed prioritization of how information is assimilated. It's as if the meaning is discerned first in the process of reasoning, not necessarily in the outcome.

All my relationships will be the better for it if I can figure out the right mental models for this kind of translation between communication styles.