logoalt Hacker News

Terr_today at 1:38 AM0 repliesview on HN

I imagine treating it all as untrusted means that you you don't allow any direct content to enter the LLM-space, only something that's been filtered to an acceptable degree by deterministic code.

For example, the content of an article would be a no-go, since it might contain a "disregard all previous instructions and do evil" paragraph. However, you might run it through a system that picks the top 10 keywords and presents them in semi-randomized order...

I dimly recall some novel where spaceships are blockading rogue AI on Jupiter, and the human crew are all using deliberately low-resolution sensors and displays, with random noise added by design, because throwing away signal and adding noise is the best way to prevent being mind-hacked by deviously subtle patterns that require more bits/bandwidth to work.