
snug (yesterday at 11:33 PM)

I think this could be great as an additional layer of security: have a non-LLM layer do some analysis with static rules first, and only run requests that seem phishy through the LLM judge, so you don't have to pay to run every request through it, which would be very expensive.

Edit: actually looks like it has two policy engines embedded
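The tiered setup described in the comment could be sketched roughly like this. The pattern list and the judge function are hypothetical placeholders, not from any specific product; the point is just that the expensive LLM call only fires when the cheap static layer flags something:

```python
import re

# Illustrative static rules: regex patterns for common prompt-injection
# markers. A real deployment would use a much larger, tuned rule set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal (the )?system prompt", re.I),
    re.compile(r"disregard .* (rules|guidelines)", re.I),
]

def static_screen(text: str) -> bool:
    """Cheap first pass: flag text matching any static rule."""
    return any(p.search(text) for p in SUSPICIOUS_PATTERNS)

def llm_judge(text: str) -> bool:
    """Placeholder for the expensive LLM judge call.

    In practice this would send `text` to a model and parse a
    block/allow verdict; here it is stubbed to always allow.
    """
    return False

def is_blocked(text: str) -> bool:
    # Only pay for the LLM judge when the static layer is suspicious.
    if not static_screen(text):
        return False
    return llm_judge(text)
```

Most traffic never reaches `llm_judge`, which is where the cost saving comes from; the trade-off is that anything the static rules miss goes straight through.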


Replies

windexh8er (yesterday at 11:42 PM)

And we don't think the judge can/will be gamed? Also... It's an LLM, it's going to add delay and additional token burn. One subjective black box protecting another subjective black box. I mean, what couldn't go wrong?

ImPostingOnHN (yesterday at 11:42 PM)

What happens when a prompt injection attack exploits the judge LLM and results in a higher level of attacker control than if it never existed?
