logoalt Hacker News

nonethewiseryesterday at 6:50 PM1 replyview on HN

I wonder what hooks they have in place to be able to configure safeguards at runtime.


Replies

aleksiy123yesterday at 7:00 PM

Probably a mix of heuristics, keywords and simple ml model.

Then maybe a second gate with a lightweight llm?

Edit: actually Gcp, azure, and OpenAI all have paid apis that you can also use.

But I don’t think they go into details about the exact implementation https://redteams.ai/topics/defense-mitigation/guardrails-arc...

show 1 reply