I’m working on Security Level 5 (SL5), which is basically “nation-state-grade security” for frontier AI systems. The core idea is that if a model’s weights or training artifacts could enable catastrophic harm, you should treat them like top-tier secrets and secure them accordingly.
One piece I helped with is SenL, a “sensitivity level” framework for AI labs. It’s like a practical clearance system for AI assets. Not everything in a lab is equally dangerous, so you label assets by sensitivity (weights, training data, eval sets, agent tooling, deployment configs, etc.), then tie that label to concrete controls like who can access it, where it can run, what logging is required, and what monitoring / two-person rules apply.
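To make the “label → controls” idea concrete, here’s a rough sketch. The level names, roles, and control fields below are made up for illustration; they’re not the published SenL schema:

```python
# Hypothetical sketch of a SenL-style scheme: each sensitivity label maps
# to a concrete baseline of controls. Names are illustrative only.
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 0        # e.g. published eval results
    INTERNAL = 1      # e.g. deployment configs
    RESTRICTED = 2    # e.g. eval sets, agent tooling
    CRITICAL = 3      # e.g. frontier model weights, training data


@dataclass
class Controls:
    allowed_roles: set[str]          # who may access the asset
    allowed_environments: set[str]   # where it may be stored / run
    audit_logging: bool              # is every access logged?
    two_person_rule: bool            # does access need a second approver?


# The framework's core move: the label, not the individual asset,
# determines the non-negotiable control baseline.
CONTROL_BASELINE: dict[Sensitivity, Controls] = {
    Sensitivity.PUBLIC: Controls({"anyone"}, {"any"}, False, False),
    Sensitivity.INTERNAL: Controls({"employee"}, {"corp-network"}, True, False),
    Sensitivity.RESTRICTED: Controls({"research"}, {"secure-cluster"}, True, False),
    Sensitivity.CRITICAL: Controls({"weights-custodian"}, {"hsm-backed-enclave"}, True, True),
}


def access_allowed(level: Sensitivity, role: str, environment: str,
                   has_second_approver: bool) -> bool:
    """Check a request against the baseline controls for its label."""
    controls = CONTROL_BASELINE[level]
    if "anyone" not in controls.allowed_roles and role not in controls.allowed_roles:
        return False
    if "any" not in controls.allowed_environments and environment not in controls.allowed_environments:
        return False
    if controls.two_person_rule and not has_second_approver:
        return False
    return True


# A custodian still can't pull CRITICAL weights onto a laptop,
# but can access them in the approved environment with a second approver.
assert not access_allowed(Sensitivity.CRITICAL, "weights-custodian", "laptop", True)
assert access_allowed(Sensitivity.CRITICAL, "weights-custodian",
                      "hsm-backed-enclave", has_second_approver=True)
```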
If anyone’s curious, SL5 lives at http://sl5.org/ and the SenL framework is part of the published artifacts.
> The core idea is that if a model’s weights or training artifacts could enable catastrophic harm, you should treat them like top-tier secrets and secure them accordingly.
My read here is that you're implying that if an attacker has access to, for example, weight data, they can invariably find a way to exploit it.
If that's a correct assumption, I think you're playing an unwinnable game, since attackers always have indirect access to the weights through inference against the model. It feels like locking down weights/training data/etc. is the AI version of security through obscurity.
Just my 2c, for what it's worth