I’m curious to see what that would look like. It’s like Inception: how many levels deep can you create a prompt that hijacks all the way up?
Modern OS exploit chains should give you a good sense of how far people can go. (E.g., phone OSes are relatively hardened.)
We’re not even at the “ASLR” level of protection for LLMs yet.