This is like trying to fix SQL injection by limiting the permissions of the database user instead of using parameterized queries (for which there is no equivalent with LLMs). It doesn't solve the problem.
It also has no effect on whole classes of vulnerabilities which don't rely on unusual writes, where the system (SQL or LLM) is expected to execute some logic and yield a result, and the attacker wins by determining the outcome.
Using the SQL analogy, suppose this is intended:
SELECT hash('$input') == secretfiles.hashed_access_code FROM secretfiles WHERE secretfiles.id = '$file_id';
And here the attacker supplying a malicious $input so that it becomes something else with a comment on the end:
SELECT hash('') == hash('') -- ') == secretfiles.hashed_access_code FROM secretfiles WHERE secretfiles.id = '123';
It also has no effect on whole classes of vulnerabilities which don't rely on unusual writes, where the system (SQL or LLM) is expected to execute some logic and yield a result, and the attacker wins by determining the outcome.
Using the SQL analogy, suppose this is intended:
And here the attacker supplying a malicious $input so that it becomes something else with a comment on the end: Bad outcome, and no extra permissions required.