One thing I wish other languages had was Perl's taint mode: once enabled, input coming from the outside was "tainted", along with anything you explicitly marked as tainted. If a tainted variable was used to populate another variable (such as by concatenation), the result itself was tainted. If a tainted variable was used in certain ways (such as with the `open` call), the program crashed. The primary way to remove a taint was to run the variable through a regular expression and use the captured matches (which would not be tainted).
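To make the mechanics concrete, here is a rough sketch of the same idea in Python rather than Perl. Everything in it (`Tainted`, `TaintError`, `untaint`, `guarded_sink`) is a made-up name for illustration, and it only tracks taint through `+`; slicing, `str.format`, f-strings and `join` would all silently drop the mark, which real taint mode does not.

```python
import re


class TaintError(Exception):
    """Raised when tainted data reaches a guarded operation."""


class Tainted(str):
    """Hypothetical marker type for strings that came from outside."""

    def __add__(self, other):
        # tainted + anything -> tainted
        result = super().__add__(other)
        return result if result is NotImplemented else Tainted(result)

    def __radd__(self, other):
        # anything + tainted -> tainted
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(other + str(self))


def untaint(value, pattern):
    """Return capture group 1 as a plain str, mimicking Perl's regex untainting."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise TaintError("value did not match the untaint pattern")
    return str(match.group(1))


def guarded_sink(path):
    """Stand-in for a sensitive call such as open(); refuses tainted input."""
    if isinstance(path, Tainted):
        raise TaintError("tainted value used in a guarded operation")
    return path
```

With this, `"/tmp/" + Tainted("report.txt")` is still `Tainted`, `guarded_sink` raises on it, and only the result of `untaint(value, r"([\w.-]+)")` is accepted.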
Ruby did, until taint tracking was deprecated in 2.7 and removed in 3.2. Normalization of untrusted input isn't taught or discussed enough, and neither is each platform's regex security.
Honestly, I think all CS/EE programs should require an OWASP course and that coding should require regular continuing education that includes defensive coding practices for correctness, defined behavior, and security.
gcc's `__attribute__((tainted_args))` is pretty handy: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...
How does that work in practice?
Suppose the Table family type their son Bobby's name into a form. The Perl program now has a "tainted" string in memory - "Robert'; DROP TABLE Students --".
The Perl code passes this string through a regex that checks the name is valid. Names can include apostrophes (Miles O'Brien) and hyphens (Jean-Luc Picard) along with spaces and normal ASCII letters, so the regex passes and the string is now untainted.
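A small Python/sqlite3 sketch of why passing a validation regex does not make a string safe to splice into SQL. The table, the function names and the payload are all made up for illustration; the payload is restricted to letters, spaces and apostrophes, so it fits the character set described above (the classic Bobby Tables string also contains a semicolon, which a stricter pattern might catch).

```python
import re
import sqlite3

# Assumed validation rule: ASCII letters, spaces, apostrophes, hyphens.
NAME_RE = re.compile(r"[A-Za-z' -]+")


def validate_name(raw):
    """Return raw unchanged if it looks like a name, else raise ValueError."""
    if NAME_RE.fullmatch(raw) is None:
        raise ValueError("invalid name")
    return raw


def make_db():
    """Build a throwaway in-memory table with two students."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    conn.executemany(
        "INSERT INTO students VALUES (?)",
        [("Miles O'Brien",), ("Jean-Luc Picard",)],
    )
    return conn


def find_unsafe(conn, name):
    # String concatenation: the "validated" value still becomes SQL syntax.
    sql = "SELECT name FROM students WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_safe(conn, name):
    # Bound parameter: the value is treated as data, never as SQL.
    sql = "SELECT name FROM students WHERE name = ?"
    return conn.execute(sql, (name,)).fetchall()
```

Calling `find_unsafe(conn, validate_name("x' OR name IS NOT 'x"))` returns every row, while `find_safe` with the same argument returns none. The regex answered "is this a plausible name?", not "is this safe in this sink?".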
Nice idea, thank you! I think it should be possible to make a Python object behave in a similar way (crashing when converted to a string, and so on). I need to see if I can make it work.
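One possible shape for that, as a sketch only: an opaque wrapper that raises whenever something tries to turn it into text, and only hands the raw value to a parser you pass in. `Untrusted`, `UntrustedConversionError` and `parse` are names invented here, not an existing library.

```python
class UntrustedConversionError(TypeError):
    """Raised when an Untrusted value is implicitly converted to text."""


class Untrusted:
    """Hypothetical wrapper that hides a raw external value."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        self._raw = raw

    def __str__(self):
        # Covers str(), print() and "%s" formatting.
        raise UntrustedConversionError("refusing to stringify untrusted data")

    def __format__(self, spec):
        # Covers f-strings and str.format().
        raise UntrustedConversionError("refusing to format untrusted data")

    def __repr__(self):
        # Safe for logs and debuggers: never reveals the raw value.
        return "<Untrusted value>"

    def parse(self, parser):
        """Hand the raw value to a caller-supplied parser and return its result."""
        return parser(self._raw)
```

Since the class defines no `__add__` or `__radd__`, `"prefix" + Untrusted("x")` already fails with a plain `TypeError`. The only way out is something like `Untrusted("42").parse(int)`. The underscore attribute is a convention, not enforcement, so this guards against accidents rather than against determined misuse.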
In PHP, you can construct objects directly from `$_GET`/`$_POST` (and erase everything from these vars to make sure they are not used directly), then lean on data types to make sure that these values are not used in the wrong place.
I need to read about the history of this feature. It's pretty amazing.
ps: ah well, that was fast https://en.wikipedia.org/wiki/Taint_checking#History :) (1989)
This is "parse, don't validate" as a language feature. Any statically typed language has this, in the sense that you can write your domain logic in terms of a set of "untainted" domain types, and only provide safe conversion functions (parsers) from user input to domain types.
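A minimal sketch of that pattern, written in Python with type hints, so the "static" part assumes you run a checker such as mypy. `StudentName`, `parse_student_name` and `enrollment_message` are hypothetical names, and the pattern is just an example rule.

```python
import re
from dataclasses import dataclass

# Assumed rule: starts with a letter, then letters, spaces, apostrophes, hyphens.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z' -]{0,63}")


@dataclass(frozen=True)
class StudentName:
    """Domain type: by convention only created via parse_student_name()."""

    value: str


def parse_student_name(raw: str) -> StudentName:
    """The single conversion from raw user input to the domain type."""
    candidate = raw.strip()
    if _NAME_RE.fullmatch(candidate) is None:
        raise ValueError(f"not a valid student name: {raw!r}")
    return StudentName(candidate)


def enrollment_message(name: StudentName) -> str:
    # Domain logic accepts only the parsed type, never a bare str.
    return f"Enrolled {name.value}"
```

A type checker would flag `enrollment_message(form_value)` when `form_value` is a plain `str`, which is roughly the compile-time analogue of the taint check. Python does not stop anyone from calling `StudentName(...)` directly, so the guarantee is weaker than in a language that can make the constructor private.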