as the context fills up, the model generates based on that context, including whatever illegal stuff you've said, i.e. it'll mimic that instead of following whatever safety prompt they have at the top
they could make it more "safe", but it'd be much more invasive and would likely have to scan many more tokens, and it'd cause false positives (probably the biggest reason it's not implemented)
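
rough sketch of why scanning more of the conversation means more false positives. this is just arithmetic, not how any real provider's filter works: `flag_probability`, the 0.1% per-message rate, and the assumption that each message is judged independently are all made up for illustration.

```python
def flag_probability(per_message_fp_rate: float, messages_scanned: int) -> float:
    """Chance that at least one harmless message gets wrongly flagged.

    Assumes (hypothetically) that every scanned message is judged
    independently with the same false-positive rate.
    """
    return 1.0 - (1.0 - per_message_fp_rate) ** messages_scanned


# Illustrative numbers only: a 0.1% false-positive rate per message.
for n in (1, 10, 100, 1000):
    print(n, round(flag_probability(0.001, n), 4))
```

under those assumptions even a tiny per-message error rate adds up: with 1000 harmless messages scanned, the chance of at least one false flag is roughly 63%.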