I don't think LLMs are that chaotic; you can replace words in an input and get a similar answer, and they are very good at dealing with typos.
They are definitely not interpretable, though. I was reading some mechanistic interpretability researchers saying they've given up trying to build a bottom-up model of how these models work.
> I don't think LLMs are that chaotic; you can replace words in an input and get a similar answer, and they are very good at dealing with typos.
Compare "You are a helpful assistant. Your task is to <100 lines of task description> <example problem>"
with
"You are a helpless assistant. Your task is to <100 lines of task description> <example problem>"
I've changed four CHARACTERS ("ful" to "less") out of a (by construction) 1000+ character prompt, and the outputs are not at all similar.
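To make the "tiny edit" claim concrete, here's a quick sketch that measures the edit distance between the two prompts. The `levenshtein` helper and the placeholder task text are illustrative; in the real case the task description would be the full 1000+ characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca -> cb
            ))
        prev = cur
    return prev[-1]

# Stand-in for the ~1000-character task description and example problem.
task = " Your task is to <100 lines of task description> <example problem>"
helpful  = "You are a helpful assistant."  + task
helpless = "You are a helpless assistant." + task

print(levenshtein(helpful, helpless))  # 4: "ful" -> "less"
```

So the perturbation is four characters out of the whole prompt, yet (per the experiment above) the outputs diverge completely; that's the sense in which the mapping from prompt to output is chaotic.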
Just realized I've never tried the "You are a helpless ass" prompt. Again, a very minor change in wording, just dropping a few letters. The helpless assistant at least produced text apologizing for being so bad at the task.