This is true though. While we know what they do on a mechanistic level, we cannot reliably analyze why the model outputs any particular answer in functional terms without a heroic effort at the "arXiv paper" level.
that’s true of analyzing individual atoms in a combustion engine — yet I doubt you’d claim we don’t know how they work
also this went from “we can’t analyze” to “we can’t analyze reliably [without a lot of effort]” quite quickly