You might be interested in work around mechanistic interpretability! In particular, if you're interested in how models handle out-of-distribution information and apply in-context learning, research around so-called "circuits" might be up your alley: https://www.transformer-circuits.pub/2022/mech-interp-essay
After a brief scan, I don't feel competent to evaluate the Chris Olah essay you posted.
I could probably get an LLM to do so, but I won't...