Is there explainability research for this type of model application, e.g. using a sparse autoencoder or something similar but more modern?
I would love to know which concepts are active in the deeper layers of the model while generating the solution.
Is there a concept of “epsilon” or “delta”?
If so, what are their projections onto each other?