> I'm sure you could get an LLM to create a plausible sounding justification for every decision.
That's a great point: funny, sad, and true.
My AI class predated LLMs. The implicit assumption was that the explanation had to be correct and verifiable, which may not be achievable with LLMs.
It seems solvable if you treat it as an architecture problem. I've been using LangGraph to force the model to extract and cite evidence before any scoring logic runs. That way the audit trail comes from the graph structure itself rather than from opaque model outputs; something like the sketch below.
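A minimal sketch of the pattern using LangGraph's `StateGraph` API. The node bodies stub out the actual model calls, and the field names (`claim`, `evidence`, `score`) are illustrative rather than from any particular project:

```python
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph


class ReviewState(TypedDict):
    claim: str            # the decision or claim being evaluated
    evidence: List[str]   # citations extracted before any scoring happens
    score: float          # computed only after evidence exists


def extract_evidence(state: ReviewState) -> dict:
    # In practice this node would prompt the model to return verbatim quotes
    # plus source identifiers; stubbed here so the sketch runs as-is.
    return {"evidence": ["[doc-3] '...verbatim supporting passage...'"]}


def score_claim(state: ReviewState) -> dict:
    # Scoring only ever sees cited evidence, so the audit trail is the graph
    # itself: no evidence, no score.
    if not state["evidence"]:
        raise ValueError("refusing to score without cited evidence")
    return {"score": 0.8}  # placeholder for real scoring logic


graph = StateGraph(ReviewState)
graph.add_node("extract_evidence", extract_evidence)
graph.add_node("score_claim", score_claim)
graph.add_edge(START, "extract_evidence")
graph.add_edge("extract_evidence", "score_claim")
graph.add_edge("score_claim", END)
app = graph.compile()

result = app.invoke({"claim": "Candidate meets requirement X",
                     "evidence": [], "score": 0.0})
print(result["evidence"], result["score"])
```

The point is just that evidence extraction is a separate node upstream of scoring, so every score in the final state can be traced back to the quotes produced at that step.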