Lovely application!
Genuine question: why not use (Modern)BERT for classification instead? (Is the JSON-output explanation that critical?)