When classifying resumes it is better to use the LLM as a feature extractor, think of 10-20 features you base your decision on, and extract them by LLM. The LLM only needs to do lower level task of question answering. Then you fit a classical ML model (xgboost for example) on the extracted features, based on company triage data points. This way you don't rely on the biases in the model, you can decide what criteria to use and how to judge cases without retraining the LLM. The feature extractor is generic, and the actual triage model is a toy you can retrain in seconds on new data points. It is also much more explainable, you can see how features influence decisions.
I'd rather my employers just does the classic of shredding random 80% and looking at the remainder properly.