FWIW, I built a text classification tool for internal use on frontier models (about a year old at this point) and found that asking the model for its reasoning significantly increased both precision and recall.
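For anyone who wants to try the same comparison, here is a rough sketch of the general idea, not the actual tool. The label set, the prompt wording, the `Label:` output convention, and all function names are assumptions made up for illustration; the model call itself is left out, so you would plug in whatever client you use.

```python
import re

# Hypothetical label set, purely for illustration.
LABELS = ("spam", "not_spam")


def build_prompt(text, ask_for_reasoning=True):
    """Build a classification prompt, optionally asking for reasoning first."""
    lines = [f"Classify the text as one of: {', '.join(LABELS)}."]
    if ask_for_reasoning:
        # The variant described above: have the model explain before it answers.
        lines.append("First explain your reasoning in a few sentences.")
    lines.append("Finish with a final line of the form 'Label: <label>'.")
    lines.append("")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


def parse_label(response):
    """Return the last 'Label: x' line in a response, or None if absent/invalid."""
    matches = re.findall(
        r"^Label:\s*(\w+)\s*$", response, flags=re.MULTILINE | re.IGNORECASE
    )
    if not matches:
        return None
    label = matches[-1].lower()
    return label if label in LABELS else None


def precision_recall(predicted, actual, positive="spam"):
    """Compute precision and recall for one positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

To compare the two variants, run the same labeled examples through `build_prompt` with `ask_for_reasoning` on and off, parse each response with `parse_label`, and compare the `precision_recall` results on the same held-out set.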