I am building yet another AI-powered language-learning app. The user uploads random texts, and the AI gives grammar and vocab hints so the user can try to translate them on their own. Basically Google Lens, but without the final translation.
Ironically, the difficult part wasn't the OCR or the text analysis, but finding the exact position of the text in the input image.