Could you combine it with a classic OCR segmentation step, so that along with the image you also provide the bounding-box coordinates of each string?