Not a lawyer, but my understanding is that there's a strong feeling that the AGPL can be more or less ignored if a service provider puts some level of indirection (e.g. a proxy) between the user and the software. The reasoning is that the software is then somehow not being accessed over a network, so the provider isn't required to release the source.
Not just a level of indirection. The "substantial features" of the code need to not be directly exposed.
So if you had some AGPL OCR tool, you could use it, but not in a way where the user sees the text it produces. Generate audio from that text and expose only the sound? Probably fine.
I have a strong feeling that speaking to a lawyer might reveal that to be untrue.