This was for video games (console, PC, mobile), but we had a framework where a video feed went into a host machine and you could get it to OCR areas of the screen. This was then used to guide user input (e.g. find the right button if it had been moved), but it also had checks for whether strings exceeded their bounding dimensions.
The classic always used to be German overflows, but these days the wrong sort of Chinese and RTL are more of a headache. (I have been ignoring proper RTL for a while.)
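For anyone curious, the bounds check itself is roughly this. A minimal Python sketch, not the actual framework: it assumes the OCR step already gives you a pixel rectangle for the recognised string, and `Box`, `overflow` and `fits` are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in screen pixels, origin at top-left (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def overflow(text: Box, bounds: Box) -> dict[str, int]:
    """Pixels by which the text box sticks out of bounds on each side (0 if it fits)."""
    return {
        "left": max(0, bounds.left - text.left),
        "top": max(0, bounds.top - text.top),
        "right": max(0, text.right - bounds.right),
        "bottom": max(0, text.bottom - bounds.bottom),
    }


def fits(text: Box, bounds: Box) -> bool:
    """True when the text box lies entirely inside bounds."""
    return not any(overflow(text, bounds).values())


# Made-up numbers: a button region and two OCR'd label rectangles.
button = Box(left=100, top=50, right=220, bottom=90)
english_label = Box(left=110, top=60, right=200, bottom=80)
german_label = Box(left=110, top=60, right=245, bottom=80)  # longer translation
```

Here `fits(english_label, button)` passes, while the German label reports an overflow on the right side only. Returning per-side amounts rather than a bare boolean makes it easier to tell a one-pixel OCR wobble from a real truncation.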
I have also noticed LLMs get stuck in healthy/unhealthy loops when doing this sort of work. If you can snapshot their state when they are doing the right thing, it would be very useful. They also build up a lot of good per-app context, which improves result quality.
Ahh okay, that makes sense. There are two possible approaches we could take. One would be creating some kind of E2E testing that detects overflows and similar issues, or integrating into existing tests, but that would probably require a lot of work for people. The other, like Burj (Brendan) mentioned, is integrating with PostHog session replays and the like and detecting overflows from there. Probably leaning towards the latter just for ease of use.
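Either way the detection step could look something like the sketch below. It assumes the page (or a replayed snapshot) has been rendered in a browser and per-element `scrollWidth`/`clientWidth`/`scrollHeight`/`clientHeight` values have been collected somehow; how to get those out of a session replay is not shown and is an open question. `ElementMetrics` and `find_overflows` are hypothetical names, not part of any PostHog API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ElementMetrics:
    """Layout numbers for one element, as measured in a browser (hypothetical record)."""
    selector: str
    scroll_width: int   # DOM scrollWidth: width of the content
    client_width: int   # DOM clientWidth: width of the visible box
    scroll_height: int  # DOM scrollHeight
    client_height: int  # DOM clientHeight


def find_overflows(elements: Iterable[ElementMetrics], tolerance: int = 1) -> list[str]:
    """Return selectors whose content is larger than their box by more than tolerance px."""
    flagged = []
    for el in elements:
        too_wide = el.scroll_width - el.client_width > tolerance
        too_tall = el.scroll_height - el.client_height > tolerance
        if too_wide or too_tall:
            flagged.append(el.selector)
    return flagged


# Made-up measurements for two buttons.
measured = [
    ElementMetrics("button.save", 80, 120, 20, 32),
    ElementMetrics("button.cancel", 161, 120, 20, 32),  # e.g. a long translated label
]
flagged = find_overflows(measured)
```

With these numbers only `button.cancel` gets flagged. The small tolerance is there because rounding can make content look a pixel larger than its box; elements that are meant to scroll would also need to be excluded, which this sketch does not do.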