Hacker News

philipkglass today at 4:53 PM

At least one of the test questions was just a screenshot of a tweet, and it was difficult to read. I'd suggest extracting the text from screenshots with OCR. Apple's operating systems have this built in as Live Text. There are also strong open-source systems for this based on small vision-language models. The one I have been recommending lately is GLM-OCR:

https://github.com/zai-org/GLM-OCR

It's fast and can run even on low-resource computers.

---

Does this CAPTCHA actually resist computers? I didn't try feeding the questions I got to an LLM, but my sense is that current frontier models could probably pass all of them too. Making generated text pass the pangram test is simple enough for anyone who is actually writing a bot to spin up automated accounts.


Replies

tripplyons today at 5:02 PM

I think it's more about resisting some humans than it is about resisting machines.