thaumasiotes • yesterday at 10:08 PM • 3 replies

> not exactly OCR, but similar. So chunks of the image that look sufficiently similar get replaced with a reference to a single instance.

How can we describe OCR that wouldn't match this definition exactly?

Replies

Terr_ • today at 1:20 AM

It's not too hard: while they share some mechanics, the underlying use cases and requirements are very different.

_______ Optical character recognition:

1. You have a set of predefined patterns of interest which are well-known.

2. You're trying your best to find all occurrences of those patterns. If a letter appears only once, you still need to detect it.

3. You don't care much about visual similarity within a category. The letter "B" written in extremely different fonts is the same letter.

4. You care strongly about the boundaries between categories. For example, "B+" must resolve to two known characters in sequence.

5. You want to keep details of exactly where something was found, or at the least in what order they were found. You're creating a layer of new details, which may be added to the artifact.

_______ "Glyph compression":

1. You don't have a predefined set of patterns; the algorithm is probably trying to dynamically guess at patterns which are sufficiently similar and frequent.

2. You aren't trying to find all occurrences, only sufficiently similar and common ones, to maximize compression. If a letter appears only once, it can be ignored.

3. You do care strongly about visual similarity within a category; you don't want to mix-n-match fonts.

4. You don't care about clear category lines; if "B+" becomes its own glyph, that's no problem.

5. You're discarding detail from the artifact, to make it smaller.
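To make the "glyph compression" list concrete, here is a minimal sketch in Python. It is a toy, not how any real codec works: the 3x3 bitmaps, the Hamming-distance test, and the names `glyph_compress`, `max_diff`, and `min_count` are all assumptions made up for illustration.

```python
from collections import Counter

def hamming(a, b):
    """Count differing pixels between two equally sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def glyph_compress(patches, max_diff=1, min_count=2):
    """Replace similar, frequent patches with references to one instance.

    Returns (dictionary, stream). Each stream entry is either
    ("ref", index_into_dictionary) or ("raw", patch).
    """
    prototypes = []  # patterns guessed from the image itself, not predefined
    assignment = []  # prototype index chosen for each patch
    for patch in patches:
        for i, proto in enumerate(prototypes):
            if hamming(patch, proto) <= max_diff:
                assignment.append(i)
                break
        else:
            prototypes.append(patch)
            assignment.append(len(prototypes) - 1)

    # Only prototypes that occur often enough earn a dictionary slot.
    counts = Counter(assignment)
    dictionary, slot = [], {}
    for i, proto in enumerate(prototypes):
        if counts[i] >= min_count:
            slot[i] = len(dictionary)
            dictionary.append(proto)

    stream = [("ref", slot[i]) if i in slot else ("raw", patch)
              for patch, i in zip(patches, assignment)]
    return dictionary, stream

# Hypothetical 3x3 bitmaps ('#' = ink, '.' = paper).
BOX       = ("###", "#.#", "###")
BOX_NOISY = ("###", "#.#", "##.")  # one pixel off from BOX
CORNER    = ("#..", "#..", "###")  # appears only once

dictionary, stream = glyph_compress([BOX, BOX_NOISY, CORNER, BOX])
```

Here `dictionary` ends up holding only `BOX`, the noisy copy becomes a plain reference to it (its one differing pixel is discarded), and `CORNER` is left as raw data because it appears only once.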

yuliyp • yesterday at 11:32 PM

Glyph binning looks for any chunks in the image that are similar to each other, regardless of what they are: letters, eyeballs, pennies, triangles, etc. OCR specifically tries to identify characters (i.e. it starts with knowledge of an alphabet, then looks for things in the image that look like those).

If the image is actually text, both of them can end up finding things. Binning will identify "these things look almost the same", while OCR will identify "these look like the letter M".
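A rough Python sketch of that difference, run on the same patches. Everything here is assumed for illustration: the two-letter `ALPHABET`, the 3x3 bitmaps, and the pixel-difference thresholds are invented, and real OCR engines are far more sophisticated than nearest-template matching.

```python
def hamming(a, b):
    """Count differing pixels between two equally sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

# Hypothetical alphabet the OCR knows before it sees the image.
ALPHABET = {
    "L": ("#..", "#..", "###"),
    "T": ("###", ".#.", ".#."),
}

def ocr(patches, max_diff=2):
    """Label each patch with the nearest known character, or None."""
    labels = []
    for patch in patches:
        best = min(ALPHABET, key=lambda ch: hamming(patch, ALPHABET[ch]))
        close_enough = hamming(patch, ALPHABET[best]) <= max_diff
        labels.append(best if close_enough else None)
    return labels

def bin_patches(patches, max_diff=1):
    """Group patches that look almost the same, without naming them."""
    representatives, groups = [], []
    for patch in patches:
        for i, rep in enumerate(representatives):
            if hamming(patch, rep) <= max_diff:
                groups.append(i)
                break
        else:
            representatives.append(patch)
            groups.append(len(representatives) - 1)
    return groups

L_THIN   = ("#..", "#..", "###")
L_BOLD   = ("##.", "##.", "###")  # same letter, heavier "font"
TRIANGLE = ("..#", ".##", "###")  # not a character at all
T_SHAPE  = ("###", ".#.", ".#.")

patches = [L_THIN, L_BOLD, TRIANGLE, TRIANGLE, T_SHAPE]
labels = ocr(patches)          # ['L', 'L', None, None, 'T']
groups = bin_patches(patches)  # [0, 1, 2, 2, 3]
```

The OCR side calls both weights of "L" the same letter and has nothing to say about the triangles. The binning side keeps the two "L" shapes apart but happily groups the two triangles, since it only cares that they look alike.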

Dylan16807 • yesterday at 10:36 PM

JBIG2 dynamically pulls reference chunks out of the image, which makes it more likely that the target shapes are insufficiently separated.

It also gives a false sense of security when it displays dirty pixels that still clearly show a specific digit, since you think you're basically looking at the original.
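A toy Python illustration of that failure mode. This is not JBIG2's actual matching procedure; the 3x5 digit bitmaps, the Hamming-distance threshold, and the `compress`/`decompress` helpers are assumptions chosen only to show how a reference pulled from the image can stand in for a different digit.

```python
def hamming(a, b):
    """Count differing pixels between two equally sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def compress(patches, max_diff):
    """Store each patch as an index into references taken from the image."""
    refs, stream = [], []
    for patch in patches:
        for i, ref in enumerate(refs):
            if hamming(patch, ref) <= max_diff:
                stream.append(i)  # reuse an earlier chunk as the stand-in
                break
        else:
            refs.append(patch)
            stream.append(len(refs) - 1)
    return refs, stream

def decompress(refs, stream):
    """Rebuild the patch sequence from the references."""
    return [refs[i] for i in stream]

# Hypothetical 3x5 digits that differ by a single pixel.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX   = ("###", "#..", "###", "#.#", "###")

page = [EIGHT, SIX]
loose  = decompress(*compress(page, max_diff=1))  # [EIGHT, EIGHT]
strict = decompress(*compress(page, max_diff=0))  # [EIGHT, SIX]
```

With the loose threshold the 6 comes back as a crisp, perfectly legible 8. Nothing in the decoded output looks damaged, which is what makes the substitution hard to notice.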
