Very minor but interesting nitpick: the font used on checks is not OCR (optical) but MICR (magnetic ink). The design objectives are different and different font families exist for the two purposes. MICR as used on checks (more properly called E-13B) bears unusual, distinctive character shapes emphasizing abnormally wide horizontal components due to the need for each character to have a distinctive waveform when read as density from left to right, essentially by a tape recorder read head. Fonts optimized for OCR are usually more normal looking to humans because they emphasize clear detection of lines instead.
E-13B is a bit of an ideal use case for this method because of the highly constrained character set used on checks and the unusually nonuniform density of E-13B. The same thing can be done on text more generally but gets significantly more difficult.
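To make the "waveform" idea concrete, here's a toy Python sketch: summing the ink in each vertical slice of a glyph bitmap approximates what the magnetic read head sees as the character moves past it left to right. The 5×7 bitmaps here are made-up stand-ins for illustration, not the real E-13B glyph shapes.

```python
# Toy 5-wide x 7-tall glyph bitmaps ("1" = ink). These are illustrative
# placeholders, NOT the actual E-13B character designs.
GLYPHS = {
    "0": [
        "01110",
        "10001",
        "10001",
        "10001",
        "10001",
        "10001",
        "01110",
    ],
    "1": [
        "00100",
        "01100",
        "00100",
        "00100",
        "00100",
        "00100",
        "01110",
    ],
}

def waveform(glyph):
    """Ink density per vertical slice, read left to right -- roughly the
    signal a single read head produces as the glyph sweeps past it."""
    return [sum(row[x] == "1" for row in glyph) for x in range(len(glyph[0]))]

for name, glyph in GLYPHS.items():
    print(name, waveform(glyph))
```

The design goal of E-13B is that these one-dimensional profiles are maximally distinct from each other, which is why the glyphs have those abnormally wide horizontal bars.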
But I doubt anyone would take the hassle of doing OCR, because it is still a lot of work to spot mistakes (which still happen, even with standard fonts).
Also, who would want to lower the perceived visual quality of their resume? And scanning a document means just that: it will still be readable, yes, but you can see that it was scanned in.
This seems to be the worst of both worlds. It's not easy for a human to read a square (compared to a line of text). The pixelated font is also not as easy to read as a vector font. And it's not easy for machines to read an optical encoding with no spatially distributed redundancy.
QR codes and bar codes are brilliant for machines because misreads due to a spurious reflection or speck of dust are mitigated by error correction.
I feel like this problem is already well served by bar codes which have a human readable text representation below them (e.g. serial number stickers).
That said, I can see the security advantage of the computer reading the same representation as a human, although this is probably not the best place to enforce security. Since there's no integrity check, there's little guarantee the computer will actually read what you see. Maybe linear OCR combined with a barcode checksum would be a better way to achieve these goals.
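For illustration of what such a checksum buys you, here is the standard UPC-A check-digit scheme: digits alternate between weights 3 and 1, and the check digit brings the weighted sum to a multiple of 10. Since both weights are units mod 10, any single misread digit changes the sum and fails verification. This is just the well-known barcode algorithm, not a proposal for how the font in question works.

```python
def upc_check_digit(digits11):
    """Compute the UPC-A check digit for the first 11 digits.
    Odd positions (1-indexed) get weight 3, even positions weight 1."""
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits11))
    return (10 - total % 10) % 10

def verify(code12):
    """Check a full 12-digit UPC-A code against its check digit."""
    digits = [int(c) for c in code12]
    return upc_check_digit(digits[:-1]) == digits[-1]

print(verify("036000291452"))  # valid code -> True
print(verify("036000291453"))  # one digit off -> False
```

A scheme like this catches every single-digit misread, which is exactly the failure mode OCR is prone to.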
Not sure I get the point of this. On digital documents it makes no difference at all, and if the NSA (or whoever else) really wanted to digitize printed documents written in this font, they could surely just make some minor modifications to their OCR to support it.
Not to mention that the kind of OCR software available to secret services is probably much better than what is on the consumer market.
The closest is: "He said the checks Thomas presented displayed a watermark that read VOID when they were scanned in a web viewer."
This is a feature of some security paper - https://en.wikipedia.org/wiki/Void_pantograph says "In security printing, void pantograph refers to a method of making copy-evident and tamper-resistant patterns in the background of a document. Normally these are invisible to the eye, but become obvious when the document is photocopied. Typically they spell out "void", "copy", "invalid" or some other indicator message"
What it means is that their system wasn't scanning at a high enough resolution, which should be expected. So either they are extremely poorly trained, or they are looking for an excuse.
Note the following from the freep article:
> According to TCF's Wennerberg ... Thomas wanted to deposit the two larger checks in his bank account, which, Wennerberg said, had only 52 cents in it. And he wanted to cash the $13,000 check,
Does your bank generally tell people how much is in your account, and what your bank transactions are?
How accurate is this? I tried OCR'ing my receipts a while back, and the misrecognized numbers were too frequent to be worth it. Unlike text, where a few typos are unlikely to change the meaning much, a single wrong digit on a receipt can be a big issue, and receipt printing isn't exactly high fidelity.
This looks like something I'd definitely use (and pay for, at that price).
I have been wondering the same thing. So many OCR engines spit out results that are obviously wrong, and while I don't want them to get too clever, a little bit of smarts would go a long way.
I guess that could be a form of lossy text compression, where the end result is not completely right (the letters not all being in the right order) but it's good enough to be able to read the text.
So the use case is primarily to produce something machine readable rather than human readable (it's not that unreadable, but still). I can see that. Is there a human-readable flag?