Introducing the NSA-Proof Font

tazjin | karma 4475 | avg karma 5.0 · 2013-06-22 08:44:08

Not sure if I get the point of this? On digital documents it makes no difference at all, and if the NSA (or whoever else) really wanted to digitise printed documents written in this font, they could surely just make some minor modifications to their OCR to support it.

Not to mention that the kind of OCR software available to secret services is probably much better than what is on the consumer market.

tbatterii | karma 289 | avg karma 1.5 · 2013-06-22 08:51:49

I was wondering if the font was more an artistic statement rather than an actual solution.

ctidd | karma 159 | avg karma 4.08 · 2013-06-22 08:54:42

From the article:

Sang has no illusions that even a clever cryptographic font—which you can use in email messages to shield them from snoops and font-recognition bots—will remain encoded for long. They're not meant to be long-term tools with which to combat the NSA. Rather, he views them as an awareness-raising measure.

"This project will not fully solve the problems we are facing now," he writes, "but hopefully will raise some peculiar questions."

ginsong | karma 7 | avg karma 1.0 · 2013-06-22 14:04:42+00:00

>They're not meant to be long-term tools with which to combat the NSA.

Even if you could beat the NSA you still have the UK, Canada, AU, NZ, Russia, China etc. Cool project though.

C1D | karma 233 | avg karma 1.43 · 2013-06-22 15:05:58+00:00

Not only will the font not fully solve the problems everyone is facing but it will not solve anything!

mhb | karma 38136 | avg karma 5.79 · 2013-06-22 08:58:10

TFA is helpful in addressing your inquiry:

"This project will not fully solve the problems we are facing now," he writes, "but hopefully will raise some peculiar questions."

est | karma 8357 | avg karma 2.2 · 2013-06-22 08:57:27

> "I decided to create a typeface that would be unreadable by text scanning software

With enough training, any typeface is OCR-able.

iso8859-1 | karma 1477 | avg karma 1.78 · 2013-06-22 14:21:21+00:00

What if the kerning is non-deterministic? Identifying letters is not enough, you have to know their order too. For example, say my kerning flips every second letter. OCR alone can't solve that problem, you need AI.
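
A minimal sketch of the fixed "flip every second letter" case, assuming it means swapping each adjacent pair of characters (that reading, and the name `swap_pairs`, are assumptions, not anything from the font itself). With a fixed rule like this the scramble is its own inverse; a genuinely non-deterministic reordering would be a different matter.

```python
def swap_pairs(text: str) -> str:
    """Swap each adjacent pair of characters; a trailing odd character stays put."""
    chars = list(text)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


message = "attack at dawn"
scrambled = swap_pairs(message)   # what a naive left-to-right OCR pass would read
restored = swap_pairs(scrambled)  # applying the same fixed rule again undoes it
```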

sramsay | karma 3554 | avg karma 8.17 · 2013-06-22 14:33:28+00:00

I'm sorry, but I've spent about twenty years trying to deal with digital versions of texts written before 1900, and I think that's just bullshit.

This problem is still really, really fucking hard, and still really, really unsolved. If there's an existing OCR program that you can just "train" using, say, clean scans of nineteenth-century newspapers, and have it have a success rate anywhere above "completely sucks," then I'd like to see it (Tesseract, certainly, is reduced to tears by such things -- in fact, Tesseract is reduced to tears by the typeface used in the Federal Reporter). I've used every OCR program ever written. None of them come remotely close to doing the job.

I agree that this is theoretically solvable, but it's a little precious to hear people say over and over how "do-able" something is when no one has actually done it.

iso8859-1 | karma 1477 | avg karma 1.78 · 2013-06-22 14:34:43+00:00

Anything humans can do is theoretically solvable, isn't it?

felixr | karma 1523 | avg karma 7.58 · 2013-06-22 09:01:18

This is somewhat like changing your font to Wingdings in an HTML e-mail instead of using encryption.

nawitus | karma 3654 | avg karma 2.15 · 2013-06-22 09:03:08

Actually, the rot13 font (http://code.eligrey.com/fonts/rot13/example.html) is a lot better for NSA-proofing :).

meowface | karma 10977 | avg karma 2.45 · 2013-06-22 14:39:45+00:00

As silly as it sounds, rot13 probably is a better option, if you're just trying to avoid automatic detections triggered by certain words you say. Obviously it will be of no help if an analyst is directly reading things you've written though.

mistercow | karma 10714 | avg karma 3.06 · 2013-06-22 14:53:12+00:00

I find it extremely unlikely that the NSA's software doesn't automatically try rot13.

nawitus | karma 3654 | avg karma 2.15 · 2013-06-22 10:12:24

Really? That's a weird probability assessment. Rot13 is pretty much used as a laugh.

mistercow | karma 10714 | avg karma 3.06 · 2013-06-22 18:45:30+00:00

By tech savvy people, yes. To many people who are not well-versed in cryptography, substitution ciphers and the like are both the only obvious solution, and seemingly difficult to break. This will include a fair number of terrorists (see gwern's comment above), and so is a worthwhile avenue for a security agency to pursue.

nawitus | karma 3654 | avg karma 2.15 · 2013-06-22 19:16:16+00:00

Yeah, gwern's comment pretty much changed my mind. I wasn't aware some terrorists actually still use Caesar ciphers.

gwern | karma 33755 | avg karma 4.24 · 2013-06-22 15:19:11+00:00

Indeed. rot13 is a version of a Caesar cipher, and believe it or not, Caesar ciphers have been used in the recent past by at least 1 would-be terrorist: http://en.wikipedia.org/wiki/Caesar_cipher#History_and_usage

So given that it's a well-known cipher which is easy to break and is still in active use, it would be quite surprising if the NSA's software didn't try.

meowface | karma 10977 | avg karma 2.45 · 2013-06-22 11:04:17

I doubt that in their bulk aggregation, they actually try deciphering content in the initial detection stage.

They probably just have wordlists for different languages (English, Arabic, Chinese, Farsi, French, Spanish). To individually try and rot13 (or any other cipher) every single message they collect as soon as they all come in would be a big waste of processing time. The odds of someone trying to communicate criminal activity via rot13 are absurdly low.

mistercow | karma 10714 | avg karma 3.06 · 2013-06-22 18:15:34+00:00

I'm not sure that's so clear. First of all, you don't have to do it to every message. You first run a very cheap test of the message to see if it appears to consist of normal language. Only if this test fails do you run through a (still very cheap) battery of common and primitive enciphering techniques. Yes, there are lots of emails, but most of them are short, and the kind of processing we're doing here is incredibly fast.

And sure, this wouldn't work against steganography, but anybody who knows about steganography probably also knows how to do proper encryption that the NSA won't be able to break.
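
A rough sketch of that two-stage idea, under loud assumptions: the tiny `COMMON_WORDS` list, the 0.5 threshold, and the function names are all made up for illustration, and nothing here reflects what any agency actually runs. It first checks cheaply whether the text looks like English, and only if that fails tries all 25 Caesar shifts (rot13 is shift 13).

```python
import codecs
import re

# Hypothetical toy wordlist; a real filter would presumably use full dictionaries.
COMMON_WORDS = {"the", "and", "to", "of", "a", "in", "is", "at",
                "we", "you", "meet", "it", "for", "on"}


def looks_like_english(text, threshold=0.5):
    """Cheap test: share of words found in the wordlist meets the threshold."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return False
    hits = sum(1 for word in words if word in COMMON_WORDS)
    return hits / len(words) >= threshold


def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def try_simple_ciphers(text):
    """Return (shift, plaintext) if a Caesar shift yields English, else None."""
    if looks_like_english(text):
        return (0, text)
    for shift in range(1, 26):
        candidate = caesar_shift(text, shift)
        if looks_like_english(candidate):
            return (shift, candidate)
    return None


hidden = codecs.encode("we meet at the bridge at noon", "rot_13")
found = try_simple_ciphers(hidden)
```

Each message costs at most 26 passes over its text, which is the "still very cheap" part of the argument.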

danielharan | karma 1535 | avg karma 2.71 · 2013-06-22 14:04:53+00:00

Reminds me of this XKCD: http://xkcd.com/810/

Can we get designers to create fonts that look like hand-writing? The NSA could help us digitize a lot of our history books.

malloc2x | karma 35 | avg karma 2.33 · 2013-06-22 14:07:09+00:00

The biggest problem faced by organizations collecting this type of data is sorting the signal from the noise.

Unfortunately, until real encryption is the norm, using this font or other means to hide your communications is like giving the NSA a free "this bit is particularly juicy, have a human read it" flag.

bmelton | karma 10722 | avg karma 2.57 · 2013-06-22 09:10:52

I'm not sure why nobody has mentioned that generally, font encodings are interpreted by the client, and can be freely disregarded, or replaced.

If the client is a massive NSA parsing engine that scrapes and indexes content, I'm going to guess it's just skipping over the font encodings.

dmix | karma 39707 | avg karma 3.48 · 2013-06-22 14:25:26+00:00

Or just create a really simple OCR substitution cipher to translate them.

lycos1 | karma 2 | avg karma 0.12 · 2013-06-22 09:12:12

Is there any technological basis for this being cryptographic? I couldn't find any, or am I missing the part where it is an artistic statement? Just knowing the language of the message is enough to use frequency analysis to crack the message.

ck2 | karma 28312 | avg karma 4.0 · 2013-06-22 14:18:46+00:00

This is someone being funny right?

Because they can just load the font file into their ocr software and it will recognize it just fine?

lucb1e | karma 17323 | avg karma 2.26 · 2013-06-22 14:27:50+00:00

No need to find whatever font was used; statistical analysis of the message already does the trick.
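
To illustrate why, here is a small sketch (the sample sentence, the substitution key, and the helper name `letter_frequencies` are all assumptions for the example). Any one-to-one replacement of letters, whether by cipher or by odd-looking glyphs, leaves the shape of the frequency distribution untouched, which is what frequency analysis exploits.

```python
import string
from collections import Counter


def letter_frequencies(text):
    """Return (letter, relative frequency) pairs, most common first."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return []
    total = len(letters)
    return [(ch, count / total) for ch, count in Counter(letters).most_common()]


# Hypothetical substitution key: each letter maps to exactly one other symbol.
KEY = str.maketrans(string.ascii_lowercase, "qwertyuiopasdfghjklzxcvbnm")

plain = "the quick brown fox jumps over the lazy dog and then the end"
cipher = plain.translate(KEY)

plain_profile = [freq for _, freq in letter_frequencies(plain)]
cipher_profile = [freq for _, freq in letter_frequencies(cipher)]
```

The two profiles come out identical, so matching the most frequent ciphertext symbols against the known frequencies of the language is a reasonable first step on longer messages.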

burntsushi | karma 13683 | avg karma 4.52 · 2013-06-22 14:28:45+00:00

From the article:

    They're not meant to be long-term tools with which to combat the NSA. 
    Rather, he views them as an awareness-raising measure.

jsmcgd | karma 1833 | avg karma 2.76 · 2013-06-22 09:34:51

Perhaps it would be better if these fonts were created on a daily basis, or if people created their own independently. It could significantly add to the processing load of the OCR process.

iso8859-1 | karma 1477 | avg karma 1.78 · 2013-06-22 09:37:45

Daily isn't often enough to defeat statistical analysis:

http://en.wikipedia.org/wiki/Cryptanalysis_of_the_Enigma#Rej...

ck2 | karma 28312 | avg karma 4.0 · 2013-06-22 14:40:12+00:00

Or we could just all start handwriting again, lol

infinitone | karma 337 | avg karma 1.2 · 2013-06-22 10:00:21

Actually, software that identifies handwriting is far more advanced than OCR, mainly because it's been a problem for far longer. How do you think the postal office handles all those handwritten letters and postcards?

beagle3 | karma 16421 | avg karma 2.62 · 2013-06-22 17:24:28+00:00

While that's true, it has also been a very focused problem, e.g. reading digits + capital letters with known constraints (in the US the capital letters are pairs from a set of 50 states; in the UK, the form is DLD DLD (D=Digit, L=Letter)), and the list of legal combinations is far smaller than the set of all possible ones.

General handwriting recognition, especially cursive writing, is still a very hard problem.

gus_massa | karma 17133 | avg karma 1.44 · 2013-06-22 14:19:57+00:00

> which you can use in email messages to shield them from snoops and font-recognition bots

Use the source Luke: <font face="ZXX">Something Interesting</font>

It's totally unuseful for email, unless you print it, scan it and send it as an image. And even in that case they can probably train the OCR or flag it for human review.
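
A minimal sketch of what a scraper sees, using only Python's standard `html.parser` (the `TextExtractor` class and `visible_text` helper are names invented for this example). The font attribute never enters the picture.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data and ignore every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the plain text of an HTML fragment, with all markup dropped."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


email_body = '<font face="ZXX">Something Interesting</font>'
extracted = visible_text(email_body)
```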

iso8859-1 | karma 1477 | avg karma 1.78 · 2013-06-22 14:33:29+00:00

What kind of data centers do the NSA use? The best circumvention should take advantage of that. For example, if they have a lot of low-end mainstream machines equipped with 4 GB RAM each, you just embed a JavaScript Scrypt-implementation in your email and encrypt the content with a configuration that would require too long to decrypt with 4 GB RAM, but not with 8 GB RAM.
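
For a sense of scale, here is a back-of-the-envelope sketch using the commonly cited approximation that scrypt's main buffer needs about 128 * N * r bytes (the per-lane overhead from p is left out, and the parameter values are illustrative assumptions, not a recommendation).

```python
GIB = 2 ** 30


def scrypt_memory_bytes(n, r):
    """Approximate size of scrypt's main buffer: 128 * N * r bytes."""
    return 128 * n * r


# Hypothetical parameters picked to land between the 4 GB and 8 GB machines.
n_cost, block_size = 2 ** 22, 12
needed = scrypt_memory_bytes(n_cost, block_size)
needed_gib = needed / GIB  # 6.0
```

A setting like this would not fit in 4 GiB of RAM but would fit in 8 GiB, which is the gap the comment is describing.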

lucaspiller | karma 4736 | avg karma 2.65 · 2013-06-22 14:38:43+00:00

And if they upgrade?

iso8859-1 | karma 1477 | avg karma 1.78 · 2013-06-22 09:41:59

Of course it will never give you real security if the key is public. But it seems people are too lazy to exchange public keys.

anonymouz | karma 2522 | avg karma 4.64 · 2013-06-22 14:47:49+00:00

You might have some trouble convincing people to actually read your emails when they need 8 GB of RAM and have to enable JavaScript just to be able to read them.

The real answer is proper encryption, not silly DIY tricks trying to be clever against one specific possible attack.

C1D | karma 233 | avg karma 1.43 · 2013-06-22 15:16:46+00:00

I can't give a URL, but there was a link recently on the top of HN that converted emails to an image that can only be read once before it's deleted, which means as soon as the other end reads the email they can't do it again, so if their gmail was 'prism-ed' the NSA wouldn't be able to read it.

kryten | karma 743 | avg karma 2.95 · 2013-06-22 14:34:10+00:00

Regardless of the purpose of this or whether it's an art project or ironic or whatever, it's universally idiotic to say the least.

The message is bad, the idea is bad and it gets a lot of attention.

Eliezer | karma 11213 | avg karma 7.96 · 2013-06-22 09:35:40

My God, the public understanding of cryptography. Bruce Schneier is spinning in his grave and he's not even dead.

furyofantares | karma 5661 | avg karma 3.92 · 2013-06-22 09:43:34

It doesn't sound like the author of the font intends for anyone to actually use this to try to prevent any snooping, but instead it's just meant to be used to convey a political message.

The article however seems to want to play at least a little bit off of the idea that it has some amount of practical value, which is what all these comments are reacting to.

quinndupont | karma 712 | avg karma 1.9 · 2013-06-22 14:50:26+00:00

As others have indicated, this is beyond useless. The NSA is not going to visually inspect your communication. Rather, they'll receive the data and simply process it with a machine that completely ignores the glyphs used (it reads the ASCII/Unicode instead). This person has made a deep, fundamental mistake about computer typography, let alone cryptography. Maybe it is all a big joke though.

a3n | karma 11955 | avg karma 2.19 · 2013-06-22 09:57:32

Make an image of the message. Then they at least have to jump the hoop of deciding to OCR.

lignuist | karma 1394 | avg karma 3.74 · 2013-06-22 14:50:37+00:00

I'm counting the hours until a tesseract package appears on github, which is able to OCR this font.

mpyne | karma 7580 | avg karma 1.88 · 2013-06-22 10:00:21

Here I thought that this font was supposed to be TEMPEST-proof but I see the author is still playing in the intramural leagues...

C1D | karma 233 | avg karma 1.43 · 2013-06-22 15:04:33+00:00

The title and the article are both idiotic, as this only works if the NSA takes screenshots of messages and deciphers them. If someone uses this font to send a message and the recipient does not have it installed, they will see raw text and so will the NSA. I don't believe the creator actually believes that this can do anything, but I think the author does not actually know how ASCII, or fonts for that matter, work.

olgeni | karma 337 | avg karma 0.92 · 2013-06-22 10:21:25

It might replace Comic Sans on business cards, and that would still be progress.