Introducing the NSA-Proof Font (motherboard.vice.com)
31 points by Libertatea | karma 36161 | avg karma 15.51 | 2013-06-22 08:37:02 | 49 comments




Not sure if I get the point of this? On digital documents it makes no difference at all, and if the NSA (or whoever else) really wanted to digitise printed documents written in this font, they could surely just make some minor modifications to their OCR to support it.

Not to mention that the kind of OCR software available to secret services is probably much better than what is on the consumer market.


I was wondering if the font was more an artistic statement rather than an actual solution.

From the article:

Sang has no illusions that even a clever cryptographic font—which you can use in email messages to shield them from snoops and font-recognition bots—will remain encoded for long. They're not meant to be long-term tools with which to combat the NSA. Rather, he views them as an awareness-raising measure.

"This project will not fully solve the problems we are facing now," he writes, "but hopefully will raise some peculiar questions."


>They're not meant to be long-term tools with which to combat the NSA.

Even if you could beat the NSA you still have the UK, Canada, AU, NZ, Russia, China etc. Cool project though.


Not only will the font not fully solve the problems everyone is facing but it will not solve anything!

TFA is helpful in addressing your inquiry:

"This project will not fully solve the problems we are facing now," he writes, "but hopefully will raise some peculiar questions."


> "I decided to create a typeface that would be unreadable by text scanning software

With enough training, any typeface is OCR-able.


What if the kerning is non-deterministic? Identifying letters is not enough, you have to know their order too. For example, say my kerning flips every second letter. OCR alone can't solve that problem, you need AI.

I'm sorry, but I've spent about twenty years trying to deal with digital versions of texts written before 1900, and I think that's just bullshit.

This problem is still really, really fucking hard, and still really, really unsolved. If there's an existing OCR program that you can just "train" using, say, clean scans of nineteenth-century newspapers, and have it reach a success rate anywhere above "completely sucks," then I'd like to see it (Tesseract, certainly, is reduced to tears by such things -- in fact, Tesseract is reduced to tears by the typeface used in the Federal Reporter). I've used every OCR program ever written. None of them come remotely close to doing the job.

I agree that this is theoretically solvable, but it's a little precious to hear people say over and over how "do-able" something is when no one has actually done it.


Anything humans can do is theoretically solvable, isn't it?

This is somewhat like changing your font to Wingdings in an HTML e-mail instead of using encryption.

Actually, the rot13 font (http://code.eligrey.com/fonts/rot13/example.html) is a lot better for NSA-proofing :).

As silly as it sounds, rot13 probably is a better option, if you're just trying to avoid automatic detections triggered by certain words you say. Obviously it will be of no help if an analyst is directly reading things you've written though.
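A rough sketch of that point, assuming a naive scanner that only matches a fixed list of plaintext words. The `WATCHLIST` set and the `flags` helper are made up for illustration and do not describe any real system:

```python
import codecs

# Hypothetical watchlist; a stand-in for whatever words a keyword scanner matches.
WATCHLIST = {"secret", "meeting"}

def flags(message: str) -> bool:
    """Return True if any watchlist word appears in the message."""
    words = message.lower().split()
    return any(word.strip(".,!?") in WATCHLIST for word in words)

plain = "The secret meeting is at noon."
hidden = codecs.encode(plain, "rot_13")  # "Gur frperg zrrgvat vf ng abba."

print(flags(plain))    # True: the plaintext contains watchlist words
print(flags(hidden))   # False: the rotated words no longer match
print(codecs.decode(hidden, "rot_13") == plain)  # True: rot13 undoes itself
```

The last line is the catch: anyone, human or machine, who thinks to apply rot13 gets the original text back immediately.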

I find it extremely unlikely that the NSA's software doesn't automatically try rot13.

Really? That's a weird probability assessment. Rot13 is pretty much used as a laugh.

By tech savvy people, yes. To many people who are not well-versed in cryptography, substitution ciphers and the like are both the only obvious solution, and seemingly difficult to break. This will include a fair number of terrorists (see gwern's comment above), and so is a worthwhile avenue for a security agency to pursue.

Yeah, gwern's comment pretty much changed my mind. I wasn't aware some terrorists actually still use Caesar ciphers.

Indeed. rot13 is a version of a Caesar cipher, and believe it or not, Caesar ciphers have been used in the recent past by at least 1 would-be terrorist: http://en.wikipedia.org/wiki/Caesar_cipher#History_and_usage

So given that it's a well-known cipher which is easy to break and is still in active use, it would be quite surprising if the NSA's software didn't try.


I doubt that in their bulk aggregation, they actually try deciphering content in the initial detection stage.

They probably just have wordlists for different languages (English, Arabic, Chinese, Farsi, French, Spanish). To individually try and rot13 (or any other cipher) every single message they collect as soon as they all come in would be a big waste of processing time. The odds of someone trying to communicate criminal activity via rot13 are absurdly low.


I'm not sure that's so clear. First of all, you don't have to do it to every message. You first run a very cheap test of the message to see if it appears to consist of normal language. Only if this test fails do you run through a (still very cheap) battery of common and primitive enciphering techniques. Yes, there are lots of emails, but most of them are short, and the kind of processing we're doing here is incredibly fast.

And sure, this wouldn't work against steganography, but anybody who knows about steganography probably also knows how to do proper encryption that the NSA won't be able to break.
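The two-stage idea described above (a cheap "does this look like language?" test, then a battery of primitive ciphers only when that fails) could look something like the sketch below. Everything here is an assumption for illustration: the tiny `COMMON_WORDS` list, the 0.3 threshold, and the `triage` name are invented, and a real system would use proper wordlists and statistics.

```python
import string

# Tiny stand-in for a real wordlist; a real system would use a full dictionary.
COMMON_WORDS = {"the", "is", "at", "and", "to", "of", "a", "in", "we", "meet"}

def looks_like_english(text: str, threshold: float = 0.3) -> bool:
    """Cheap test: is a large enough fraction of the words in the wordlist?"""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return False
    hits = sum(1 for w in words if w in COMMON_WORDS)
    return hits / len(words) >= threshold

def caesar_shift(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` positions, leaving everything else alone."""
    out = []
    for ch in text:
        if ch in string.ascii_letters:
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def triage(message: str):
    """Return (shift, plaintext) if the message reads as English under some shift."""
    if looks_like_english(message):
        return 0, message          # fast path: ordinary mail stops here
    for shift in range(1, 26):     # all 25 Caesar shifts, rot13 included
        candidate = caesar_shift(message, shift)
        if looks_like_english(candidate):
            return shift, candidate
    return None                    # not recovered by this battery

plain = "We meet at the dock at noon"
print(triage(caesar_shift(plain, 13)))  # (13, 'We meet at the dock at noon')
```

The slow path costs at most 25 passes over a short string, which is why running it only on messages that fail the first test is so cheap.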


Reminds me of this XKCD: http://xkcd.com/810/

Can we get designers to create fonts that look like handwriting? The NSA could help us digitize a lot of our history books.


The biggest problem faced by organizations collecting this type of data is sorting the signal from the noise.

Unfortunately, until real encryption is the norm, using this font or other means to hide your communications is like giving the NSA a free "this bit is particularly juicy, have a human read it" flag.


I'm not sure why nobody has mentioned that generally, font encodings are interpreted by the client, and can be freely disregarded, or replaced.

If the client is a massive NSA parsing engine that scrapes and indexes content, I'm going to guess it's just skipping over the font encodings.


Or just create a really simple OCR substitution cipher to translate them.

Is there any technological basis for this being cryptographic? I couldn't find any, or am I missing the part where it is an artistic statement? Just knowing the language of the message is enough to use frequency analysis to crack it.

This is someone being funny, right?

Because they can just load the font file into their OCR software and it will recognize it just fine?


No need to find whatever font was used; statistical analysis of the message already does the trick.
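To illustrate why statistics are enough, here is a small sketch that treats an unknown font as a one-to-one letter substitution. It uses Atbash (a to z, b to y, and so on) purely as a stand-in for "some fixed glyph mapping"; the function names are invented for this example.

```python
import string
from collections import Counter

def letter_counts(text: str) -> Counter:
    """Count ASCII letters, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)

def frequency_profile(text: str) -> list:
    """Letter counts sorted high to low, ignoring which letter is which."""
    return sorted(letter_counts(text).values(), reverse=True)

def most_common_letter(text: str) -> str:
    return letter_counts(text).most_common(1)[0][0]

# Atbash as a stand-in for any fixed one-to-one glyph substitution.
ATBASH = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])

message = "the quick brown fox jumps over the lazy dog and then sleeps"
scrambled = message.translate(ATBASH)

print(scrambled[:3])                                               # gsv
print(frequency_profile(message) == frequency_profile(scrambled))  # True
print(most_common_letter(message), most_common_letter(scrambled))  # e v
```

The substitution relabels the letters but leaves the shape of the distribution untouched, so guessing that the most common symbol stands for "e" is already a good first step, with no font file required.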

From the article:

    They're not meant to be long-term tools with which to combat the NSA. 
    Rather, he views them as an awareness-raising measure.

Perhaps it would be better if these fonts were created on a daily basis, or if people created their own independently. It could significantly add to the processing load of the OCR process.

Daily isn't often enough to defeat statistical analysis:

http://en.wikipedia.org/wiki/Cryptanalysis_of_the_Enigma#Rej...


Or we could just all start handwriting again, lol

Actually, software that identifies handwriting is far more advanced than OCR, mainly because it's been a problem for far longer. How do you think the post office handles all those handwritten letters and postcards?

While that's true, it has also been a very focused problem, e.g. reading digits and capital letters with known constraints (in the US the capital letters are pairs from a set of 50 states; in the UK, the form is DLD DLD (D=Digit, L=Letter) and the list of legal combinations is far smaller than all possible ones).

General handwriting recognition, especially cursive writing, is still a very hard problem.


> which you can use in email messages to shield them from snoops and font-recognition bots

Use the source, Luke: <font face="ZXX">Something Interesting</font>

It's totally useless for email, unless you print it, scan it and send it as an image. And even in that case they can probably train the OCR or flag it for human review.
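A minimal sketch of what any scraper sees, using Python's standard `html.parser`. The `TextOnly` class and `extract_text` helper are names invented for this example:

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect character data and ignore every tag and attribute, fonts included."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def extract_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)

email_body = '<font face="ZXX">Something Interesting</font>'
print(extract_text(email_body))  # Something Interesting
```

The `face` attribute only tells a rendering client which glyphs to draw; the characters in the message are untouched and come out as plain text.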


What kind of data centers does the NSA use? The best circumvention should take advantage of that. For example, if they have a lot of low-end mainstream machines equipped with 4 GB RAM each, you just embed a JavaScript scrypt implementation in your email and encrypt the content with a configuration that would take too long to decrypt with 4 GB RAM, but not with 8 GB RAM.
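For a sense of the numbers involved: scrypt's main working table holds N blocks of 128*r bytes, so its memory use is roughly 128 * N * r bytes. The sketch below only does that arithmetic; the parameter choice (N = 2^22, r = 12) is an arbitrary example picked to land between 4 and 8 GiB, not a recommendation.

```python
GIB = 1024 ** 3

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate scrypt working memory: N blocks of 128*r bytes each."""
    return 128 * n * r

need = scrypt_memory_bytes(2 ** 22, r=12)
print(f"{need / GIB:.1f} GiB")   # 6.0 GiB
print(need > 4 * GIB)            # True: does not fit in a 4 GiB machine
print(need <= 8 * GIB)           # True: fits in an 8 GiB machine
```

Note that this is only a cost estimate. scrypt allows trading memory for extra computation, so a smaller machine is slowed down rather than locked out.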

And if they upgrade?

Of course it will never give you real security if the key is public. But it seems people are too lazy to exchange public keys.

You might have some trouble convincing people to actually read your emails when they need 8 GB of RAM and have to enable JavaScript just to be able to read them.

The real answer is proper encryption, not silly DIY tricks trying to be clever against one specific possible attack.


I can't give a URL, but there was a link recently at the top of HN to a service that converts emails to an image that can only be read once before it's deleted. That means that as soon as the other end reads the email, nobody can read it again, so if their Gmail was 'prism-ed' the NSA wouldn't be able to read it.

Regardless of the purpose of this or whether it's an art project or ironic or whatever, it's universally idiotic to say the least.

The message is bad, the idea is bad and it gets a lot of attention.


My God, the public understanding of cryptography. Bruce Schneier is spinning in his grave and he's not even dead.

It doesn't sound like the author of the font intends for anyone to actually use this to try to prevent any snooping, but instead it's just meant to be used to convey a political message.

The article however seems to want to play at least a little bit off of the idea that it has some amount of practical value, which is what all these comments are reacting to.


As others have indicated, this is beyond useless. The NSA is not going to visually inspect your communication; rather, they'll receive the data and simply process it with a machine that completely ignores the glyphs used (and reads the ASCII/Unicode instead). This person has made a deep, fundamental mistake about computer typography, let alone cryptography. Maybe it is all a big joke though.

Make an image of the message. Then they at least have to jump the hoop of deciding to OCR.

I'm counting the hours until a tesseract package appears on github, which is able to OCR this font.

Here I thought that this font was supposed to be TEMPEST-proof but I see the author is still playing in the intramural leagues...

The title and the article are both idiotic, as this only works if the NSA takes screenshots of messages and deciphers them. If someone uses this font to send a message and the recipient does not have it installed, they will see raw text and so will the NSA. I don't believe the creator actually believes that this can do anything, but I think the author does not actually know how ASCII, or fonts for that matter, work.

It might replace Comic Sans on business cards, and that would still be progress.
