I'm not sure I get the point of this. On digital documents it makes no difference at all, and if the NSA (or whoever else) really wanted to digitize printed documents set in this font, they could surely make some minor modifications to their OCR to support it.
Not to mention that the kind of OCR software available to secret services is probably much better than what is on the consumer market.
Sang has no illusions that even a clever cryptographic font—which you can use in email messages to shield them from snoops and font-recognition bots—will remain encoded for long. They're not meant to be long-term tools with which to combat the NSA. Rather, he views them as an awareness-raising measure.
"This project will not fully solve the problems we are facing now," he writes, "but hopefully will raise some peculiar questions."
What if the kerning is non-deterministic? Identifying letters is not enough; you have to know their order too. For example, say my kerning flips every second letter. OCR alone can't solve that problem; you need AI.
I'm sorry, but I've spent about twenty years trying to deal with digital versions of texts written before 1900, and I think that's just bullshit.
This problem is still really, really fucking hard, and still really, really unsolved. If there's an existing OCR program that you can just "train" using, say, clean scans of nineteenth-century newspapers, and have it reach a success rate anywhere above "completely sucks," then I'd like to see it (Tesseract, certainly, is reduced to tears by such things -- in fact, Tesseract is reduced to tears by the typeface used in the Federal Reporter). I've used every OCR program ever written. None of them come remotely close to doing the job.
I agree that this is theoretically solvable, but it's a little precious to hear people say over and over how "do-able" something is when no one has actually done it.
As silly as it sounds, rot13 probably is a better option, if you're just trying to avoid automatic detections triggered by certain words you say. Obviously it will be of no help if an analyst is directly reading things you've written though.
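To make the rot13 point concrete, here is a minimal sketch in Python. The watchlist and the `naive_flag` function are made up for illustration and are not a claim about how any real filter works; the only assumption is a filter that matches literal words.

```python
import codecs

# Hypothetical watchlist, purely for illustration.
WATCHLIST = {"attack", "bomb"}

def naive_flag(message: str) -> bool:
    """Return True if any literal watchlist word appears in the message."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & WATCHLIST)

plain = "The attack is at dawn."
hidden = codecs.encode(plain, "rot13")  # rot13 is its own inverse

print(naive_flag(plain))               # literal match on "attack"
print(naive_flag(hidden))              # no literal match after rot13
print(codecs.decode(hidden, "rot13"))  # anyone can undo it in one call
```

The last line is the catch: it only beats a filter that never bothers to try rot13.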
By tech savvy people, yes. To many people who are not well-versed in cryptography, substitution ciphers and the like are both the only obvious solution, and seemingly difficult to break. This will include a fair number of terrorists (see gwern's comment above), and so is a worthwhile avenue for a security agency to pursue.
So given that it's a well-known cipher which is easy to break and is still in active use, it would be quite surprising if the NSA's software didn't try.
I doubt that in their bulk aggregation, they actually try deciphering content in the initial detection stage.
They probably just have wordlists for different languages (English, Arabic, Chinese, Farsi, French, Spanish). To individually try and rot13 (or any other cipher) every single message they collect as soon as they all come in would be a big waste of processing time. The odds of someone trying to communicate criminal activity via rot13 are absurdly low.
I'm not sure that's so clear. First of all, you don't have to do it to every message. You first run a very cheap test of the message to see if it appears to consist of normal language. Only if this test fails do you run through a (still very cheap) battery of common and primitive enciphering techniques. Yes, there are lots of emails, but most of them are short, and the kind of processing we're doing here is incredibly fast.
And sure, this wouldn't work against steganography, but anybody who knows about steganography probably also knows how to do proper encryption that the NSA won't be able to break.
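A rough sketch of that two-stage idea in Python, to show how little work it is. Everything here is an assumption for illustration: the tiny word list, the 0.2 threshold, and the "battery" being only the 25 Caesar shifts (which includes rot13). A real system would presumably use proper language models.

```python
# Hypothetical, deliberately tiny list of very common English words.
COMMON_WORDS = {"the", "and", "to", "of", "a", "in", "is", "it",
                "you", "that", "at", "we", "for", "on", "be"}

def looks_like_english(text: str, threshold: float = 0.2) -> bool:
    """Stage 1: cheap test. What fraction of words are very common ones?"""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return False
    hits = sum(w in COMMON_WORDS for w in words)
    return hits / len(words) >= threshold

def caesar_shift(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` positions, leaving everything else alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def triage(message: str):
    """Stage 2 runs only when stage 1 fails: try every Caesar shift."""
    if looks_like_english(message):
        return ("plain", message)
    for shift in range(1, 26):
        candidate = caesar_shift(message, shift)
        if looks_like_english(candidate):
            return (f"caesar-{shift}", candidate)
    return ("unknown", message)

original = "We meet at the bridge and it is on for tonight."
print(triage(original))
print(triage(caesar_shift(original, 13)))  # rot13 is Caesar with shift 13
```

Most messages exit at stage 1 after a single pass over their words, which is why the per-message cost stays small.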
The biggest problem faced by organizations collecting this type of data is sorting the signal from the noise.
Unfortunately, until real encryption is the norm, using this font or other means to hide your communications is like giving the NSA a free "this bit is particularly juicy, have a human read it" flag.
Is there any technological basis for this being cryptographic? I couldn't find any. Or am I missing the part where it is an artistic statement? Just knowing the language of the message is enough to crack it with frequency analysis.
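A small Python sketch of why frequency analysis applies, assuming the font behaves like a fixed one-to-one substitution of symbols (my assumption, not something the article states). The helper names and the approximate `ENGLISH_ORDER` string are illustrative. The point is that a substitution relabels letters but leaves their counts untouched.

```python
from collections import Counter
import random
import string

# Approximate English letter order by frequency (orderings vary by corpus).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def make_substitution(seed: int = 0) -> dict:
    """Build a random one-to-one letter mapping, standing in for the font."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))

def encipher(text: str, key: dict) -> str:
    return "".join(key.get(ch, ch) for ch in text.lower())

def letter_counts(text: str) -> Counter:
    return Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)

def first_guess(ciphertext: str) -> dict:
    """Pair ciphertext symbols, most frequent first, with ENGLISH_ORDER.
    Only a starting point: short texts need manual refinement."""
    ranked = [sym for sym, _ in letter_counts(ciphertext).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

plain = "the quick brown fox jumps over the lazy dog near the old stone bridge"
key = make_substitution()
cipher = encipher(plain, key)

# Same multiset of counts before and after: the statistical shape survives.
print(sorted(letter_counts(plain).values()) == sorted(letter_counts(cipher).values()))
```

With enough text, the ranked guess gets the common letters roughly right and the rest falls out by inspection.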
Perhaps it would be better if these fonts were created on a daily basis, or if people created their own independently. It could significantly add to the processing load of the OCR process.
Actually, software that identifies handwriting is far more advanced than OCR, mainly because it's been a problem for far longer. How do you think the post office handles all those handwritten letters and postcards?
While that's true, it has also been a very focused problem, e.g. reading digits and capital letters with known constraints (in the US the capital letters are pairs from a set of 50 states; in the UK, the form is DLD DLD (D=Digit, L=Letter) and the list of legal combinations is far smaller than the set of all possible ones).
General handwriting recognition, especially cursive writing, is still a very hard problem.
> which you can use in email messages to shield them from snoops and font-recognition bots
Use the source, Luke: <font face="ZXX">Something Interesting</font>
It's totally useless for email, unless you print it, scan it and send it as an image. And even in that case they can probably train the OCR or flag it for human review.
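For anyone unconvinced, a minimal Python sketch using only the standard library's `html.parser`. The `TextExtractor` class and `visible_text` helper are illustrative names of mine. The font is just an attribute on a tag, so any program that reads the markup gets the plain characters back.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def visible_text(html_body: str) -> str:
    parser = TextExtractor()
    parser.feed(html_body)
    parser.close()
    return "".join(parser.chunks)

email_body = '<font face="ZXX">Something Interesting</font>'
print(visible_text(email_body))  # Something Interesting
```

The font only changes how a renderer draws those characters; the characters themselves travel in the clear.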
What kind of data centers does the NSA use? The best circumvention should take advantage of that. For example, if they have a lot of low-end mainstream machines equipped with 4 GB RAM each, you just embed a JavaScript scrypt implementation in your email and encrypt the content with a configuration that would take too long to decrypt with 4 GB RAM, but not with 8 GB RAM.
I can't give a URL, but there was a link recently at the top of HN to a service that converts emails to an image that can only be read once before it's deleted. That means that once the other end has read the email, it can't be read again, so if their Gmail was 'prism-ed' the NSA wouldn't be able to read it.
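The memory arithmetic behind that idea, sketched in Python rather than JavaScript. scrypt's main buffer is roughly 128 * N * r bytes, so the parameters can be chosen to land between 4 GB and 8 GB. The specific values below are my own picks for illustration, and the runnable call deliberately uses small parameters; the large configuration is only computed, not executed. Note also that scrypt can be evaluated with less memory at the cost of more computation, so this would slow a memory-limited attacker rather than stop one.

```python
import hashlib

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main buffer: 128 * N * r bytes."""
    return 128 * n * r

def derive_key(passphrase: bytes, salt: bytes, n: int, r: int, p: int = 1) -> bytes:
    """Derive a 32-byte key. The 64 MiB maxmem cap suits the small demo only."""
    return hashlib.scrypt(passphrase, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)

GIB = 2 ** 30

# Hypothetical "between 4 GB and 8 GB" configuration (arithmetic only).
big_n, big_r = 2 ** 22, 12
print(scrypt_memory_bytes(big_n, big_r) / GIB)  # 6.0

# Small parameters (about 16 MiB) so the example actually runs anywhere.
key = derive_key(b"shared passphrase", b"demo-salt", n=2 ** 14, r=8)
print(len(key))  # 32
```

Both sides would still need to share the passphrase out of band, which is the usual hard part.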
It doesn't sound like the author of the font intends for anyone to actually use this to try to prevent any snooping, but instead it's just meant to be used to convey a political message.
The article however seems to want to play at least a little bit off of the idea that it has some amount of practical value, which is what all these comments are reacting to.
As others have indicated, this is beyond useless. The NSA is not going to visually inspect your communication; rather, they'll receive the data and simply process it with a machine that completely ignores the glyphs used (it reads the ASCII/Unicode instead). This person has made a deep, fundamental mistake about computer typography, let alone cryptography. Maybe it is all a big joke though.
The title and the article are both idiotic, as this only works if the NSA takes screenshots of messages and deciphers them. If someone uses this font to send a message and the recipient does not have it installed, they will see raw text and so will the NSA.
I don't believe the creator actually thinks this can do anything, but I think the author does not actually know how ASCII, or fonts for that matter, work.