I doubt that in their bulk aggregation, they actually try deciphering content in the initial detection stage.
They probably just have wordlists for different languages (English, Arabic, Chinese, Farsi, French, Spanish). To individually try and rot13 (or any other cipher) every single message they collect as soon as they all come in would be a big waste of processing time. The odds of someone trying to communicate criminal activity via rot13 are absurdly low.
So given that it's a well-known cipher which is easy to break and is still in active use, it would be quite surprising if the NSA's software didn't try.
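To put a rough number on how cheap "just try ROT13 as well" is, here is a minimal sketch of a wordlist filter that scores both the raw message and its ROT13 decoding. The wordlist, the threshold, and the function names are all made up for illustration; nothing here reflects how any real collection system works.

```python
import codecs

# Hypothetical, tiny wordlist; a real filter would use far larger lists
# per language.
WORDLIST = {"the", "meet", "at", "dawn", "bring", "money", "to", "and"}


def word_hits(text: str) -> int:
    """Count whitespace-separated tokens that appear in the wordlist."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return sum(1 for t in tokens if t in WORDLIST)


def flag_message(message: str, threshold: int = 2) -> bool:
    """Flag a message if the raw text or its ROT13 decoding hits the wordlist."""
    decoded = codecs.encode(message, "rot_13")  # ROT13 is its own inverse
    return max(word_hits(message), word_hits(decoded)) >= threshold
```

The extra cost per message is one pass over the text plus a second wordlist lookup, which is the crux of the disagreement above: cheap for one cipher, less so if you try every cipher on every message.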
For this guy, yes, but not necessarily for everyone. The messages don't seem to be particularly long, so it's unlikely a frequency-based attack could work. Also, even if a message is long enough, these guys will not exactly be using the Queen's English, so they'll likely have different letter frequencies for their dialect.
This lad in the article has a real knack for it though, I guess he knows the subject matter pretty well (i.e. knowing the kind of things people will be saying in certain scenarios, the word choice, how they address each other, how the gang hierarchies work etc) and has a good mind for the kind of on-the-fly ciphers that these gangs employ.
It's still a mystery to me. It seems to be able to ASCII encode and do Caesar ciphers, including ROT13, perfectly. I tried to get it to answer with words beginning with alternating even and odd letters. No dice.
Plenty of smart people don't jump straight to a solution when seeing a ROT13 ciphertext, whereas I think most people who are experienced would realize pretty quickly there seems to be a simple transformation going on.
> United for Iran researchers worked with Operator Foundation to confirm that current off-the-shelf scanning tools don't detect the encryption algorithm used to generate the coded words.
This is a really intriguing idea for an app, but I'm not sure how long it can last. If the algorithm for de-obfuscating the messages is known, the regime could just try applying it to every plaintext message they observe, and stopping after the first decoding error, which might be just a couple of words.
A smart obfuscation algorithm would rely on a pre-shared key, such that all possible keys would have to be tried to determine whether a given message was the output of the algorithm. However, all the simple ways of doing that would produce outputs which have statistical/grammatical properties that make the text very different from natural language.
So it wouldn't be too hard to train an AI on, say, a million samples of genuine messages (a corpus built from mass surveillance, or public forums) and another million samples of the output of the algorithm, given the first corpus as input. Once trained, the classifier should be able to detect obfuscated outputs without needing the accompanying secrets.
The next step in this arms race would be for the app's algorithm to use a language model like GPT-3 that generates output with the same statistical properties as natural language, while keeping the hidden meaning and without inflating the size of the output too much. Presumably the algorithm would have to be deterministic in order to be easily reversible (and thus require a "temperature" setting of zero), but it could use a pseudo-random number generator seeded from the shared secret.
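The "deterministic, but seeded from the shared secret" part can be sketched without any language model. The toy below derives a seed from the secret and uses it to shuffle a small codebook, so both sides reconstruct the same mapping. The vocabulary, function names, and the word-for-word substitution are all assumptions for illustration. It is not the app's algorithm, and Python's `random` module is not a cryptographically secure generator.

```python
import hashlib
import random

# Hypothetical cover vocabulary; every message word must come from this list.
VOCAB = ["apple", "river", "stone", "cloud", "garden", "window", "bridge", "candle"]


def keyed_mapping(secret: str) -> dict[str, str]:
    """Derive a deterministic word-to-word permutation from a shared secret."""
    seed = int.from_bytes(hashlib.sha256(secret.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)  # deterministic for a given seed, NOT secure
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return dict(zip(VOCAB, shuffled))


def encode(words: list[str], secret: str) -> list[str]:
    mapping = keyed_mapping(secret)
    return [mapping[w] for w in words]


def decode(words: list[str], secret: str) -> list[str]:
    inverse = {v: k for k, v in keyed_mapping(secret).items()}
    return [inverse[w] for w in words]
```

Because the generator is seeded only from the secret, the receiver can rebuild the mapping and reverse it, while an observer without the secret has to guess which permutation was used. A real design would swap the codebook for model-driven text generation and the PRNG for a proper keyed stream.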
Somewhat unsurprising - more interestingly, was it individual words or combinations of phrases that caught the filter? Also amusing that somewhat simple ciphers still easily defeat the Great Firewall (would ROT13 have also worked?)
"The messages are simply a series of letters, relayed in NATO phonetics, and encrypted with a one-time cypher. The receiver would need a code book (by varying reports refreshed daily, weekly, or monthly and inches thick) to identify the key and decrypt the message."
Indeed. Caesar Ciphers seem to be the hardest it gets. Did you notice that this gang cryptanalyst actually described the famed "ROT-13" as a method the gang members used?
Rot13 is not Caesar's cipher. Rot13 is in the family of rotational ciphers, but it's a more modern concept designed around a 26-letter alphabet (so that running it twice produces the plaintext again)
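Whatever you choose to call it, the "running it twice" property is easy to check: on a 26-letter alphabet, a shift of 13 is its own inverse, while any other shift needs a separate decoding step. A minimal sketch (the `rotate` helper is just for illustration):

```python
import string


def rotate(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` places, leaving other characters alone."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)


# Shift 13 applied twice returns the plaintext; shift 3 does not.
twice_13 = rotate(rotate("Hello, world", 13), 13)
twice_3 = rotate(rotate("Hello, world", 3), 3)
```

Here `twice_13` is the original string again, whereas undoing the shift of 3 requires `rotate(..., -3)`.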
I'm sure it uses some kind of Markov chain statistical analysis technique that would have to be programmed to assume you were typing on a particular keyboard layout in a particular language. On the other hand, there's nothing stopping them from trying to decode the raw data with several different configurations and seeing which one sticks. The IBM Selectric bugs the Soviets planted [1] did a similar thing, where they only transmitted four bits per character, but the messages could be rehydrated knowing letter and word frequencies from the English language.
Caesar ciphers? I believe they are well defined, yes. Although ROT13 itself only dates back to maybe the 70's or 80's.
Besides, I was honestly just using this as an excuse to make a more general point. I've grown frustrated with bugs and security issues arising from "fixing" input to work in a domain which was not designed to handle that input.
But as I said, it doesn't really matter with ROT13 because it isn't a serious algorithm.
I think use of other languages could be described as a form of cryptography. It's been done before as well - see the Navajo code talkers in WWII[1], who were very effective in terms of speed, and who the Japanese found difficult to even transcribe. Of course, this relies on the language being obscure, and many more languages are well documented now than in the 1940s.
>it should likewise be trivial to crack any remaining unknown ancient scripts
That doesn't follow. Part of successfully decrypting a message is knowing when you have the right answer. You can't know that if you're looking at a limited dataset (or clay tablet set) and have no idea about the context the texts were written in.
I guess, but isn't it weird that it nearly matches the graph for English? If each letter, moved 8 places to the right, closely matched the expected frequency of that letter, that would suggest you were on to something.
Doesn't this suggest it might be a fraud? You wouldn't expect an encrypted message to have exactly the expected distribution unless they just swapped around letters of nearly the same frequency.
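The "shift every letter and see whether the histogram lines up with English" test can be sketched as a chi-squared comparison over all 26 shifts. The frequency table below is approximate and the helper names are made up. This only covers simple shift ciphers, and it gets unreliable on very short messages or on text that doesn't follow standard English frequencies.

```python
from collections import Counter
import string

# Approximate letter frequencies for English text, in percent.
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def shift_text(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` places (negative shifts decode)."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(
        lower + upper, lower[k:] + lower[:k] + upper[k:] + upper[:k]
    )
    return text.translate(table)


def chi_squared(text: str) -> float:
    """Lower scores mean the letter histogram looks more like English."""
    letters = [c for c in text.lower() if c in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return float("inf")
    counts = Counter(letters)
    return sum(
        (counts.get(ch, 0) - n * pct / 100) ** 2 / (n * pct / 100)
        for ch, pct in ENGLISH_FREQ.items()
    )


def best_shift(ciphertext: str) -> int:
    """Return the encryption shift whose reversal looks most like English."""
    return min(range(26), key=lambda k: chi_squared(shift_text(ciphertext, -k)))
```

A good fit at a single shift is what you'd expect from a plain shift cipher. A ciphertext that matches English frequencies with no shift at all is the suspicious case raised above.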
Why? You don’t need to be able to understand a cipher to produce plausible cipher text. You can take headers and footers from real messages and just drop in nonsense between. Or retransmit messages you recorded a few days earlier, or random mixtures of bits of real messages. The enemy still has to transcribe them, sort them out, reject the noise.
What, you mean that performing arbitrary permutations and transpositions on a ciphertext until it produces garbage that vaguely resembles actual words in various disjoint places isn't actually cracking the code?!