Hacker Read

meowface · 2013-06-22 11:04:17

I doubt that in their bulk aggregation, they actually try deciphering content in the initial detection stage.

They probably just have wordlists for different languages (English, Arabic, Chinese, Farsi, French, Spanish). Individually trying rot13 (or any other cipher) on every single message as soon as it comes in would be a big waste of processing time. The odds of someone trying to communicate criminal activity via rot13 are absurdly low.

gwern | karma 33755 | avg karma 4.24 · | 2013-06-22 15:19:11+00:00

Indeed. rot13 is a version of a Caesar cipher, and believe it or not, Caesar ciphers have been used in the recent past by at least 1 would-be terrorist: http://en.wikipedia.org/wiki/Caesar_cipher#History_and_usage

So given that it's a well-known cipher which is easy to break and is still in active use, it would be quite surprising if the NSA's software didn't try.
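
To make "easy to break" concrete, here is a minimal sketch of brute-forcing a Caesar cipher by trying all 26 shifts and keeping the candidate that contains the most dictionary words. The names `COMMON_WORDS`, `caesar_shift` and `best_guess` are made up for illustration, and the tiny wordlist is an assumption; nothing here describes what any agency's software actually does.

```python
import string

# Tiny illustrative wordlist (an assumption); a real system would use a full dictionary.
COMMON_WORDS = {"the", "and", "at", "to", "of", "in", "is", "meet", "we", "you", "a", "it"}


def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter forward by `shift` places, preserving case."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def best_guess(ciphertext: str) -> tuple[int, str]:
    """Try all 26 shifts; return (shift, text) for the candidate with the most known words."""

    def score(candidate: str) -> int:
        words = candidate.lower().split()
        return sum(w.strip(string.punctuation) in COMMON_WORDS for w in words)

    candidates = [(s, caesar_shift(ciphertext, s)) for s in range(26)]
    return max(candidates, key=lambda c: score(c[1]))
```

For example, `best_guess(caesar_shift("Meet me at the bridge at noon", 13))` recovers the original sentence. The cost is 26 cheap passes per message, which is the trade-off being argued about above.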

JonnieCache | karma 17245 | avg karma 4.71 · | 2011-05-03 13:25:10+00:00

What exactly do you mean here? ROT13? Some sort of crazy Arabic version of pig latin?

Just because they have DARPA doesn't mean the Americans can just waltz past elliptic-curve cyphers.

EDIT: Unless you mean the 'give me the key or I'll fetch the powerdrill' kind of encryption breaking, in which case, fair point.

smcl | karma 13675 | avg karma 2.97 · | 2013-10-29 09:31:05+00:00

For this guy yes, but not necessarily for everyone. The messages don't seem to be particularly long, so it's unlikely a frequency-based attack could work. Also, even if they were long enough, these guys will not exactly be using the Queen's English, so they'll likely have different letter frequencies for their dialect.

This lad in the article has a real knack for it though, I guess he knows the subject matter pretty well (i.e. knowing the kind of things people will be saying in certain scenarios, the word choice, how they address each other, how the gang hierarchies work etc) and has a good mind for the kind of on-the-fly ciphers that these gangs employ.

weinzierl | karma 23117 | avg karma 5.55 · | 2023-06-08 11:59:53

It's still a mystery to me. It seems to be able to ASCII encode and do Caesar ciphers, including ROT13, perfectly. I tried to get it to answer with words beginning with alternating even and odd letters. No dice.

nl | karma 29762 | avg karma 2.49 · | 2021-03-24 23:46:13

Were they experienced at breaking ciphers?

Plenty of smart people don't jump straight to a solution when seeing a ROT13 ciphertext, whereas I think most people who are experienced would realize pretty quickly there seems to be a simple transformation going on.

dane-pgp | karma 8830 | avg karma 2.31 · | 2021-09-19 22:04:46

> United for Iran researchers worked with Operator Foundation to confirm that current off-the-shelf scanning tools don't detect the encryption algorithm used to generate the coded words.

This is a really intriguing idea for an app, but I'm not sure how long it can last. If the algorithm for de-obfuscating the messages is known, the regime could just try applying it to every plaintext message they observe, and stopping after the first decoding error, which might be just a couple of words.

A smart obfuscation algorithm would rely on a pre-shared key, such that all possible keys would have to be tried to determine whether a given message was the output of the algorithm. However, all the simple ways of doing that would produce outputs which have statistical/grammatical properties that make the text very different from natural language.

So it wouldn't be too hard to train an AI on, say, a million samples of genuine messages (a corpus built from mass surveillance, or public forums) and another million samples of the output of the algorithm, given the first corpus as input. Once trained, the classifier should be able to detect obfuscated outputs without needing the accompanying secrets.

The next step in this arms race would be for the app's algorithm to use a language model like GPT-3 that generates output with the same statistical properties as natural language, while keeping the hidden meaning and without inflating the size of the output too much. Presumably the algorithm would have to be deterministic in order to be easily reversible (and thus require a "temperature" setting of zero), but it could use a pseudo-random number generator seeded from the shared secret.
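
The last point (a deterministic, reversible transformation driven by a PRNG seeded from the shared secret) can be sketched with a toy word substitution. This is only an illustration of the seeding idea: `VOCAB`, `keyed_mapping`, `encode` and `decode` are hypothetical names, the vocabulary is invented, there is no language model involved, and Python's `random` module is not a cryptographic generator.

```python
import hashlib
import random

# Toy vocabulary shared by both parties (an assumption for illustration).
VOCAB = ["apple", "river", "cloud", "stone", "bread", "music", "green", "house"]


def keyed_mapping(secret: str) -> dict[str, str]:
    """Derive a word-to-word permutation of VOCAB from the shared secret."""
    # Hash the secret into an integer seed so both sides get the same PRNG stream.
    seed = int.from_bytes(hashlib.sha256(secret.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return dict(zip(VOCAB, shuffled))


def encode(words: list[str], secret: str) -> list[str]:
    """Replace each word with its keyed substitute."""
    mapping = keyed_mapping(secret)
    return [mapping[w] for w in words]


def decode(words: list[str], secret: str) -> list[str]:
    """Invert the keyed substitution using the same secret."""
    inverse = {v: k for k, v in keyed_mapping(secret).items()}
    return [inverse[w] for w in words]
```

Because the mapping depends only on the secret, the receiver can rebuild it without any extra data being transmitted. A fixed substitution like this would still be easy to spot statistically, which is the problem the language-model step is meant to address.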

SOLAR_FIELDS | karma 5635 | avg karma 2.65 · | 2022-04-04 21:28:21

Somewhat unsurprising - more interestingly, was it individual words or combinations of phrases that caught the filter? Also amusing that somewhat simple ciphers still easily defeat the Great Firewall (would ROT13 have also worked?)

acapybara | karma 488 | avg karma 3.61 · | 2022-01-11 14:59:16

"The messages are simply a series of letters, relayed in NATO phonetics, and encrypted with a one-time cypher. The receiver would need a code book (by varying reports refreshed daily, weekly, or monthly and inches thick) to identify the key and decrypt the message."

ves | karma 503 | avg karma 2.62 · | 2016-06-22 05:18:21+00:00

There's no way in hell they're doing ML on cipher text. It's like orders upon orders of magnitude too slow.

bediger4000 | karma 4590 | avg karma 0.96 · | 2013-10-29 13:53:42+00:00

Indeed. Caesar Ciphers seem to be the hardest it gets. Did you notice that this gang cryptanalyst actually described the famed "ROT-13" as a method the gang members used?

menacingly | karma 2115 | avg karma 4.44 · | 2019-06-09 13:06:03+00:00

Rot13 is not Caesar's cipher. Rot13 is in the family of rotational ciphers, but it's a more modern concept designed around a 26-letter alphabet (so that running it twice produces the plaintext again)
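
The "running it twice produces the plaintext again" property is easy to check. A small sketch, with `rotate` and `SELF_INVERSE` as illustrative names: on a 26-letter alphabet, 13 is the only nonzero rotation that undoes itself.

```python
import codecs
import string

ALPHABET = string.ascii_lowercase


def rotate(text: str, k: int) -> str:
    """Rotate lowercase ASCII letters forward by k places; leave other characters alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + k) % 26] if c in ALPHABET else c
        for c in text
    )


# Rotations in 1..25 that return the original alphabet when applied twice.
SELF_INVERSE = [k for k in range(1, 26) if rotate(rotate(ALPHABET, k), k) == ALPHABET]


def rot13(text: str) -> str:
    """ROT13 via the standard library codec (handles upper and lower case)."""
    return codecs.encode(text, "rot_13")
```

Any other rotation `k` needs a separate decoding step with `26 - k`, which is why ROT13 can use one function for both directions.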

13of40 | karma 5916 | avg karma 3.27 · | 2016-08-25 00:40:36+00:00

I'm sure it uses some kind of Markov chain statistical analysis technique that would have to be programmed to assume you were typing on a particular keyboard layout in a particular language. On the other hand, there's nothing stopping them from trying to decode the raw data with several different configurations and seeing which one sticks. The IBM Selectric bugs the Soviets planted [1] did a similar thing, where they only transmitted four bits per character, but the messages could be rehydrated knowing letter and word frequencies from the English language.

[1] http://www.cryptomuseum.com/covert/bugs/selectric/

jmvoodoo | karma 907 | avg karma 4.47 · | 2021-08-30 12:37:48

I imagine if you knew those were codewords likely to appear in a cypher it could certainly help with cryptanalysis.

McGlockenshire | karma 885 | avg karma 3.14 · | 2015-09-15 22:19:50

If you substitute "all ciphers" with "all known ciphers" and don a tin-foil hat, it works.

ChrisSD | karma 7042 | avg karma 4.64 · | 2019-06-09 12:31:19+00:00

Caesar ciphers? I believe they are well defined, yes. Although ROT13 itself only dates back to maybe the 70's or 80's.

Besides, I was honestly just using this as an excuse to make a more general point. I've grown frustrated with bugs and security issues arising from "fixing" input to work in a domain which was not designed to handle that input.

But as I said, it doesn't really matter with ROT13 because it isn't a serious algorithm.

widdershins | karma 808 | avg karma 3.96 · | 2018-02-04 13:08:02

I think use of other languages could be described as a form of cryptography. It's been done before as well - see the Navajo code talkers in WWII[1], who were very effective in terms of speed, and who the Japanese found difficult to even transcribe. Of course, this relies on the language being obscure, and many more languages are well documented now than in the 1940s.

>it should likewise be trivial to crack any remaining unknown ancient scripts

That doesn't follow. Part of successfully decrypting a message is knowing when you have the right answer. That doesn't apply if you're looking at a limited dataset (or clay tablet set), and you have no idea about the context the texts were written in.

[1] https://en.wikipedia.org/wiki/Code_talker

noahc | karma 1441 | avg karma 2.24 · | 2011-03-29 22:28:20+00:00

I guess, isn't it weird that it nearly matches the graph for English? If each letter, moved to the right 8 places, closely matched the expected value of that letter, that would suggest that you would be on to something.

Doesn't this suggest it might be a fraud, as you wouldn't expect an encrypted message to contain the exact same distribution as expected unless they just switched letters of nearly the same frequency around?
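
The "move each letter N places and compare with the expected value" test can be sketched as a chi-squared comparison against English letter frequencies. The table `ENGLISH_FREQ` holds approximate, commonly quoted percentages and is an assumption, as are the function names; this says nothing about the specific message under discussion.

```python
from collections import Counter
import string

# Approximate English letter frequencies in percent (assumed reference values).
ENGLISH_FREQ = {
    "a": 8.17, "b": 1.49, "c": 2.78, "d": 4.25, "e": 12.70, "f": 2.23,
    "g": 2.02, "h": 6.09, "i": 6.97, "j": 0.15, "k": 0.77, "l": 4.03,
    "m": 2.41, "n": 6.75, "o": 7.51, "p": 1.93, "q": 0.10, "r": 5.99,
    "s": 6.33, "t": 9.06, "u": 2.76, "v": 0.98, "w": 2.36, "x": 0.15,
    "y": 1.97, "z": 0.07,
}


def shift_back(text: str, shift: int) -> str:
    """Lowercase the text and move each letter back by `shift` places."""
    return "".join(
        chr((ord(c) - ord("a") - shift) % 26 + ord("a")) if c in string.ascii_lowercase else c
        for c in text.lower()
    )


def chi_squared(text: str) -> float:
    """Distance between the text's letter counts and the expected English counts."""
    counts = Counter(c for c in text if c in string.ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        return float("inf")
    return sum(
        (counts.get(letter, 0) - total * pct / 100) ** 2 / (total * pct / 100)
        for letter, pct in ENGLISH_FREQ.items()
    )


def estimate_shift(ciphertext: str) -> int:
    """Return the shift whose reversal makes the text look most like English."""
    return min(range(26), key=lambda s: chi_squared(shift_back(ciphertext, s)))
```

A simple rotation only relabels the histogram, so the shape of the English distribution survives and one shift stands out. On very short messages the counts are too noisy for this to be reliable.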

jameshart | karma 19910 | avg karma 4.65 · | 2023-06-26 21:29:21

Why? You don’t need to be able to understand a cipher to produce plausible cipher text. You can take headers and footers from real messages and just drop in nonsense between. Or retransmit messages you recorded a few days earlier, or random mixtures of bits of real messages. The enemy still has to transcribe them, sort them out, reject the noise.

StavrosK | karma 1 | avg karma 0.0 · | 2013-04-14 00:13:27+00:00

What, you mean that performing arbitrary permutations and transpositions on a ciphertext until it produces garbage that vaguely resembles actual words in various disjoint places isn't actually cracking the code?!