
For this guy yes, but not necessarily for everyone. The messages don't seem to be particularly long, so it's unlikely a frequency-based attack could work. Also, even if it could, these guys will not exactly be using the Queen's English, so they'll likely have different letter frequencies for their dialect.

This lad in the article has a real knack for it though, I guess he knows the subject matter pretty well (i.e. knowing the kind of things people will be saying in certain scenarios, the word choice, how they address each other, how the gang hierarchies work etc) and has a good mind for the kind of on-the-fly ciphers that these gangs employ.




I wonder where the borders lie between encryption and deliberate obfuscation - say, by native speakers of a little-known language (which was done for some WWII radio transmissions by the US [1]) and also to a far lesser extent by old-time London gangs using Cockney rhyming slang and variants thereof. [2]

[1] https://www.nationalww2museum.org/war/articles/american-indi...

[2] https://romanroadlondon.com/cockney-rhyming-slang-history/


I doubt that in their bulk aggregation, they actually try deciphering content in the initial detection stage.

They probably just have wordlists for different languages (English, Arabic, Chinese, Farsi, French, Spanish). To individually try and rot13 (or any other cipher) every single message they collect as soon as they all come in would be a big waste of processing time. The odds of someone trying to communicate criminal activity via rot13 are absurdly low.
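A rough sketch of the difference between the cheap wordlist pass described above and a "try a cipher first" pass. The tiny `WORDLIST` and both function names are made up for illustration, and nothing here reflects how any real collection system works; it only shows that the second check costs an extra transform per message per cipher tried.

```python
import codecs

# Hypothetical watch-list; a real one would be far larger and per-language.
WORDLIST = {"meet", "money", "drop", "package"}

def hits(text: str) -> int:
    """Count how many words of the message appear in the wordlist."""
    return sum(1 for w in text.lower().split() if w.strip(".,!?") in WORDLIST)

def hits_after_rot13(text: str) -> int:
    """Same check, but after undoing ROT13 first (one extra pass per message)."""
    return hits(codecs.decode(text, "rot_13"))
```

Every additional cipher you want to cover means another full pass like `hits_after_rot13` over every message, which is the cost the comment is pointing at.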


I think so. These types of ciphers are definitely vulnerable to analysis across messages, especially if you can make a guess about something common to both messages (e.g. a common ending like “sincerely”).
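To make the "common ending" idea concrete, here is a minimal sketch that assumes the cipher is a simple monoalphabetic substitution and that the guessed ending (the crib) really is there. The function names are invented for this example.

```python
def mapping_from_crib(ciphertext: str, crib: str) -> dict[str, str]:
    """Assume the message ends with `crib`; return cipher->plain letter pairs."""
    if not crib:
        return {}
    tail = ciphertext[-len(crib):]
    mapping: dict[str, str] = {}
    for c, p in zip(tail, crib):
        # A monoalphabetic cipher must map each cipher letter consistently.
        if mapping.setdefault(c, p) != p:
            raise ValueError("crib does not fit a simple substitution here")
    return mapping

def apply_partial(ciphertext: str, mapping: dict[str, str]) -> str:
    """Fill in known letters, leave '_' where the mapping is still unknown."""
    return "".join(mapping.get(c, "_") if c.isalpha() else c for c in ciphertext)
```

One nine-letter guess like "sincerely" already pins down eight distinct letters, and those carry over to every other message written with the same key.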

Indeed. Caesar Ciphers seem to be the hardest it gets. Did you notice that this gang cryptanalyst actually described the famed "ROT-13" as a method the gang members used?

ciphers are easily broken with frequency analysis. English always has a ton of E’s

encoding can be much faster than decoding, going more complicated to beat frequency analysis is just going to slow your own decoding/reading down

Yes it does thwart people that should have already been bounced by the cultural expectation of not reading your journal, but it doesn’t take a state actor to break it, just someone slightly above curious
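For anyone who wants to see how little effort "slightly above curious" takes, here is a hedged sketch of frequency analysis against a Caesar shift. The frequency table holds approximate textbook percentages for English, and the chi-squared scoring is just one common choice, not the only way to do it.

```python
from collections import Counter
import string

# Approximate relative letter frequencies for English text, in percent.
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}

def shift_text(text: str, k: int) -> str:
    """Caesar-shift lowercase letters by k; leave everything else alone."""
    a = string.ascii_lowercase
    s = k % 26
    return text.lower().translate(str.maketrans(a, a[s:] + a[:s]))

def chi_squared(text: str) -> float:
    """Distance between the text's letter counts and English expectations."""
    counts = Counter(c for c in text if c in ENGLISH_FREQ)
    n = sum(counts.values())
    if n == 0:
        return float("inf")
    return sum((counts[c] - n * f / 100) ** 2 / (n * f / 100)
               for c, f in ENGLISH_FREQ.items())

def crack_caesar(ciphertext: str) -> int:
    """Return the shift whose decryption looks most like English."""
    return min(range(26), key=lambda k: chi_squared(shift_text(ciphertext, -k)))
```

A general substitution cipher needs more work than trying 26 shifts, but the scoring idea is the same. On very short or slang-heavy messages the statistics get noisy, which is the caveat raised elsewhere in the thread.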


yeah. gp (and I recommend everybody :) ) should probably try cracking a few substitution ciphers by hand just to see how incredibly well frequency analysis really works.

like you say, a few dozen characters of ciphertext. and you don't even need the full frequency chart, either. just the top few are sufficient (there's a lot of noise as you go down anyway because of Zipf's law). add knowledge of the top-3 bigrams and trigrams, and you're all set.

it's all due to the fact that, as Shannon estimated, there's only roughly 1.5 bits of entropy per character in English text. and English is in fact one of the more efficient languages when it comes to this, Dutch or German have even less entropy per character :) I don't know how it compares to Spanish, however. gendered words are basically a type of redundancy check, so they waste bits in that sense.
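If you want to poke at the entropy claim yourself, here is a minimal sketch that only looks at single-letter counts. That is an assumption that matters: on ordinary English this first-order number typically comes out around 4 bits per letter, and the much lower figure quoted above only appears once you account for context (bigrams, words, grammar), which this code does not do.

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """Shannon entropy in bits per letter, from single-letter counts only."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(letters).values())
```

A perfectly flat 26-letter distribution would give log2(26), about 4.7 bits, so the gap between that and what real text gives is the redundancy that frequency analysis exploits.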


Not really, at least, not this time and not by this person. If it was a set of anagrams, frequency analysis would have matched medieval Italian. See:

http://www.ciphermysteries.com/2009/02/17/edith-sherwoods-an...


This is what the successful decrypt of another Enigma message looks like:

UUUVIRSIBENNULEINSYNACHRXUUUSTUETZPUNKTLUEBECKVVVCHEFVIERXUUUFLOTTXXMITUUUVIERSIBENNULZWOUNDUUUVIERSIBENNULDREIZURFLENDERWERFTLUEBECKGEHENXFONDORTFOLGTWEITERESX

There's enough code language being used in these messages that frequency analysis is not going to be a major help (especially since it doesn't work well on very short texts anyways).


I imagine if you knew those were codewords likely to appear in a cypher it could certainly help with cryptanalysis.

If you want to be a cipher instead of a human being, maybe.

The article talks about hidden Markov modelling, which, combined with initial training data, can theoretically be applied to your cipher to find a solution. The number of things that humans can observe and create sensible (memorable) code from is not infinite. It must be based on some aspect of the environment from which information is derived. You get the abstract form (the grammar) and train to infer the patterns of how substituted words are selected.

Then, what will happen is language will cease to make sense as the complexity to communicate secretly increases. The real targets will communicate using literal forms of encryption. There will be layers and hierarchies of metaphors and symbolism, organizations of analogies and relations, similar to the complexity level of fictional narrative literature. It is a combination of abstraction and environmental observation.

Or maybe it is already happening, and it's just a little check mark next to a bunch of check marks that indicate something symbolic and abstract about the future.


Could be made considerably stronger by using a homophonic substitution cipher. Given that there are lots and lots of Unicode characters it would be pretty easy to flatten the distribution to make letter frequency analysis hard.
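A minimal sketch of the homophonic idea, assuming each letter simply gets a number of symbols roughly proportional to how common it is. The function names, the choice of the CJK block starting at U+4E00 as the symbol pool, and the toy frequency counts are all illustrative assumptions.

```python
import random

def build_homophones(freq: dict[str, int], base: int = 0x4E00) -> dict[str, list[str]]:
    """Give each letter `freq[letter]` distinct symbols, starting at code point `base`."""
    table, nxt = {}, base
    for letter, count in freq.items():
        table[letter] = [chr(nxt + i) for i in range(count)]
        nxt += count
    return table

def encrypt(plain: str, table: dict[str, list[str]], rng: random.Random) -> str:
    # Pick one of the letter's symbols at random, so common letters are spread out.
    return "".join(rng.choice(table[c]) for c in plain)

def decrypt(cipher: str, table: dict[str, list[str]]) -> str:
    # Every symbol belongs to exactly one letter, so decryption is a plain lookup.
    reverse = {s: letter for letter, syms in table.items() for s in syms}
    return "".join(reverse[s] for s in cipher)
```

Every letter needs at least one symbol (a count of 0 would make `encrypt` fail). Flattening single-symbol counts this way does nothing about bigram and word patterns, and `random.Random` is not a cryptographic generator, so treat this as a toy.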

"The messages are simply a series of letters, relayed in NATO phonetics, and encrypted with a one-time cypher. The receiver would need a code book (by varying reports refreshed daily, weekly, or monthly and inches thick) to identify the key and decrypt the message."

Is there any technological basis for this being cryptographic? I couldn't find any, or am I missing the part where it is an artistic statement? Just knowing the language of the message is enough to use frequency analysis to crack the message.

I did some very rough frequency analysis using this last night, but didn't get very much from it.

The comma symbol is more frequent than any letter usually is in English, but given the small corpus that's not too telling. It could stand for an 'e', or the coded text could be lists and they're just commas.

Someone commented on the article that he suspects the 'divided by' symbol might stand for 'i' due to its placement, which agrees roughly with the position it gets in the frequency table. Someone else has suggested that the language being masked might not be English, which is an intriguing possibility.

The frequencies aren't flat, which seems to suggest it's either not a very good homophonic cipher (he just threw some odd replacements and codeword-symbols in there, basically still a substitution cipher) or it's a very good one (he consciously aimed at misleading symbol frequencies).

The rough nature of the writing (also discussed on the article) suggests that the code was probably memorised, and thus not the result of a very laborious method.


Why? You don’t need to be able to understand a cipher to produce plausible cipher text. You can take headers and footers from real messages and just drop in nonsense between. Or retransmit messages you recorded a few days earlier, or random mixtures of bits of real messages. The enemy still has to transcribe them, sort them out, reject the noise.

I think use of other languages could be described as a form of cryptography. It's been done before as well - see the Navajo code talkers in WWII[1], who were very effective in terms of speed, and who the Japanese found difficult to even transcribe. Of course, this relies on the language being obscure, and many more languages are well documented now than in the 1940s.

>it should likewise be trivial to crack any remaining unknown ancient scripts

That doesn't follow. Part of successfully decrypting a message is knowing when you have the right answer. That doesn't apply if you're looking at a limited dataset (or clay tablet set), and you have no idea about the context the texts were written in.

[1] https://en.wikipedia.org/wiki/Code_talker


From experimenting, this looks like a Caesar cipher (https://en.wikipedia.org/wiki/Caesar_cipher) on an alphabet of full (English) words instead.

That’s easily broken if you have enough ciphertext. I guess that, with additional context (who’s talking to who, and what are the most likely subjects) twenty words should be enough.
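A small sketch of what a Caesar cipher over words could look like, and why it falls quickly. The eight-word `WORDS` list is entirely hypothetical (the real word list and its ordering aren't known here), and the functions are named just for this example.

```python
# Hypothetical word "alphabet"; the real list and its order are unknown.
WORDS = ["apple", "boat", "cloud", "door", "engine", "field", "garden", "house"]
INDEX = {w: i for i, w in enumerate(WORDS)}

def shift_words(message: str, k: int) -> str:
    """Replace each word by the one k positions further along the list."""
    return " ".join(WORDS[(INDEX[w] + k) % len(WORDS)] for w in message.split())

def candidate_decryptions(cipher: str) -> list[str]:
    """With a single shift as the key, an attacker can simply list every shift."""
    return [shift_words(cipher, -k) for k in range(len(WORDS))]
```

The key space is only as large as the word list, so if the list is known, a machine can enumerate every shift; the remaining work is recognizing which candidate reads as a sensible message, which is where the extra context comes in.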


Didn't Italy nail a mob boss a few years ago because they were using a common substitution cipher?
