How about 256 words (or sets of words) representing each byte value? It's less dense by a factor of ~5, but it would work, be easy to decode, and be very difficult to identify, especially if you used sets of words. You could even cleverly generate the output in a way that is grammatically correct.
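Roughly like this (a Python sketch; the 256-entry list here is a numbered placeholder, where a real scheme would use hand-picked English words):

    # Sketch: one word per byte value, using a placeholder 256-entry list.
    WORDS = [f"word{i:03d}" for i in range(256)]
    INDEX = {w: i for i, w in enumerate(WORDS)}

    def encode(data: bytes) -> str:
        return " ".join(WORDS[b] for b in data)

    def decode(text: str) -> bytes:
        return bytes(INDEX[w] for w in text.split())

    assert decode(encode(b"hi")) == b"hi"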
Another idea: find 256 different syllables and use them to encode 8-bit numbers. This way we can also avoid profanity (except for the rare case where a string of multiple harmless syllables spells out a profanity).
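A sketch of a fixed-length variant (the letter sets are invented and a real list would need curating; making every syllable exactly 3 characters keeps decoding trivial):

    # Sketch: 256 CVC syllables (8 onsets x 4 vowels x 8 codas), each
    # exactly 3 characters, so decoding is just 3-character chunking.
    ONSETS = list("bdfgkmnp")
    VOWELS = list("aeio")
    CODAS = list("lmnrstkz")
    SYLLABLES = [o + v + c for o in ONSETS for v in VOWELS for c in CODAS]
    INDEX = {s: i for i, s in enumerate(SYLLABLES)}  # 8*4*8 = 256

    def encode(data: bytes) -> str:
        return "".join(SYLLABLES[b] for b in data)

    def decode(text: str) -> bytes:
        return bytes(INDEX[text[i:i+3]] for i in range(0, len(text), 3))

    assert decode(encode(bytes([0, 128, 255]))) == bytes([0, 128, 255])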
Wow, the nerdsniping on this is something.
Note: in my math I failed to consider that entries wouldn't fit in one byte. Figuring out an encoding might help.
Laying out a sequence in blocks might also help overcome the speed difference between memory and storage.
I'm not even really trying to delve into this, but it's tempting.
I think you might have miscalculated bits per byte here?
8 * 17,763/64,860 = 2.19
Also, I attempted to implement this as described in this paper (variable-length encoding of the letters and the offsets, using L, and dropping F entirely because all words are the same length; N didn't make a big difference).
I achieved a naive size of 20,560 bytes, and I wasn't confident that implementing the more advanced techniques outlined in the paper would get the size down far enough to compete with a trie+Huffman representation (15,599 bytes, https://github.com/adamcw/wordle-trie-packing#all-words).
Where did you get that you need 27 bits for one word?
> Then to send any word you only need to send one number, and in binary it would have between 1 and at most 19 bits
Yep! By sorting by frequency, you can make it so the majority of words have shorter bit strings. By my calculations, common words such as "the", "of", and "and" will have ~4-6 bits associated with them. That means you can encode a large number of words (googling says those words make up ~1/7 of words by frequency) with only 4-6 bits each. That's far from the 27 bits you calculated.
If you are only sending one word, and the recipient already needs to know the word, then you only need 1 bit, essentially just signaling that you are saying that specific word.
If you want a richer vocabulary, you could create an index of about 300k words (from the English dictionary), shared between the parties.
Then to send any word you only need to send one number, and in binary it would have between 1 and at most 19 bits, for any word in the index (2^19 is around 500k); see the sketch below.
That’s without even sorting the index by frequency of appearance/usage.
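A rough sketch (the three entries stand in for the full ~300k list; I pack a fixed 19-bit field here, since dropping leading zeros would need a length prefix or a self-delimiting code to stay decodable):

    # Sketch: both parties hold the same word list; a word is sent as its
    # position, packed into 19 bits (2^19 = 524,288, enough for ~300k).
    WORDLIST = ["a", "aardvark", "aback"]  # ...~300k entries in practice
    INDEX = {w: i for i, w in enumerate(WORDLIST)}

    def encode(word: str) -> str:
        return format(INDEX[word], "019b")  # fixed-width 19-bit string

    def decode(bits: str) -> str:
        return WORDLIST[int(bits, 2)]

    assert decode(encode("aback")) == "aback"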
The trick is to encode the numbers in binary, not plaintext. But still you'd probably want to use a Huffman coding rather than plain indexes so that the common words are shorter.
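For instance (a toy Huffman builder; the frequency numbers are made up for illustration):

    # Sketch: build Huffman codes from word frequencies so that common
    # words get short bit strings.
    import heapq
    from itertools import count

    def huffman(freqs):
        tiebreak = count()  # keeps the heap from ever comparing dicts
        heap = [(f, next(tiebreak), {w: ""}) for w, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {w: "0" + b for w, b in c1.items()}
            merged.update({w: "1" + b for w, b in c2.items()})
            heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
        return heap[0][2]

    codes = huffman({"the": 700, "of": 350, "and": 300, "aardvark": 1})
    # "the" gets the shortest code, "aardvark" the longest.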
We just need to agree on a 2^16-word dictionary; each 16-bit segment gets its own word. Then we can write addresses such as correct:horse:battery:staple::one.
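Something like this (Python sketch; the wordlist is a numbered placeholder for the agreed 2^16-entry list):

    # Sketch: render an IPv6 address as one word per 16-bit group.
    import ipaddress

    WORDS = [f"w{i}" for i in range(2**16)]  # placeholder wordlist

    def addr_to_words(addr: str) -> str:
        packed = ipaddress.IPv6Address(addr).packed  # 16 bytes
        groups = [int.from_bytes(packed[i:i+2], "big")
                  for i in range(0, 16, 2)]
        return ":".join(WORDS[g] for g in groups)

    print(addr_to_words("2001:db8::1"))  # eight words, one per group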
128 bits is like, what, ten words chosen randomly from a suitably large dictionary (~13 bits per word, so a dictionary of roughly 8k words)? Is that really prohibitive? Even as a random alphanumeric string that doesn't seem beyond muscle memory. Seems like the biggest pain would be to input it consistently on a phone keyboard.
24 bits seems about right for the information content of six Latin characters arranged in a pronounceable English orthography (the ‘X’ has pretty high information value though).
I would throw out all obscure languages and useless symbols.
One character = one code point.
I would use 32 bits for a code point. The lower 16 bits would be the code; the upper 16 bits would contain flags and fields for classification. This would allow simple bit operations to query important properties. There would be a 4-bit flexible type code, leaving 12 bits, some of which would have a code-specific meaning, others fixed. Or something like that (a sketch follows below).
The goal would be to have a code that programs can work with, without requiring megabytes and megabytes of meta-data about the code.
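Something along these lines (the specific flag and field assignments are invented for illustration):

    # Sketch: 32-bit code point, low 16 bits = code, bits 28-31 = 4-bit
    # type code, bits 16-27 = classification flags/fields.
    CODE_MASK = 0x0000FFFF
    TYPE_SHIFT = 28
    FLAG_UPPER = 1 << 16  # example classification flag

    def make_cp(code: int, type_code: int, flags: int = 0) -> int:
        return (type_code << TYPE_SHIFT) | flags | (code & CODE_MASK)

    def is_upper(cp: int) -> bool:
        return bool(cp & FLAG_UPPER)  # one bit test, no metadata tables

    cp = make_cp(ord("A"), type_code=0x1, flags=FLAG_UPPER)
    assert is_upper(cp) and (cp & CODE_MASK) == ord("A")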
There's a bitcoin wordlist (BIP-39) with 2^11 entries, so each word encodes 11 bits and 3 words could hold a 32-bit hash (33 bits). You'd need a pretty large wordlist (~2^43 entries) for 3 words to hold 128 bits.
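Packing looks like this (sketch; the numbered list stands in for the real 2048-word BIP-39 list):

    # Sketch: split an n-bit value into 11-bit chunks, one word each.
    WORDS = [f"w{i}" for i in range(2**11)]  # placeholder for BIP-39 list

    def to_words(value: int, nbits: int) -> list[str]:
        nwords = -(-nbits // 11)  # ceil(nbits / 11)
        return [WORDS[(value >> (11 * i)) & 0x7FF] for i in range(nwords)]

    print(to_words(0xDEADBEEF, 32))  # 3 words cover up to 33 bits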
I'm not the OP and I don't have a solution at hand, but I'd start by thinking about error correction and parity, then realize that 10 bits is enough to represent 1000 items completely.
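For instance (a minimal sketch):

    # Sketch: 10 bits address 1024 >= 1000 items; one extra parity bit
    # gives cheap single-bit-error detection.
    def with_parity(index: int) -> int:
        assert 0 <= index < 1024
        parity = bin(index).count("1") % 2
        return (index << 1) | parity  # 11 bits total

    def check(word: int) -> int:
        index, parity = word >> 1, word & 1
        assert bin(index).count("1") % 2 == parity, "bit error detected"
        return index

    assert check(with_parity(999)) == 999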