
But it still gets like 10% of words wrong, which is enough to make it unusable.



it's buggy and accepts bogus words

Also, if I’m seeing this correctly, it only recognises ASCII “words” (a–z, A–Z, 0–9, _), so it’s not even accurate and will over-count words with accents, umlauts, etc.
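A quick way to see that over-counting, using Python's `re` module (the sample text is made up for illustration):

```python
import re

text = "naïve café test_word 123"

# ASCII-only word pattern (a-z, A-Z, 0-9, _): accented letters break words apart
ascii_words = re.findall(r"[A-Za-z0-9_]+", text)

# Unicode-aware \w: accented letters stay inside one word
unicode_words = re.findall(r"\w+", text)

print(ascii_words)    # "naïve" splits into "na" and "ve"; "café" loses the "é"
print(unicode_words)
```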

A lot of it is statistical inference. I've run into weird glitches where it chooses the completely wrong word that still fits. For example, I used a sentence that ended with "cool!" but it transcribed it to "excellent!"

Obviously, "excellent" sounds nothing like "cool" but the sentence still worked because it was using the neighboring words to try and guess what should go there.


I get 70-75 with around 90% accuracy. The word disappearing after a space was throwing me off at first, but I guess there's no one universal interface/UX that would accommodate everyone.

Anything to do with counting words or letters, manipulating letters, etc. just leads to false inferences about the power of the model, because it doesn't work with words and letters but with tokens. It can't see words.

The article mentions several times that it only matches on whole words

They use just seven words out of ten, so if you rearrange the words properly you could get much better results.

In cases like that, it often ignores words even after double quoting them.

I got two slightly misspelled words and it didn't mark them as wrong, so this whole test seems wildly inconsistent (not to mention it seems to have some foreign words and even names marked as real English words, according to some other commenters...).

Yeah, I agree. It seems to clip between phonemes and it has weird diction because it doesn't get the stress patterns of the words right.

I was expecting it to at least generate valid words, but that doesn't seem to be the case...

It's fed sub-word tokens not letters (even though it can split a word into letters), and apparently struggles with counting in general. No doubt some of the things it struggles with could be improved with targeted training, but others may require architectural changes.

Imagine yourself trying to use only 5 letter words if you can't see how many letters are actually in each word, and had to rely on a hodgepodge of other means to try to figure it out!
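That constraint can be sketched with a toy tokenizer. The vocabulary and IDs below are entirely made up, just to show that what the model receives is a sequence of opaque IDs, not letters:

```python
# Hypothetical sub-word vocabulary; real tokenizers have tens of thousands of entries.
vocab = {"straw": 101, "berry": 102, "excel": 103, "lent": 104}

def encode(word, vocab):
    """Greedy longest-match split of a word into token IDs (toy BPE-style)."""
    ids, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids

# Ten letters in, but the model only "sees" two IDs -- the letter count
# is not directly recoverable from the input it gets.
print(encode("strawberry", vocab))  # [101, 102]
```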


Nah, it's better than that. But it doesn't recognize sentences, and words appear one at a time.

This likely has to do with the tokenization of the sentences; the same reason it's bad at arithmetic.

To be fair though, I would expect that. It's always been and always will be pretty terrible for anything at the character level, because it works on tokens (i.e. chunks of words) rather than characters, and it can't see individual characters at all.

For example, try asking it to count the number of characters in a word and, no matter if it's right or wrong (seems to be a coin toss, btw -- it can just do a rough guess), ask it "Are you sure?", and keep asking that over and over every time it replies. Chances are it'll keep changing the count back and forth, apologizing every time.


It can't do anagrams though (every now and then it might get a common one right, but in general it's bad at letter-based manipulations/information, including even word lengths, reversal, etc.).
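For contrast, the letter-level check itself is trivial once you can actually see the characters; a minimal sketch:

```python
def is_anagram(a: str, b: str) -> bool:
    """Two words are anagrams when their sorted letters match."""
    return sorted(a.lower()) == sorted(b.lower())

print(is_anagram("listen", "silent"))  # True
print(is_anagram("cool", "excellent"))  # False
```

The model's difficulty isn't with this logic but with perceiving the letters in the first place.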

Is anybody aware of the specific technical reason that it struggles with words so much? There seems to be enough of a pattern to create a million reasonable hypotheses - curiosity makes me want to know which it really is!

Looking at the images it's particularly interesting how it seems to have never once gotten the text correct, always just being a little bit off. Well sometimes way off, but mostly quite close.


Who's to say the user won't enter an incorrect word or mistype? To get reasonably accurate results, you would have to take a handful of samples for each word and take the one with the highest frequency. And even then, users will figure out that they really only need to enter one word and random crap for the other one (it will always look different). Not a bad notion, but too much room for error and randomness to be reliable.

It would take about 0.5 seconds of brute forcing for the library to figure out if you had gotten a word wrong, so that's actually reasonable.
