
But it still gets like 10% of words wrong, which is enough to make it unusable.



it's buggy and accepts bogus words

Also, if I’m seeing this correctly, it only recognises ASCII “words” (a–z, A–Z, 0–9, _), so it’s not even accurate and will over-count words with accents, umlauts, etc.
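A quick way to see that over-counting, using Python's `re` module (the sample text is made up for illustration):

```python
import re

text = "naïve café test_word 123"

# ASCII-only word pattern (a-z, A-Z, 0-9, _): accented letters break words apart
ascii_words = re.findall(r"[A-Za-z0-9_]+", text)

# Unicode-aware \w: accented letters stay inside one word
unicode_words = re.findall(r"\w+", text)

print(ascii_words)    # "naïve" splits into "na" and "ve"; "café" loses the "é"
print(unicode_words)
```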

A lot of it is statistical inference. I've run into weird glitches where it chooses the completely wrong word that still fits. For example, I used a sentence that ended with "cool!" but it transcribed it to "excellent!"

Obviously, "excellent" sounds nothing like "cool" but the sentence still worked because it was using the neighboring words to try and guess what should go there.


I get 70-75 with around 90% accuracy. The word disappearing after a space was throwing me off at first, but I guess there's no one universal interface/UX that would accommodate everyone.

Anything to do with counting words or letters, manipulating letters, etc. just leads to false inferences about the power of the model, because it doesn't work with words and letters but with tokens. It can't see words.

The article mentions several times that it only matches on whole words

They use just seven words out of ten, so if you rearrange the words properly you could get much better results.

In cases like that, it often ignores words even after double quoting them.

I got two slightly misspelled words and it didn't mark them as wrong, so this whole test seems wildly inconsistent (not to mention it seems to have some foreign words and even names marked as real English words, according to some other commenters...).

Yeah, I agree. It seems to clip between phonemes and it has weird diction because it doesn't get the stress patterns of the words right.

I was expecting it to at least generate valid words, but that doesn't seem to be the case...

It's fed sub-word tokens not letters (even though it can split a word into letters), and apparently struggles with counting in general. No doubt some of the things it struggles with could be improved with targeted training, but others may require architectural changes.

Imagine yourself trying to use only 5 letter words if you can't see how many letters are actually in each word, and had to rely on a hodgepodge of other means to try to figure it out!
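That constraint can be sketched with a toy tokenizer. The vocabulary and IDs below are entirely made up, just to show that what the model receives is a sequence of opaque IDs, not letters:

```python
# Hypothetical sub-word vocabulary; real tokenizers have tens of thousands of entries.
vocab = {"straw": 101, "berry": 102, "excel": 103, "lent": 104}

def encode(word, vocab):
    """Greedy longest-match split of a word into token IDs (toy BPE-style)."""
    ids, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids

# Ten letters in, but the model only "sees" two IDs -- the letter count
# is not directly recoverable from the input it gets.
print(encode("strawberry", vocab))  # [101, 102]
```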


Nah, it's better than that. But it doesn't recognize sentences, and words appear one at a time.

This likely has to do with the tokenization of the sentences; the same reason it's bad at arithmetic.

To be fair though, I would expect that. It's always been and always will be pretty terrible for anything at the character level, because it works on tokens (i.e. chunks of words) rather than characters, and it can't see individual characters at all.

For example, try asking it to count the number of characters in a word and, no matter if it's right or wrong (seems to be a coin toss, btw -- it can just do a rough guess), ask it "Are you sure?", and keep asking that over and over every time it replies. Chances are it'll keep changing the count back and forth, apologizing every time.


It can't do anagrams though (every now and then it might get a common one right, but in general it's bad at letter-based manipulations/information, including even word lengths, reversal, etc.).
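For contrast, the letter-level check itself is trivial once you can actually see the characters; a minimal sketch:

```python
def is_anagram(a: str, b: str) -> bool:
    """Two words are anagrams when their sorted letters match."""
    return sorted(a.lower()) == sorted(b.lower())

print(is_anagram("listen", "silent"))  # True
print(is_anagram("cool", "excellent"))  # False
```

The model's difficulty isn't with this logic but with perceiving the letters in the first place.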

Is anybody aware of the specific technical reason that it struggles with words so much? There seems to be enough of a pattern to create a million reasonable hypotheses - curiosity makes me want to know which it really is!

Looking at the images it's particularly interesting how it seems to have never once gotten the text correct, always just being a little bit off. Well sometimes way off, but mostly quite close.


Who's to say the user won't enter an incorrect word or mistype? To get reasonably accurate results, you would have to take a handful of samples for each word and take the one with the highest frequency. And even then, users will figure out that they really only need to enter one word and random crap for the other one (it will always look different). Not a bad notion, but too much room for error and randomness to be reliable.

It would take about 0.5 seconds of brute forcing for the library to figure out if you had gotten a word wrong, so that's actually reasonable.
