>implies that somebody can produce quality results in 50 minutes…
This is as much a random fluke as it is sampling bias - akin to someone getting lucky, or to someone coming from a very similar environment. It isn't a measurement that guarantees someone is competent; rather, it's a guarantee that someone is coming from just as incompetent an environment as you have.
This means they heard the difference - it is not possible to be significantly less accurate than 50% without hearing a difference.
For small sample sizes, being far off is quite likely. For example, the chance of getting at most about 1/3 of 31 fair 50/50 guesses right (11 or fewer) is roughly 1 in 13.
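That tail probability is easy to check exactly (a minimal sketch in Python; the 31 guesses and the 11-or-fewer cutoff are taken from the comment above, everything else is standard binomial arithmetic):

    from math import comb

    # P(X <= 11) for X ~ Binomial(n=31, p=0.5): the chance of getting
    # at most 11 of 31 fair 50/50 guesses right.
    n, k_max = 31, 11
    p = sum(comb(n, k) for k in range(k_max + 1)) / 2**n
    print(f"P(X <= {k_max}) = {p:.4f}, about 1 in {1/p:.1f}")  # 0.0748, ~1 in 13.4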
I think the 50% part should be ignored, and the message is "when dealing with an individual sample, the general probability distribution isn't that useful". I don't actually agree with that, just trying to steelman the point a bit. I do kind of see what he means.
I wrote this below, but several things are clear here:
- This isn't a quote and should be taken with a grain of salt. Oversimplification, poor wording, and basic misunderstanding on the part of the author are at fault.
- We don't know what the models outputs are. If they are simply SUCCEED / FAIL, then yes, 50% correct is not very helpful (unless of course it is right more than 50% of the time on big winners). If the outputs are more granular (likelihood of success, expected ROI, etc), then being "right" means a lot less and, to the extent that it does mean something, being right 50% of the time is much more helpful.
Imagine being right 50% of the time guessing about getting through airport security. If your guesses are "WILL" or "WON'T", then 50% is terrible. If your guesses are like "through in 23 min 53 sec", then 50% is incredible. If your guesses are like "70% chance of being through in 15-20 minutes", what does "right" mean?
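One way to make that last question concrete: probabilistic forecasts are usually scored with a proper scoring rule such as the Brier score rather than counted as right/wrong. A minimal sketch (the outcomes and forecast numbers below are made up purely for illustration):

    # Brier score: mean squared error between forecast probability and outcome.
    # For forecasts like "70% chance of X", this replaces the ill-defined
    # question "was the guess right?".
    def brier(forecasts, outcomes):
        return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

    outcomes  = [1, 0, 1, 1, 0, 1, 0, 1]
    coin      = [0.5] * 8                                  # always says 50/50
    informed  = [0.8, 0.2, 0.7, 0.9, 0.3, 0.6, 0.4, 0.7]   # calibrated forecasts
    print(brier(coin, outcomes))      # 0.25: carries no information
    print(brier(informed, outcomes))  # 0.085: lower is better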
> The odds of both of those people entering the same piece of data incorrectly is tiny.
This is just not correct. There may be a systematic reason why both people make the same mistake (e.g. a mispronounced word), in which case the second entry narrows your confidence interval without improving accuracy. Check out the concepts of accuracy vs. precision from the physical sciences.
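A quick simulation of that failure mode (the error rates here are illustrative assumptions, not figures from the article):

    import random

    random.seed(0)
    TRIALS = 100_000
    P_ERR = 0.01   # each person's independent slip rate (assumed)
    P_SYS = 0.05   # chance an item is inherently confusing, e.g. mispronounced (assumed)

    both_indep = both_corr = 0
    for _ in range(TRIALS):
        # Independent errors only: both-wrong rate is P_ERR**2 = 0.0001.
        if random.random() < P_ERR and random.random() < P_ERR:
            both_indep += 1
        # With a shared systematic cause: a confusing item trips up *both* people.
        confusing = random.random() < P_SYS
        if (confusing or random.random() < P_ERR) and (confusing or random.random() < P_ERR):
            both_corr += 1

    print(both_indep / TRIALS)  # ~0.0001
    print(both_corr / TRIALS)   # ~0.05 - dominated by the shared cause, not "tiny"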
Is it just me, or does this sentence make no mathematical sense at all?
"If you’re running squeaky clean A/B tests at 95% statistical significance and you run 20 tests this year, odds are one of the results you report (and act on) is going to be straight up wrong."
That seems like an absurd metric for judging the improvement in a test. It fails at both extremes: going from 1% to 5% is not really a 400% improvement, and going from 99% to 100% is a drastic improvement despite registering as only ~1% by this metric.
> However the fact that our first n=1 sample happened to be red (and not something else) gives a small (and varying) amount of confidence towards red-heavier mixes rather than the red-scarce ones.
I wouldn't characterize this as a small amount of confidence, as the conditional distribution of the mix-rate after the first sample differs drastically from the prior.
Originally each mix-rate has probability 1/101. After the sample, a mix with n reds in it has probability 2n/(100·101) = n/5050.
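That posterior is easy to verify with exact arithmetic (a minimal sketch, assuming 100 balls per mix and a uniform prior over n = 0..100 reds, as in the thread):

    from fractions import Fraction

    # Uniform prior over mixes with n red balls out of 100.
    prior = {n: Fraction(1, 101) for n in range(101)}

    # Likelihood of the first draw being red from a mix with n reds: n/100.
    unnorm = {n: prior[n] * Fraction(n, 100) for n in range(101)}
    z = sum(unnorm.values())                       # normalizer, works out to 1/2
    posterior = {n: p / z for n, p in unnorm.items()}

    assert all(posterior[n] == Fraction(2 * n, 100 * 101) for n in range(101))
    assert posterior[0] == 0   # an all-non-red mix is ruled out by the red draw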
50% means we can't "accurately" identify them at all. The article mentions that it is effectively like a random coin flip, but the title is misleading.
And it certainly does not imply "one chance in 100 the result is a fluke". A p-value of 0.01 is the probability of seeing data at least this extreme if there were no real effect, not the probability that the reported result is a fluke. Will science journalists never learn how to translate a p-value into English?