
They reported earlier numbers showing about 79% effectiveness even though more data had come in in the meantime and overall effectiveness had dropped to about 69%.



That's not what the numbers say. It's on the order of low dozens of percent improvement, not hundreds of percent.

How big was the dataset in which you saw the 80% improvement? Remember this test was done on a relatively small dataset (28 MB), so it may have hit the point of diminishing returns a lot faster than it would with a larger dataset.

Where does it say they performed 33% better? The only mention of 33% is:

"the reduced variance reached statistical significance in >33% of individual N-back trials comparing DLPFC stim with DLPFC sham"

which means something completely different. I think this study might be terribly overblown.


Results per hour improved, but did total results improve? I think you're implying they did, but the way you stated it doesn't actually say so.

> is supposed to address that precise limitation, but the results show that it just doesn't do very well at all.

They demonstrate an improvement from the previous 5% to 7%.


Still, it’s some kind of weighted average, right? Like that 3.5% drop seems to say more about the test data used than the model performance per se.

The graphs tell a slightly different story. I was hoping to find their explanation of why there's no significant difference when TRE was clearly better for about six months and then leveled off, yet was still about 25% more effective in the end.

You literally posted the part describing how much effectiveness had dropped with the variants.

looks like they overshot by around 50%

Anyone able to explain what the new analysis was that led to the existing data being interpreted so differently?


A relative decrease in WER is not so significant at lower absolute rates. How about: "we make 6 errors per 100 words but Kaldi makes 7"?
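To make the distinction concrete, here is a quick sketch; the 6-vs-7 error rates are the hypothetical numbers from the comment above, not figures from the paper:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction in error rate relative to the baseline."""
    return (baseline - improved) / baseline

kaldi_wer = 0.07   # 7 errors per 100 words (hypothetical)
our_wer = 0.06     # 6 errors per 100 words (hypothetical)

absolute_gap = kaldi_wer - our_wer                      # ~1 percentage point
relative_gap = relative_reduction(kaldi_wer, our_wer)   # ~14% relative
print(f"absolute: {absolute_gap:.2f}, relative: {relative_gap:.0%}")
```

A ~14% relative reduction sounds impressive, but at these absolute rates it amounts to one fewer error per 100 words.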

> In the past three years we have dropped error rates by a factor of 3

What are you referring to here?


I think you’re mixing up effectiveness and efficacy. You’re both correct but talking about different terms, except that the parent comment needs some prior adjustment of the numbers to get the posterior.

EDIT: Just to confirm, I think I was incorrect in the message below.

I'm not completely sure I'm correct (please correct me if I'm not!), but as I understand it the article does not support the claim in the headline. The headline claims that signups increased by 28% in the changed version and that this was all attributable to the change.

It's the second bit that isn't supported. They say the result was statistically significant, but as I understand it, what they're saying is that it was statistically significant that the new variation was better than the control. It could be better by an amount more or less than 28%; all we know from that is that it's almost certainly (95%) at least a little better. We would need to know the number of trials to put a confidence interval on the amount of improvement.

Could someone with a slightly better understanding of statistics chip in maybe? I could use some more information in my own A/B tests, sometimes I know a change is going to be a pain to maintain so I want to know not just if it is better, but by how much.
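For what it's worth, here's a rough sketch of the kind of interval you'd want, using a normal approximation for the difference of two proportions. The counts are made up for illustration: a 28% relative lift on a 10% baseline with 1,000 trials per arm.

```python
import math

def lift_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate 95% CI for the absolute difference in conversion
    rates between variant B and control A (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical counts: 10.0% vs 12.8% conversion (a 28% relative lift)
lo, hi = lift_ci(100, 1000, 128, 1000)
print(f"lift is between {lo:.2%} and {hi:.2%} with ~95% confidence")
```

With these made-up numbers the interval barely excludes zero and is very wide, which is exactly the point: "significantly better" and "better by 28%" are different claims.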


They mention this in the paper. The chance of all 7 departments changing their reporting simultaneously is vanishingly small. Much of the paper is dedicated to understanding the cause of the lack of difference between experimental groups; check it out.

Wow! I wonder, though, how they account for overfitting of the data. Is it a real solution or a statistical anomaly? I ask because it seems that the progress over the last year or so has been small increments within the 9-10% range.

Interesting that the difference between the winning solution, which blended 107 approaches, and the nearby solutions using only a single approach was on the order of a fraction of a percent of improvement.

Either they did late stage down rounds, or the data is just wrong.

> 62% more effective

This is the thing that is odd to me about this article.

They increased the number of conversations, ok. But are conversations valuable? How about actual conversions (not conversations)?

If conversions remained the same, but conversations went up, it seems pointless.


Fixed mostly to relatively. That wasn't my main point really. I thought the data points were interesting.