
Isn't this just called user testing? Also, this is in the context of a fucking dataset. If data needs to go through DI in case something blows up on Twitter, I guess it's a sad state we're in.



Thanks for the comment :D

> There is no “testing” going on here, unfortunately. You’re just taking turns placing users in the treatment or control group, arbitrarily, depending on when they visit the page. How can you measure any treatment effect when everyone is part of either group?

This is not a perfect solution, but I've found it good enough to identify clear winners (when a winning version actually exists). Most followers won't visit your profile multiple times anyway: they visit once, and they either follow you or they don't, influenced by whichever version is currently displayed :). They won't just come back to your profile over and over for no reason, but if they do, they'll also convert better on the version they prefer. So one version might nudge a user into following you while another might not. I'd disagree that this isn't testing; I think it is for the majority of the profile clicks you receive.
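The assignment scheme described above can be sketched roughly like this (a hypothetical illustration, not Birdy's actual implementation; the rotation period and variant names are assumptions): each visitor simply sees whichever variant is live at the moment they arrive, so bucketing depends on visit time rather than on a stable user ID.

```python
# Sketch of time-based variant assignment: visitors are bucketed by
# *when* they visit, not by who they are. All names/periods are
# illustrative assumptions.
from datetime import datetime, timezone

VARIANTS = ["A", "B"]
ROTATION_HOURS = 24  # assumed rotation period: swap the profile daily

def variant_for_visit(visit_time: datetime) -> str:
    """Return the profile variant that is live at a given visit time."""
    hours_since_epoch = int(visit_time.timestamp() // 3600)
    return VARIANTS[(hours_since_epoch // ROTATION_HOURS) % len(VARIANTS)]

# Two visits a day apart land in different buckets,
# which is the objection raised in the quoted comment:
t1 = datetime(2023, 1, 1, 12, tzinfo=timezone.utc)
t2 = datetime(2023, 1, 2, 12, tzinfo=timezone.utc)
```

This also makes the trade-off concrete: a repeat visitor can see both variants, so the groups are not cleanly separated the way a per-user randomized assignment would be.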

> I guess you could try something like switchback testing, but I’m not convinced visits to the average Twitter profile will yield enough samples.

I don't think that would be possible with the current Twitter API capabilities anyway.

> I think it’s a well-executed idea, but I don’t think it’s fair to sell results under the guise of statistical validity when they don’t appear to have that. (although it’s just Twitter profiles, not eg medical treatment, so no real harm done)

While the results might not be perfectly accurate, I think they are accurate enough to provide value, especially if you let the test run long enough to get a big sample size. I personally use Birdy (obviously :D) and I have noticed much better conversion, which is why I'm confident.
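If you do let the test run long enough, you can at least sanity-check whether the observed difference is bigger than noise. A minimal sketch, assuming you have per-variant visit and follow counts (the numbers below are made up for illustration), is a two-proportion z-test using only the standard library:

```python
# Two-proportion z-test for comparing follow conversion between two
# profile variants. Counts are illustrative, not real data.
from math import erf, sqrt

def two_proportion_p_value(follows_a: int, visits_a: int,
                           follows_b: int, visits_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = follows_a / visits_a, follows_b / visits_b
    p_pool = (follows_a + follows_b) / (visits_a + visits_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visits_a + 1 / visits_b))
    z = (p_a - p_b) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Example: variant A converts 80/1000 visits, variant B converts 50/1000
p = two_proportion_p_value(80, 1000, 50, 1000)
```

With a small profile's traffic the p-value will usually be large, which is exactly the "not enough samples" concern from the parent comment.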

I'm looking forward to seeing new Twitter API capabilities that make the process even more accurate, though.


Did you see when they tested it on Twitter users?

At least now we know. The sample for that test was essentially the whole Twitter user base, and the reaction seemed quite negative. If they're now considering it a bad idea, I imagine they have real data to back that up. Cool

There are legitimate reasons for a test like this to exist, the linked author talks about this more a bit further into the same Twitter thread: https://twitter.com/Foone/status/1475254812816019456

Whatever shows up in this repo, I hope people realize that depending on what data you feed an algorithm, you can get whatever output you want. And Twitter is never going to publish everyone's personal information and interactions on the site (nor can or should they).

So I'm not sure what the ultimate point of this exercise is other than producing faux-transparency.


Maybe there should be IRB-style oversight for Twitter etc. when they tweak their algorithms. They are, after all, experimenting on humans here.

yeah, exactly.

Also, the skeptics are really hung up on the semantics of "intelligence" and not addressing the model's output, much less where this is going to be in the coming years.

Like, the take-home coding test is probably just dead. Today.

i mean: https://twitter.com/emollick/status/1598745129837281280


We know that quality data is king, with all due respect to people tweeting, that data is most likely garbage.

How? Twitter's data is already publicly available.

Does anyone else get the feeling this is some kind of experiment like the 'Follow me on Twitter' tests?

This seems like an A/B test to find out whether there is enough demand to open up or improve Twitter for some specific purposes.

This wouldn't bother me at all if I didn't think they still had the data. The transient nature of tweets is one of my favourite features of Twitter.

I'm far from a data scientist. I'm just someone that has an interest in building UIs and saw a potential to look at data in a way I hadn't seen before. I'm not actively working on this right now, but if you'd like to drop a line for whatever reason, I still check my Twitter for messages when they come up.

https://twitter.com/mmabetsharp


Correct. In internal Twitter jargon, some data is "perspectival" and some, for performance reasons, isn't. Whether you can actually view a tweet is calculated on the fly based on your personal perspective, since honoring privacy settings, blocks, etc. is crucial. But that's not true for counts, so those will be off.
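The distinction can be sketched like this (an illustrative toy model, not Twitter's actual code; all names and structures are assumptions): visibility is computed per request from the viewer's perspective, while the count is served from a denormalized cached value that can lag behind the ground truth.

```python
# Toy model of "perspectival" vs. cached data. Visibility is checked
# on the fly per viewer; the like count is a cached copy that a
# separate refresh job would eventually reconcile.
from dataclasses import dataclass, field

@dataclass
class Tweet:
    author: str
    text: str
    cached_like_count: int = 0          # denormalized, updated asynchronously
    likes: set = field(default_factory=set)  # ground truth

def can_view(tweet: Tweet, viewer: str, blocked_by: dict) -> bool:
    """Perspectival: computed per request, honoring blocks."""
    return viewer not in blocked_by.get(tweet.author, set())

def like(tweet: Tweet, user: str) -> None:
    tweet.likes.add(user)  # the cached count is NOT updated here

t = Tweet(author="alice", text="hi")
like(t, "bob")
# t.likes and t.cached_like_count now disagree until a refresh runs.
```

This is why the counts "will be off": recomputing them per viewer on every request would be far more expensive than the per-tweet visibility check.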

People who find this a shocking and objectionable sign of bugs are generally people who have not built software at such large scales.


These come up a lot on data science twitter. I never understand who the target audience could possibly be.

It's a fair criticism, but the nice thing about analyzing Twitter is that the data is already there. Polling has its own set of issues - how do you randomly and uniformly sample from the pool of programmers? It's probably not something you could do for a quick fun blog post.

Twitter should be paying attention to this experiment.

Everyone keeps quoting this acting like it validates the sense of entitlement everyone feels they have to all of Twitter's data.

There are many Twitter datasets that are already 50% gone. It's very bad for reproducibility.
