Why the Facebook Experiment Is Lousy Social Science (www.ischool.berkeley.edu)
122 points by dnt404-1 on 2014-09-28 02:13:06+00:00 | 20 comments




I wrote a comment debunking OP's arguments, but it was too long for an HN comment and my noprocast settings kicked in, so I can't post it as two comments; I copied it over to G+ instead: https://plus.google.com/103530621949492999968/posts/1PqPdLyz...

tl;dr: Let's summarize his complaints and my counter-objections:

1. No consent: irrelevant to whether this was good science or 'lousy social science'.
2. Crossed boundaries between corporations and academia: likewise irrelevant; also, welcome to the modern Internet.
3. Small effect size: misunderstands the statistical design of the study and why it was designed & expected to have small effects.
4. Used LIWC, which has a high error rate for measuring the emotionality of posts: if the error is random, it biases the effect toward zero and so is not an argument against statistically significant findings (see the sketch after this list).
5. And LIWC may have a systematic error toward positivity: apparently not an issue, as the negative & positive conditions agreed, and the studies he cites in support of this claim are mixed or unavailable.
6. Also, other methods are better than LIWC: sure. But that doesn't mean the results are wrong.
7. Maybe LIWC has large unknown biases when applied to short social media texts: possible, but it's not like you have any real evidence for that claim.
8. Facebook news posts are a biased source of mood anyway: maybe, but they still changed after random manipulation.
9. Experience sampling is sooooooo awesome: it also brings up its own issues of biases, and I don't see how this would render the Facebook study useless anyway even if we granted it (like complaints #1, 2, 6, 7).
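
To make point 4 concrete, here is a minimal toy simulation (my own illustration with made-up numbers, not the study's actual parameters) of why a sentiment classifier that makes the same random mistakes in both conditions shrinks the measured gap rather than manufacturing one:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000                                  # posts per condition

    # Assumed true positive-post rates; the treatment lowers positivity slightly.
    true_pos_control   = rng.random(n) < 0.47
    true_pos_treatment = rng.random(n) < 0.45

    def noisy_classifier(truth, sensitivity=0.7, false_pos=0.2):
        """Label a post 'positive' with error rates that ignore condition."""
        u = rng.random(truth.shape)
        return np.where(truth, u < sensitivity, u < false_pos)

    true_gap     = true_pos_control.mean() - true_pos_treatment.mean()
    measured_gap = (noisy_classifier(true_pos_control).mean()
                    - noisy_classifier(true_pos_treatment).mean())

    print(f"true gap:     {true_gap:.4f}")       # ~0.020
    print(f"measured gap: {measured_gap:.4f}")   # ~0.010, attenuated, not inflated

Non-differential error like this can hide a real effect, but it cannot conjure a statistically significant one out of nothing, which is why it doesn't undermine the reported findings.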

Now, I don't want to overstate my criticisms here. The author has failed to show that the Facebook study is worthless (I'd wager much more money on the Facebook results replicating than on 95% of the social science research I've read), and it would be outright harmful for Facebook to aim for large effect sizes in future studies. But he does at least raise some good points about improving the follow-up work: if Facebook wants more reliable results, it should validate and then provide some of its cutting-edge deep networks for sentiment analysis for research like this, and it would be worthwhile to run experience-sampling approaches to see what happens there, alongside the easier website tests (in addition to them, not instead of them).


I don't feel like you make an adequate case for some of your counter-objections.

#1 is particularly troubling since you seem to argue it by misrepresenting his argument instead of addressing it directly. The alternative is to perform the experiment w/informed consent, not to keep the results of your unethical experiment secret. I think this is actually an issue that is important when considering whether an experiment is 'good' or 'acceptable'. Tuskegee is what I think of when people try to argue that informed consent is unimportant. Even though anyone can tell you that the stakes in this kind of experiment are incredibly small in comparison, there IS a measurable risk of harm when manipulating emotions, so ethics matter here.

#2 - you seem to make no effort to address this either. To me, it is particularly troubling if the business and academic sides of FB freely interacted here because each side has an incomplete understanding of the other side's motives and priorities and constraints. The business side is focused on revenue and actionable metrics and legal liability (that's their job), while the research side is naturally going to have different objectives and have a perspective that is more rooted in the practice of good, ethical science. There's real trouble if research ethics & principles end up subjugated to the interests of the business people (which they often do).

Seeing you utterly dismiss the first two points with 'Oh good, so the author isn't a complete idiot.' is not exactly a revelatory argumentative triumph. You can do better.


> #1 is particularly troubling since you seem to argue it by misrepresenting his argument instead of addressing it directly. The alternative is to perform the experiment w/informed consent, not to keep the results of your unethical experiment secret.

No, the alternative is to go through a long rigmarole which may or may not approve the experiment and enforce conditions which may or may not themselves constitute severe biases of their own (knowing one is in an experiment, even if the subjects have been deceived as to the content, is itself a problem).

> #2 - you seem to make no effort to address this either. To me, it is particularly troubling if the business and academic sides of FB freely interacted here because each side has an incomplete understanding of the other side's motives and priorities and constraints.

There's nothing to be addressed here. Businesses have always allied with scientists when they find common interests, from astronomers preparing tables for shipping to Gosset inventing a good chunk of modern statistics for optimizing a beer brewery. All he's done is talk vaguely about 'boundaries' and insinuate dark things.

> Seeing you utterly dismiss the first two points with 'Oh good, so the author isn't a complete idiot.' is not exactly a revelatory argumentative triumph. You can do better.

Those points are fundamentally irrelevant to the overarching claim he makes that it's bad science. I think he's wrong about the more... philosophical aspects of things, but I don't have to show he's wrong - he has to show they materially affect the truth of the results. He has to show that #1 and #2 actually matter, and he hasn't. Instead, he spent an incredible amount of verbiage on vaguely related topics and he flubbed the technical criticisms. I don't give him a pass on that and neither should you.


> No, the alternative is to go through a long rigmarole which may or may not approve the experiment and enforce conditions which may or may not themselves constitute severe biases of their own (knowing one is in an experiment, even if the subjects have been deceived as to the content, is itself a problem).

Having gone through some IRBs, there is good reason and enough historical precedent to warrant this "long rigmarole". The point is that you can't simply say "but everyone on the Internet does it" and expect people who support traditional research ethics not to say "wait a minute, that shouldn't make it OK".


The abuses which prompted the formation of IRBs would not have been prevented by IRBs: IRBs would not have stopped the military from harming downwinders, would not have stopped the Nazis, etc. That's all one needs to know.

Regarding your point about the percentages, the OP states that between 10% and 90% of positive or negative posts were removed, depending on the group. Looking at just the positive posts for a moment: if around half of the posts in the News Feed are classified as positive (46.8%), then removing 10% of those is precisely the ~5% of posts you claim is different. Note that the removal percentage is based on the uid, so the share of a feed removed would range from roughly 5% to 42%, depending on the poster. The 4.68% quoted is for a user at the extreme low end of the removal range, and it is a percentage of total posts, not of positive posts.

I don't see how this is different from the claimed 10-90% of positive posts. Similarly for negative posts. Am I misunderstanding something?
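
For concreteness, here is the arithmetic I have in mind (the 46.8% positive-share figure is the one quoted from the paper; treating the 10%-90% range as a per-user removal rate is my reading of the methods):

    positive_share = 0.468             # fraction of News Feed posts classified positive

    for removal_rate in (0.10, 0.90):  # per-user omission probability, low and high end
        removed_share_of_feed = positive_share * removal_rate
        print(f"remove {removal_rate:.0%} of positive posts "
              f"-> {removed_share_of_feed:.1%} of all posts omitted")
    # remove 10% of positive posts -> 4.7% of all posts omitted
    # remove 90% of positive posts -> 42.1% of all posts omitted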


The OP may or may not have misunderstood it, depending on how cleverly you parse what he wrote, but he presents it in a way that no reader would be able to understand, compared to the extract from the study, which actually does specify the final numbers so that one understands it was a small intervention.

The uproar over this particular experiment seems like a bit much, but it doesn't hurt to start having these conversations about research conducted on web users. There is only going to be more of it going forward.

Purposefully experimenting with people's emotions on a large scale, without their knowledge, to try to make them sad or angry is a big deal. The Facebook paper looks at whether negative posts cause people to be more negative. What happens if someone with depression, who visits Facebook to vicariously share in others' happiness, is placed into the negative-contagion test group? Is Facebook responsible for any harm that results?

There's a reason why psychology studies (any human study really) require informed consent. There is potential for great damage and Facebook apparently doesn't care.

Now I can hear someone saying that advertisement is the same thing. Advertisers are trying to change people's emotions, yes, but we know and recognize that while watching an ad. People using Facebook do not expect to be manipulated the way this study manipulated them.


Then you need to be upset about any online company that does A/B testing without consent, too. Does this new button make people happy and want to subscribe to our content vs. our old design? This outrage is overblown and really only exists because it comes from Facebook.

A/B testing is not the same thing as manipulating content based on its negative/positive connotation and tracking the resulting change in user emotion. The two are conceptually similar, but the resulting change in user expression is not: A/B testing is optimizing your site and users' interactions with it, while Facebook's experiment is trying to change a user's interactions with the world.

I would be just as upset if it were Google or my local grocer.


Sure it is. All advertising is manipulating user emotion, and A/B testing different advertising methods / button layouts on your site to improve conversions is one such use.

If this is something that upsets you I suggest being upset at every commercial on the radio / newspaper / tv / billboard that you're unfortunately exposed to every day.


People know ads are trying to manipulate them; they don't know that Facebook is actively trying to make them feel happier or sadder. Just because two things are similar does not make them morally equivalent.

People have been calling out various bad practices in conversion: http://darkpatterns.org/


Emotional manipulation happens every day in all kinds of fields other than advertising, I wouldn't refer to all of these as "dark patterns."

> Emotional manipulation happens every day in all kinds of fields other than advertising

So it's not bad because everyone else does it?


At the end of the day, it does boil down to how users feel, but legitimate A/B testing seeks better features, an improved user experience, or the identification of something better. The tech community, developers, and data folks need to understand that not all A/B testing is equal. If we allow FB (or OKC) to pass off their ill-thought-out psych experiments as normal A/B or similar day-to-day product testing, we lower the user-experience bar. It has also been pointed out elsewhere that this particular study was/is illegal, violating certain state laws on Common Rule practice regardless of funding source. I hope that board members and the C-suite take a new leadership role in educating their teams about what acceptable and legal A/B testing should look like.

I agree that A/B testing is not the same thing.

A real-world comparison of the two different types of experiments:

The grocer wants to see the effect of a product demo on sales. They have a worker offer a product sample and then recommend a pairing of products vs. offering a coupon for that product. A/B testing.

The grocer wants to see the effect of a product demo on the customer's emotions. They have the demo worker gripe about their boss, or talk positively about the community, then ask the cashiers for follow-up feedback on the small talk at their counters.

One of these is OK. The other is seedy.


Sure, but the problem here is that this study blurred academic and corporate boundaries. Studies I've conducted involved telling the participants after the study that they had been recorded. Immediately after.

It is a requirement instituted by the IRB in most universities. That is the problem being mentioned here.

Furthermore, Facebook is a different case from most companies because it is an intrinsically personal website that has the potential to affect a user's life very meaningfully. Amazon changing the design or location of a button doesn't affect me as much.


Hi guys, author here. Thanks for taking an interest in my critique of the FB experiment. Quick FYI: the post is up on Medium:

https://medium.com/@gpanger/why-the-facebook-experiment-is-l...

...where I can fix things like broken links. It was republished on my school's website, but unfortunately I don't have control over the HTML there. In the cases where stuff broke, it's because I linked to the author's personal manuscript and not the official journal page (because the former is freely available to everyone rather than behind a paywall).

I care a lot about Big Data research, especially involving social media, and think we too often ignore the conceptual leaps required to make inferences about the human experience from social media.

Here, the leap is: sentiment_analysis(what people say on social media) == how they really feel.

The point about LIWC (the sentiment analysis tool used here) is that (a) it's flawed, and perhaps not in a "random error" kind of way; (b) we don't really know how well it works, because it has not been validated on social media or Facebook posts specifically, which should make most researchers nervous (but somehow doesn't); and (c) there's evidence from other data sources suggesting LIWC is biased in a way that would underrepresent the emotions of interest (namely low-arousal emotions like sadness, depression, loneliness, feeling left out, etc.; these aren't picked up by LIWC as well as other negative emotions like anxiety).

See e.g.:

O’Carroll Bantum, E., & Owen, J. (2009). Evaluating the Validity of Computerized Content Analysis Programs for Identification of Emotional Expression in Cancer Narratives. Psychological Assessment, 21, 79-88.

The point isn't that using LIWC means the experiment is invalid, the point is that it should give us pause and caution us against stating the conclusions of the experiment too strongly. I think the authors do state their conclusions a bit strongly.
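
To make the concern concrete, here's a stripped-down caricature of dictionary-style word counting (the word lists are invented for illustration; LIWC's real lexicon is proprietary and far larger). On a short status update, a single matched word swings the score, and negation or sarcasm is invisible to it:

    POSITIVE = {"happy", "great", "love", "awesome", "excited"}
    NEGATIVE = {"sad", "angry", "terrible", "hate", "lonely"}

    def word_count_score(post: str) -> dict:
        """Score a post by the fraction of its words found in each dictionary."""
        words = [w.strip(".,!?") for w in post.lower().split()]
        n = len(words) or 1
        return {
            "positive": sum(w in POSITIVE for w in words) / n,
            "negative": sum(w in NEGATIVE for w in words) / n,
        }

    print(word_count_score("not happy about this at all"))  # scored as 'positive'
    print(word_count_score("great, another Monday"))        # also 'positive'

That's the kind of systematic, not merely random, error I worry about on short, conversational text.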

The other main critique is about biases inherent in social media as a data source. The private, randomly solicited emotion samples of experience sampling are more likely to capture Facebook's true emotional impact than the non-private, self-selected emotion samples of status updates. Take just arousal bias: if we know that people are more likely to post when they're emotionally aroused (excited, angry, fearful/anxious), but the emotional consequences we're concerned about involve low-arousal emotions (sadness, depression), then there's a serious chance we'll miss exactly the emotions the study is arguing don't exist. That's a big problem.
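
Here's a toy version of that selection effect (all numbers are made up for illustration, not drawn from any dataset): if high-arousal moments are far more likely to be posted about, low-arousal states end up badly underrepresented in the posts we can observe, even when they dominate real life.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000                                # emotional "moments" in people's lives

    # Assumed: 30% of moments are high-arousal (excitement, anger, anxiety).
    high_arousal = rng.random(n) < 0.3
    # Assumed posting probabilities: aroused moments get posted far more often.
    p_post = np.where(high_arousal, 0.25, 0.05)
    posted = rng.random(n) < p_post

    true_share_low     = (~high_arousal).mean()
    observed_share_low = (~high_arousal[posted]).mean()

    print(f"low-arousal share of real moments:   {true_share_low:.0%}")      # ~70%
    print(f"low-arousal share of posted moments: {observed_share_low:.0%}")  # ~32%

Experience sampling asks people how they feel at random times, so it samples from the first distribution; status updates sample from the second.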

I think we're a bit too enamored with the idea that Big Data provides an unbiased window into the human experience. I think a tremendous amount of social science would argue otherwise.

Stepping back a bit, the Facebook experiment raised many interwoven issues for me, which is why they featured in the broader piece I wrote. About Facebook's culture, about ethics in corporate vs. academic research, about Facebook's emotional impact (foolish to believe there is none, I think), about how we use Big Data in research, about how we cope with Facebook's presence in our lives, for better or worse.

You may have a great experience with Facebook, and that's great. Others struggle with the medium. I mention social comparison not just because it's been the focus of research on social media, but also because the authors of the experiment bring up social comparison (as well as the "alone together" argument) in their work. Because they try strenuously to rebut the unfavorable findings about Facebook's emotional consequences, I thought it was important to point out that their study seems designed in a way that would systematically underrepresent exactly those negative emotions they're arguing against.

Certainly, some negative emotions were "contagious" through social media (anxious news reports, for example), as were some positive emotions. But is the emotion that gets retransmitted the full emotional picture? Probably not. Probably many emotions and feelings get withheld. The social science would suggest that when positive posts make us feel bad, we won't go back on Facebook and broadcast those feelings to all of our friends.

Thanks a ton for engaging with my critique and responding with your own. Writing that was a labor of love, and I learn a lot from the feedback, good and bad. Sorry it was so long.

