
Even the method of contact skews the sample toward people who have access to that method, or who frequent that location. Cell phone surveys might include fewer seniors. Questioning people walking into Whole Foods is going to give you a different set than questioning people walking into Cracker Barrel.

If the source set of names and numbers is truly representative, I think random calling could work if you make sure to include "did not respond" as a line item. Then you might get something like:

> 45% Biden, 39% Trump, 1% Jorgensen, 15% ???

With the non-responses removed and the remainder renormalized, you get a skewed:

> 53% Biden, 46% Trump, 1.2% Jorgensen
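
A minimal sketch of that renormalization in Python (the numbers are the hypothetical ones above, not real polling data):

    # Hypothetical raw shares, with non-response kept as a line item.
    raw = {"Biden": 45.0, "Trump": 39.0, "Jorgensen": 1.0, "no_response": 15.0}

    # Drop non-response and renormalize over the remaining 85%.
    responders = {k: v for k, v in raw.items() if k != "no_response"}
    total = sum(responders.values())  # 85.0
    renormalized = {k: round(100 * v / total, 1) for k, v in responders.items()}
    print(renormalized)  # {'Biden': 52.9, 'Trump': 45.9, 'Jorgensen': 1.2}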


Sure, phone surveys for political purposes (presidential approval ratings, etc.) have to deal with that all the time. There are methods for estimating non-response impact. [0] One mitigation I've seen is to reach out again to non-responders, then analyze their results to see how they differ from the baseline responders and estimate the non-responder population from that. If there's little or no difference, you can be fairly confident the risk of bias is low. It's called non-response follow-up, and it's a pretty common method.
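
A minimal sketch of that follow-up comparison, assuming yes/no answers coded as 1/0 and a made-up share of never-responders (none of this reflects a real survey):

    # Hypothetical 1/0 answers from the baseline wave and from a
    # follow-up wave of people who didn't respond the first time.
    baseline  = [1, 0, 1, 1, 0, 1, 0, 1]
    follow_up = [0, 0, 1, 0, 0, 1, 0, 0]

    def mean(xs):
        return sum(xs) / len(xs)

    gap = mean(follow_up) - mean(baseline)
    print(f"baseline={mean(baseline):.2f}  follow-up={mean(follow_up):.2f}  gap={gap:+.2f}")

    # If the gap is near zero, non-response bias is probably low. If not,
    # one crude correction: weight the follow-up wave by the share of the
    # frame that never responded initially (assumed to be 40% here).
    nonresponse_share = 0.40
    adjusted = (1 - nonresponse_share) * mean(baseline) + nonresponse_share * mean(follow_up)
    print(f"adjusted estimate = {adjusted:.2f}")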

There's also literature suggesting that you shouldn't discard outlier values among the actual responders, as they may help approximate the non-responder population; i.e., the non-outliers represent typical responders, while outliers are more likely to resemble non-responders. [1]

[0] https://www.warc.com/content/paywall/article/jar/research_no...

[1] https://bmcmedresmethodol.biomedcentral.com/articles/10.1186...


They don't go out and canvass for phone numbers first. If they don't include cell phone numbers, the sample may skew toward older people, but that wouldn't invalidate the survey either.

My survey shows that 100% of people answer unsolicited phone calls.

Before running with missing data, you need to make the case that there's no plausible reason why non-respondents would have different answers from respondents.


Also says it's an "online survey." So it has a self-selection bias. In order to be useful, the survey would have to be from a random sampling of people. Calling folks on the phone is the best way to accomplish this.

Could contacting non-responders or sending out additional questionnaires itself skew the results? Maybe non-response is considered part of a poll, and by fighting against it you would somehow be putting your thumb on the scale?

I'm not an expert either in statistics or in polling; so I'm just speculating.


A truly "randomly sampled" phone survey would skew to a very old demographic, and successfully recruiting younger generations with this tactic is a notorious challenge (online works better).

What they don't tell you, though, is that it's not truly random as claimed. They recruit a representative sample of respondents. So there are young people in the sample; it just takes more calls to get the data.

Source: I run these surveys


Phone surveys were accurate when robocalls and cellphones didn't exist. Now you're only sampling the people who aren't discerning enough to reject unknown numbers.

It would be so much more effective if the people they polled represented a uniform distribution (or any known distribution).

Suppose the non-responders are identical to the responders. Then nothing is lost by trying to get them to participate again. But if they are not, replacing them will lose information.

> ill, busy, distracted by personal issues, absent, drunk

But being ill correlates (lightly) with being older, and being drunk correlates heavily with opinions. Being absent might correlate with wealth (and not even linearly). Personal issues probably correlate with education. Add them together, and you've got a fairly large part of the population missing from your sample.


The idea with A is that you will get a more representative sample if you follow up specifically with the people who did not initially respond. Yes, some will never respond, but the ones who do answer will increase the quality of the data, more so than an additional random sample would. The people who did not initially respond are more likely to be busy/lazy/unengaged, and without as many responses from that cohort, the data will be skewed.
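
A toy simulation of that skew, with invented cohorts and response rates (an "engaged" half that answers often and leans one way, and a "busy" half that rarely answers and leans the other; true support is 0.500):

    import random

    random.seed(0)

    def simulate(n=100_000, follow_up=False):
        votes = []
        for _ in range(n):
            engaged = random.random() < 0.5          # 50/50 population split
            answer_rate = 0.7 if engaged else 0.1    # busy people rarely answer
            if follow_up:
                answer_rate = min(1.0, answer_rate + 0.2)  # recontact converts some
            if random.random() < answer_rate:
                votes.append(random.random() < (0.6 if engaged else 0.4))
        return sum(votes) / len(votes)

    print(f"single wave:    {simulate():.3f}")                # ~0.575, skewed high
    print(f"with follow-up: {simulate(follow_up=True):.3f}")  # ~0.550, less skewed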

The response rate is so low as to be entirely meaningless. If you're content with sampling only 0.000001% of the population, you can get any result you want: select for the people who care the most, take a bunch of time to lecture a respondent if they seem like they'd vote the other way (oh no, you have to leave? but we didn't even have time to submit your vote yet... :( ), and word the question so that the answer you want is the only reasonable response.

The Census uses this technique when doing random samples. One of the ways to improve accuracy is to put a lot of effort into contacting a random sample of non-respondents, which simply isn't viable at scale.

How are you sampling people? Unless you track people down, knock on doors, and have them submit a sample right there, it won't be random; it will be a subset of people willing to take time out of their lives to participate in a research study.

I think the worse problem is similar to the one political pollsters (who are supposed to call only landlines) are having in the age of cell phones. What was once a method for getting a relatively representative sample of the population is an increasingly biased one. Correcting for the differences can be done, but as time goes on, the uncertainty of the corrections grows. Truth and perception diverge.

To what degree do these results reflect the likelihood of responding rather than the actual demographics? I actually think younger people may be slightly more inclined to respond.

I don't buy that, and even if I did, the flip would be worse if you look at phone contact rates.

I've seen some good, very large digital surveys. We run some, geared toward measuring ad recall/lift, and I find them valuable, especially with a giant sample on only two recall + head-to-head questions.

You can always ask for gender/age and weight it, but that's part of what I see as the problem: so many "traditional" surveys I see make fairly large adjustments, with lots of "looking back" at past turnout and overfitting based on personal bias, especially when weighting from a very small sub-sample crosstab, e.g., Hispanics. I'm not a pollster; that's just my still fairly-insider / polling-adjacent insight.
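
For reference, the basic gender/age weighting move is post-stratification: weight each respondent by population share over sample share in their demographic cell. A minimal sketch with invented shares (not census figures):

    # Hypothetical population shares vs. shares observed in the sample,
    # keyed by (gender, age bracket). Each dict sums to 1.0.
    population = {("F", "18-34"): 0.15, ("F", "35+"): 0.36,
                  ("M", "18-34"): 0.14, ("M", "35+"): 0.35}
    sample     = {("F", "18-34"): 0.08, ("F", "35+"): 0.40,
                  ("M", "18-34"): 0.07, ("M", "35+"): 0.45}

    # Post-stratification weight per cell: population share / sample share.
    weights = {cell: population[cell] / sample[cell] for cell in population}
    for cell, w in sorted(weights.items()):
        print(cell, round(w, 2))
    # Under-sampled cells get weights near 2.0; a very small cell would get
    # a huge, noisy weight -- the overfitting risk described above.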

Another problem with digital surveys is that most of the firms use opt-in panels, like having people register to take surveys and get paid on Mechanical Turk, and then they weight from there. I think this is a problem. The surveys we run live inside mobile ads, similar to Google / FB Brand Lift surveys; they go after those who saw the ads or a truly random sample, not a small, biased survey panel.


I bet you could model the likelihood of this: see whether the respondents are enriched for people likely to be in communication with each other to some degree (family, friends, neighbors, coworkers, classmates, etc.), or whether the enrichment is no greater than random chance.
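
One way to sketch that test: count respondent pairs that share a contact edge, then compare against the same count for repeated random draws of the same size (a simple permutation test). Everything below is invented data, not the survey in question:

    import random

    random.seed(1)

    def pairs_in_contact(people, contacts):
        # Count pairs in `people` that appear in the contact-edge set.
        people = list(people)
        return sum(1 for i in range(len(people))
                     for j in range(i + 1, len(people))
                     if frozenset((people[i], people[j])) in contacts)

    # Hypothetical data: 1,000 people, some known-contact pairs, 34 respondents.
    everyone = list(range(1000))
    contacts = {frozenset(random.sample(everyone, 2)) for _ in range(5000)}
    respondents = random.sample(everyone, 34)

    observed = pairs_in_contact(respondents, contacts)

    # Null distribution: how many contact pairs random 34-person draws contain.
    null = [pairs_in_contact(random.sample(everyone, 34), contacts)
            for _ in range(1000)]
    p = sum(1 for n in null if n >= observed) / len(null)
    print(f"observed pairs = {observed}, p-value vs. random chance = {p:.3f}")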

If they had randomly chosen 34 people to poll, it would be a decent sample. But now it's not a sample at all. The responders are not representative of the non-responders.
