The total ownership of virtually all the data for a huge percentage of the population could be argued to be a high price, depending on your perspective.
• Your individual data is worth a lot of money in aggregate with the data of thousands/millions of other users, but it is difficult (impossible?) to exchange your individual data for money
• If you're talking philosophical/non-monetary "worth" then this varies from person to person. I value my data almost not at all.
> Are you trying to imply that brokering consumer data is a marginal or nonexistent business?
For small/mid-size datasets (which is what we're talking about), yes, that's exactly what I'm implying. It's not actually easy for most companies to sell user data for a quick buck like is being claimed.
> Paying you would mean they'd have to make more than they pay you from the addition of said data.
> This data is probably only valuable in aggregate to determine trends.
I disagree with this point only because their current profits exist precisely because they do not have to pay their data producers. And while I agree with the sentiment that the data is only valuable in aggregate, this doesn't mean my data is worth $0.00. It's still worth pennies, so maybe instead of $20, I earn $5, or less. The point is I'm paid for it. An analogy that came to me while mulling over this comment is an orange tree: a single orange is not as valuable as an "aggregate" of oranges.
In the current data economy, companies have maintenance expenses for their "orange harvesters", but not for the oranges themselves. Again, this often prompts the argument that "you are using their service, it's their data", but I'd say a) this bill clearly shows that some people do not agree with that sentiment, and b) (to continue the farm analogy) this is where something like share farming [1] would be a usable model.
> Data is so cheap that it should be actually free, unlike counterfeit Nike shoes.
This really depends on the data. In pharma the right 50 bytes of data can be worth billions. Not all data is personal product preferences for ad targeting.
>Data collection is not going anywhere, so long as people are willing (even unknowingly) to give up info for a perceived discount.
People are willing in large part because they have no clue what their data is actually worth and/or what pieces of their data are actually out there. Data collection survives because people don't realize they're already paying $5.99 a month (or whatever the real breakeven number is).
I think it's a pretty good rule of markets that people should know what it is they're exchanging. To that end I should be able to see what these companies gather on me when I use their service.
It makes the data use much less profitable for the companies involved, and therefore far less attractive, discouraging business models that rely on it from the start.
> - Most data isn't big. I can fit data about every person in the world on a $100 Chromebook. (8 billion people * 8 bits of data = 8GB)
Nitpick but I cannot help myself: 8 bits are not even enough for a unique integer ID per person. Giving 8 billion people distinct IDs takes at least 33 bits, which in practice means a 64-bit (8-byte) integer per person, and then we are at 64GB (about 60GiB) already.
I agree with pretty much everything else you said, just this stood out as wrong and Duty Calls.
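A quick sketch for checking the arithmetic in this exchange. The 8 billion figure comes from the thread; the helper names `min_id_bits` and `dataset_size` are made up for illustration. It computes the fewest bits that can label every person uniquely and the raw storage for a given number of bytes per person, in both decimal GB and binary GiB (which is where "60GB" versus "64GB" comes from).

```python
PEOPLE = 8_000_000_000  # rough world population used in the thread

def min_id_bits(n: int) -> int:
    """Fewest bits that can give each of n people a distinct integer ID."""
    # IDs 0..n-1 need as many bits as the largest ID, n - 1.
    return (n - 1).bit_length()

def dataset_size(n: int, bytes_per_person: int) -> tuple[float, float]:
    """Return the total raw size as (decimal GB, binary GiB)."""
    total = n * bytes_per_person
    return total / 10**9, total / 2**30

print(min_id_bits(PEOPLE))  # more than 32 bits, so a 4-byte int is too small
for width in (1, 8):        # 8 bits per person vs. a 64-bit integer ID
    gb, gib = dataset_size(PEOPLE, width)
    print(f"{width} B/person: {gb:.1f} GB ({gib:.1f} GiB)")
```

Since 2^32 is only about 4.3 billion, a 32-bit ID runs out well before 8 billion people, which is why the next standard width (8 bytes) is the practical choice.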
> Your proposed solution gives companies zero incentive to minimize data collected
The expense of producing the data isn't supposed to be an incentive to minimize data collected; that is what the other sections are for. For large companies it wouldn't be one anyway, because they could automate the process.
> and even incentivizes them to make fulfilling such a request as convoluted and expensive as possible.
A company charging outrageous prices would obviously invite investigative scrutiny.
> Are they harvesting all this data to sell to third parties without my knowledge?
Yes. It's a secondary revenue stream. Even if they don't sell it now, if it turns out they have a reasonably good dataset they can sell it later on, especially as revenue starts dropping.
> A key challenge: very few labs have enough data.
It is also getting harder, not easier, to get.
I am working right now on a retrosynthesis project. Our external data provider is raising prices while removing functionality, and no one bats an eye. At the same time our own data is considered a business secret and therefore impossible to share.
As someone who does NLP research where the code, data and papers are typically free, this drives me insane.
> I think it’s worth mentioning that most of these problems only occur at a scale that only top 1% of companies will reach
I'll echo what another commenter said. Tons of data != tons of profit.
Tons of data just means tons of data.
Source: Worked on an industrial operations workflow application that handled literally _billions_ of records in the database. Sure, the companies using the software were highly profitable, but I wouldn't have called the company I worked with 'top 1%' considering it was a startup.