
> Are they charging too much?

The total ownership of virtually all the data for a huge percentage of the population could be argued to be a high price, depending on your perspective.




> Your data is worth a lot

Ehhhh, not so sure about this.

• Your individual data is worth a lot of money in aggregate with the data of thousands/millions of other users, but it is difficult (impossible?) to exchange your individual data for money

• If you're talking philosophical/non-monetary "worth" then this varies from person to person. I value my data almost not at all.


> Really interested in what the data cost will be.

What do you mean by the data cost?


> this is more like 1% of the data for 5% of the cost

And the quality of the data is unknown. If they collect it from other low-price providers, the data quality can also be low.


>Why do you think this?

Because people with that much data know how much it costs to buy hard drives to store that much data.

Ask any of them how Google was supposed to make money off of them as a customer and watch them go for gold in mental gymnastics.


> I have quite a few data points

Show us the data points or it didn't happen.

I am highly sceptical about your statement and am convinced that most people are willing to pay in exchange for ease of use and convenience.


>I think you're over-estimating the amount of data involved.

The one billion figure was from the article.


>As a data source for academic study it's unparalleled.

Does that actually mean it's valuable? Just because something's one of a kind doesn't mean it's worth that much.


> Are you trying to imply that brokering consumer data is a marginal or non-existant business?

For small/mid-size datasets (which is what we're talking about), yes, that's exactly what I'm implying. It's not actually easy for most companies to sell user data for a quick buck, as is being claimed.


> Data is so cheap that it should be actually free, unlike counterfeit nike shoes.

Getting data is not cheap, and maintaining a dataset is certainly not cheap either.


> Paying you would mean they'd have to make more than they pay you from the addition of said data.

> This data is probably only valuable in aggregate to determine trends.

I disagree with this point, if only because their current profits are generated precisely because they do not have to pay their data producers. And while I agree with the sentiment that the data is only valuable in aggregate, that doesn't mean my data is worth $0.00. It's still worth pennies, so maybe instead of $20, I earn $5, or less. The point is that I'm paid for it. An analogy that came to me while mulling over this comment is an orange tree: a single orange is not as valuable as an "aggregate" of oranges.

In the current data economy, companies have maintenance expenses for their "orange harvesters", but not for the oranges themselves. Again, this often prompts the response "you are using their service, it's their data", but I'd say a) this bill clearly shows that some people do not agree with that sentiment, and b) (to continue with the farm analogy) this is where something like share farming [1] would be a usable model.

[1] https://dairy.ahdb.org.uk/technical-information/business-man...


> Data is so cheap that it should be actually free, unlike counterfeit nike shoes.

This really depends on the data. In pharma the right 50 bytes of data can be worth billions. Not all data is personal product preferences for ad targeting.


>Data collection is not going anywhere, so long as people are willing (even unknowingly) to give up info for a perceived discount.

People are willing in large part because they have no clue what their data is actually worth and/or what pieces of their data are actually out there. Data collection survives because people don't realize they're already paying $5.99 a month (or whatever the real breakeven number is.)

I think it's a pretty good rule of markets that people should know what it is they're exchanging. To that end I should be able to see what these companies gather on me when I use their service.


> What problem does that solve?

It makes the data use much less profitable for the companies involved, and therefore far less attractive, discouraging business models that rely on it from the start.


> - Most data isn't big. I can fit data about every person in the world on a $100 Chromebook. (8 billion people * 8 bits of data = 8GB)

Nitpick, but I cannot help myself: 8 bits are not even enough for a unique integer ID per person. A standard 64-bit ID would take 8 bytes per person, and then we are at roughly 60GB already.

I agree with pretty much anything else you said, just this stood out as wrong and Duty Calls.
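
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The 8 billion population figure is the one from the quoted comment; the function and variable names are made up for illustration, and the 8-byte ID is just the conventional 64-bit integer width rather than a hard minimum.

```python
POPULATION = 8_000_000_000  # figure used in the quoted comment


def min_bits_for_unique_ids(n: int) -> int:
    """Smallest number of bits that can give n items distinct integer IDs."""
    # IDs run from 0 to n - 1, so we need enough bits to represent n - 1.
    return max(1, (n - 1).bit_length())


def total_bytes(people: int, bytes_per_person: int) -> int:
    """Raw storage needed, ignoring any indexing or file-format overhead."""
    return people * bytes_per_person


bits_needed = min_bits_for_unique_ids(POPULATION)   # 33 bits, i.e. more than 8
one_byte_total = total_bytes(POPULATION, 1)         # the "8 bits per person" case
eight_byte_total = total_bytes(POPULATION, 8)       # a 64-bit ID per person

print(f"bits needed for a unique ID: {bits_needed}")
print(f"1 byte per person:  {one_byte_total / 1e9:.1f} GB")
print(f"8 bytes per person: {eight_byte_total / 1e9:.1f} GB "
      f"({eight_byte_total / 2**30:.1f} GiB)")
```

So the original 8GB figure holds for one byte per person, but a bare 64-bit ID alone comes to 64GB in decimal units, or just under 60GiB in binary units. Strictly speaking 33 bits per person would be enough for uniqueness, which is still far more than 8.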


> Your proposed solution gives companies zero incentive to minimize data collected

The expense of producing the data isn't supposed to be an incentive to minimize data collected; that is what the other sections are for. For large companies it wouldn't be one regardless, because they could automate the process.

> and even incentivizes them to make fulfilling such a request as convoluted and expensive as possible.

A company charging outrageous prices would obviously invite investigative scrutiny.


> The data they have access to is increasing by orders of magnitude.

You can only go so far by dumping more data into it. Diminishing returns.


> Are they harvesting all this data to sell to third parties without my knowledge?

Yes. It's a secondary revenue stream. Even if they don't sell it now, if it turns out they have a reasonably good dataset they can sell it later on, especially as the revenue starts dropping.


> A key challenge: very few labs have enough data.

It is also getting harder, not easier, to get.

I am working right now on a retrosynthesis project. Our external data provider is raising prices while removing functionality, and no one bats an eye. At the same time our own data is considered a business secret and therefore impossible to share.

As someone who does NLP research where the code, data and papers are typically free, this drives me insane.


> I think it’s worth mentioning that most of these problems only occur at a scale that only top 1% of companies will reach

I'll echo what another commenter said. Tons of data != tons of profit.

Tons of data just means tons of data.

Source: Worked on an industrial operations workflow application that handled literally _billions_ of records in the database. Sure, the companies using the software were highly profitable, but I wouldn't have called the company I worked with 'top 1%' considering it was a startup.
