U.S. judge says LinkedIn cannot block startup from public profile data (www.reuters.com)
779.0 points by techrush | karma 356 | avg karma 89.0 2017-08-14 21:04:18+00:00 | 301 comments




This seems very at odds with previous rulings (specifically, those relating to Craigslist's many past dealings). It strikes me as very unlikely to stand up on appeal. Also, LinkedIn will likely modify their website's behavior (make you click to agree before you view a profile), which would create a binding 'click wrap' stopping companies from scraping them.

That click wrap contract is kind of an interesting thing on its own, for those of us who only enable JS when absolutely necessary. If I never see the agreement, and I am not specifically avoiding it, does it still apply to me?

Did you click a button saying you agreed to it?

Make the profile loadable via XHR and problem solved, for example. (Which I bet is already the case.)

There was a big ruling in Canada about this, specifically around MLS, the big real estate monopoly we have. If you go to their sites to search for homes, like you'd see at Realtor.ca, you have to click through a click wrap to access any data. Even if you automate past that, the fact that a human would have to click it means that it's illegal to scrape, since a human is forced to agree to a TOS before viewing.

Ah yes, that. MLS compliance was a source of many tickets for me, in a previous job, in Canada. The employer didn't even want me to waste time trying to learn it all, just follow the compliance officer.

IIRC, this stuff varies quite a bit from region to region, even within a single metropolitan area. Attempting to simultaneously comply with multiple independently developed rulebooks was ... fun.

I can't wait for shipyard startups to disrupt the housing market. /s


Appeal? The case hasn't even been heard yet. This was a preliminary injunction; it's far from over!

A preliminary injunction can be the subject of an interlocutory appeal.

In this instance, however, it is not (as can be gleaned by reading the article).

It's just a preliminary injunction used to maintain the status quo (ie, allowing scrapers) while the case is heard. A preliminary injunction is basically "ok everyone stop what you're doing, maintain business as usual until the court rules."

The biggest reason why they have not done this so far is SEO. If you introduce the 'click wrap' - other crawlers like Google won't be able to crawl it, so their traffic will decrease overnight.

They'd most certainly whitelist the google ips.

that's against TOS for Google SERPs

Showing different results to google than you do to users is called cloaking and it's not allowed


Definitely, but someone at LinkedIn could get in touch with their google contact and make it all work and follow the rules.

https://support.google.com/news/publisher/answer/40543?hl=en

Apparently, we had something similar at Demand Media before the panda update.


How does one build such google contacts if you are a seed stage startup? I understand it wouldn't be the same league as LinkedIn Reid Hoffman level contacts, but even some kind of contact at Google search?

I wish I knew. Easiest way is to make a ton of money off google ads like we did. Even then it was hard to actually talk to someone - especially at YouTube. And they had an office a few floors up...

Neither of the Craigslist cases reached an appellate level. They were only district court decisions, so as I understand it, they only have persuasive value when applied to other cases. The judge in this case mentioned Craigslist v. 3Taps, and apparently was not persuaded by it.

Without having downloaded the order from PACER, if I had to guess, I don't think a click wrap would change anything.

The free speech argument is certainly a dud here. The argument that has likely had any weight at all would be the antitrust/unfair competition one.

A click wrap will not change that.

It's almost certainly about LinkedIn's repeated claims that they don't own these profiles, that they are public info, and that they want to make them public. Now they are turning around and saying "just kidding!" and trying to put someone out of business who depended on that, all so they can start their own analytics product.


Is the reasoning in this case different from the Craigslist/3taps dispute?

Does anyone know where to view this ruling?

I'm curious how it passes free-association muster: you're not allowed to discriminate on particular tasks, but there's no reason you can't discriminate based on eg, behavior or user-agent or IP address.

It seems very strange to me that the judge would order MS to associate against their will prior to hearing the arguments.


It's a preliminary injunction, not a ruling. For all we know, the ruling could be completely different.


Wow, that sign analogy is really faulty.

After skimming the document for a bit, hiQ's argument looks really flaky. Especially grasping at straws like "Free Speech". They argue that LinkedIn is like a public mall and denying them access to the mall is denying them "Free Speech"? I don't see how this can be the case if they had no intent to "speak" at all in this place. Their data collection via scraping seems more like people-watching in the mall, if you go along with their analogy.

Indeed, and the court rejected that part of HiQ's argument.

"In light of the potentially sweeping implications discussed above and the lack of any more direct authority, the Court cannot conclude that hiQ has at this juncture raised 'serious questions' that LinkedIn's conduct violates its constitutional rights under the California Constitution."


This is just a preliminary injunction and the court has not even heard or ruled on this case. They just allowed HiQ to access the data while they wait for the scheduled court hearing to begin. The court may eventually rule very differently once they have heard all the evidence presented and weighed up existing applicable case law.

The judge who issued this injunction, Edward Chen, is also the judge presiding over the class action case about Uber drivers as independent contractors.


Wow, those are two very big issues, that affect two very large industries, not to mention the implications and precedent for both free speech and workers' rights.

There are not that many judges on the Northern District of California

https://www.cand.uscourts.gov/judges

and cases involving Silicon Valley companies are very often filed here, so quite a lot of the high-profile industry matters end up getting heard by the same judges!


"This is just a preliminary injunction and the court has not even heard or ruled on this case"

This is not quite right. One of the requirements to get a PI is a likelihood of success on the merits ;)


The order discusses how less is required of the merits (need only raise serious questions) if the consequences are dire (going out of business), then discusses how they depend entirely on LinkedIn.

It's also possible that the judge is giving them affordance before killing their business to prevent appeals.


FWIW: That is actually fairly rare. Usually the answer is "well, the creditors can continue the lawsuit if they think it is valuable"

I agree that we need to be careful not to read too much into this, but in most scraping cases I know about, preliminary injunctions are granted as a matter of routine.

The fact that this judge refrained from doing so may signal that the judiciary is finally willing to bring some nuance and rationality to their interpretation of extremely broad statutes like the CFAA. It's a positive signal, even if ultimate victory remains unlikely.

/me is not a lawyer


Forcing LinkedIn to provide the data and remove any locks is pretty harsh.

Imagine I sued a bank under the theory that the locks on their vault doors are illegally preventing me from opening them. As part of the preliminary injunction, the judge rules that the bank must remove all locks until the case is decided.

Point is, it's not just "maintain the status quo", it's "give them your data for free". IANAL but I don't think preliminary injunctions should change the status quo by intruding on what is plausibly someone's private property.


Misleading analogy. Physical goods such as a bank's money are both not reproducible and not normally available for the general public to just take

"U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles."

Interesting ruling


I would hope there is special consideration for any anti-DDoS technology they have. It would be hard to differentiate between a DDoSer and a scraper. Rate limiting for DDoS attacks might affect a scraper, so the question (the one LinkedIn is asking) is how low they can limit them without looking like they're blocking them. I have a feeling this isn't over!

I wonder if this isn't such a big deal since it's not like they're gonna verify beyond "can they scrape now?"

As long as that is true, they will likely not run into issues. Other measures are not aimed at blocking them, and a case can be made that they're a separate issue. Defending against common internet attacks is an easy case to make to a judge. He can't expect LinkedIn, in this case, to kill their service so someone can scrape.
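To make the "rate limit everyone, block no one in particular" idea concrete, here is a minimal sketch of a per-client sliding-window limiter. Everything in it (the class name, the limit of 3 requests per 60 seconds, the example IP) is an assumption for illustration; nothing here is known about what LinkedIn actually runs.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds.

    The same rule applies to every client, so it throttles floods without
    singling out any one scraper. (Hypothetical sketch, not a real product.)
    """

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # client id -> timestamps of that client's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: throttle, whoever the client is
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window=60.0)
results = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(results)  # [True, True, True, False, True]
```

The fourth request is refused because three were already accepted inside the window; by t=61 the first two have aged out, so the client is served again. A slow scraper passes, a flood does not, which is roughly the line being argued over.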


Being a programmer not a lawyer, I like the idea of more rights for scrapers. I don't want to see the internet partitioned away and owned by a few companies, especially when that information is often called a "public profile".

Can they claim a tax credit for supporting that bandwidth usage and handling abuse?

Not needed. Cost of doing business.

I don't trust the US government to write good rights for scrapers. They can't even do computer crime sentences well.

At best, it's a burden for no solid gain for society. At worst, there will be loopholes used to DoS businesses because they can't shut down individuals due to law-given rights, and that will lead to court fights.

These rights would do nothing but save scraper authors from learning to obfuscate their actions.


Information wants to be free. People should stop fighting it!

If one makes information "public" but doesn't really want to share it, then the public is fully justified in taking it.


<< Information wants to be free. People should stop fighting it!

That's why you routinely publish your bank account, social insurance, and credit card details online, right?


Not like I have a choice... and how are those things explicitly declared 'public'?

It gets into the incredibly murky water of how the web works. You're just issuing a request and getting things back. Sometimes in a web browser, sometimes not. But the content itself may still be copyrighted. You can't just take it, even though for now, the publisher/server is allowing you to view it for free.

But what if you only choose to view some of the content (e.g. block ads)? What if you apply your own styles to change the way that information is displayed? You're just changing the way the browser represents that data. You're not redistributing it as your own at this point. What if you store that data, but don't republish it, and just use it in some algorithms?

There are a whole lot of interesting grey areas here, but many that already have precedents that side more with the copyright holders.


Maybe the whole idea of copyright is flawed and harmful?

There's nothing wrong with copyrights; 14 year copyrights.

Why 14, though?

I'm well aware of the historical precedent, but that number was rather arbitrary even then - it was what a bunch of people agreed upon, based on their ideas and experience, and given the environment. It's doubly arbitrary today, considering how much the environment has changed. Is a 14-year copyright on software reasonable, for example, or too long?

Rather than making it a hard cut-off point, it would be interesting to come up with a scheme that attempts to capture the spirit of term limits.

Consider: why are copyright terms even a thing? Well, copyright is a monopoly on a thing that is not naturally restricted; it does not exist in the absence of society, and is therefore a privilege granted by that society. By itself, copyright is meant to encourage creativity in the interest of public good, and at the same time, to provide some means to derive profit from one's creative expression. So there are two conflicting interests at play here - the desire of the creator to be rewarded for the fruits of his labor, and the desire of the society to enjoy growing, constantly enriched culture. The copyright term, then, marks the point at which the latter trumps the former.

Instead, what we could do is capture the fact that the interests conflict. For as long as you hold copyright, you're effectively denying society the ability to freely enjoy the culture that you have enriched. Why, then, not tax the copyright accordingly? You could consider it a kind of intellectual property tax, but with a twist: the longer copyright is held, the more the interests of society are infringed, and the larger the compensatory payment required to maintain the copyright.

So we could start with a grace period of a couple of years that is completely free, then it starts growing steadily. For some really popular work that makes significant profits, the author could easily afford payments to maintain copyright for a decade or two (or however long - that is something that can be dialed arbitrarily). For things that are too obscure, payments would cease shortly, and they would fall to public domain. There wouldn't be such a thing as "abandonware" anymore.

What use to put the money to? Many possibilities there. Publicly sponsored arts and art education is an obvious choice. Another interesting example would be offering bulk sums of money to authors of culturally important works to surrender their copyrights sooner, so that the public can enjoy them.


I'd love to see more debate about just this.

There is no grey area. You can not copyright facts. If you download ("scrape") a webpage and then extract the facts, whatever you downloaded only exists in volatile memory. So there is no claim there. The only claim you can make is on the download itself, hence what LinkedIn chose.

I've seen plenty of LinkedIn profiles that could be classified as fiction.

But LinkedIn isn't the owner of that fiction.

It's possible that HiQ is downloading and processing things that aren't facts. Long passages of text in LinkedIn posts, recommendations, and comments aren't "facts".

Though that doesn't appear to be the path LinkedIn is using to fight it.


Copyright isn't thoughtcrime, if they aren't redistributing it to anyone with standing to sue (very likely not LinkedIn, no matter their ToS) they can process things all day.

I did not say or imply "thoughtcrime". Just noting it can be more complex. Sentiment analysis of copyrighted text passages might be claimed as being a derivative work, for example. Fair use does have limits.

The standing to sue is an issue for user generated content, yes.


>Sentiment analysis of copyrighted text passages might be claimed as being a derivitive work for example.

So if a company releases a product that predicts likelihood of an employee quitting, you think you're going to have standing to sue because an analysis of a copyrighted passage you wrote comprised 0.000001% of the source material the algorithm was trained on?


You left out my last sentence, on purpose I suppose.

The context was that scraping doesn't always get a free pass because "facts". This specific case may skirt it because of the user generated content. Doesn't mean it's not worth mentioning for the larger context that copyright isn't black and white.


The interesting part here is that LinkedIn doesn't hold any copyright on much of the data. You cannot copyright someone's name and title.

But you can copyright a database.

that one is iffy, http://www.nolo.com/legal-encyclopedia/types-databases-eligi...

I'd suspect LinkedIn could argue that the network they create is a creative work and would be covered, but the facts about each person might not be copyrightable.


What's inside the database is not owned by them.

I think most of those "grey areas" already have plenty of precedent in the world of art & music. Have you heard of "fair use", and what it allows and does not?

> It gets into the incredibly murky water of how the web works.

There's no "murky water" in how the web works. It's very clear and precise, and anybody can learn how it works. It has to be precise and well defined, because computers can't operate any other way.

If LinkedIn doesn't want "public" profile data to be accessible to everybody then they need to stop calling it public and put it behind some kind of access control.


Computers are also operating with absolute clarity and precision when somebody exploits a flaw in their software to execute arbitrary code. That's why the CFAA has its language around 'exceeding authorized access' (even though it turns out that definition is dangerously vague).

But that's exactly what LI did, then came a judge and ordered them (temporarily) to remove said access controls.

If a website puts something on the public internet, it should not even be aware if it is being accessed by a scraper or a human.

Maybe we should just ban User Agent strings and be done with it.


I'd be willing to bet that the user-agent field isn't the problem; it's patterns that everything looks for now, right? People have been lying in the request headers for decades.

Yeah, I remember a plugin in the early days of Firefox that allowed Firefox to pretend to be Internet Explorer. It would send request headers that made it appear to be IE.

There were websites at the time that would display just fine in Firefox, but would refuse to display anything if they detected a non-IE browser.
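For anyone who hasn't seen how little it takes to lie in the request headers, here is a minimal sketch using Python's standard `urllib`. The URL and the IE-style UA string are made up for illustration, and the request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical UA string imitating an old Internet Explorer. Any text works,
# since the server has no way to verify what the client claims to be.
FAKE_IE_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_spoofed_request(url: str, user_agent: str = FAKE_IE_UA) -> Request:
    """Build (but do not send) a request that claims to come from another browser."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_spoofed_request("http://example.com/")
# urllib stores header names via str.capitalize(), hence the "User-agent" key.
print(req.get_header("User-agent"))
```

Which is the point made above: the header is whatever the client says it is, so anything that really wants to tell browsers from scripts has to look at behavior instead.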


You call it the "public internet" but it's most definitely not a public space or anything like it.

Private entities own and operate all(most of) the servers, services and conduits, and that does need to be paid for and maintained.

I'm not saying I agree with LinkedIn in this particular scenario, but this is about two commercial for-profit entities arguing over money, so let's not make it about something it's not.


> Private entities own and operate all(most of) the servers, services and conduits, and that does need to be paid for and maintained

And are MORE than happy to send the content of their servers to unsolicited, uninvited, anonymous guests on mere request. No-one is forcing them to do so!


And that's exactly LinkedIn's position here.

No one should be forcing them to send their content to anyone.

They claim they should be allowed to discriminate at their discretion who they respond to, since they own and operate the servers.

This "no one is forcing you to send your content" goes both ways..

If one side is going to say they're entitled to receive the content on request, the other side wants to be able to say they're entitled to refuse to answer that request..


>"You call it the "public internet" but it's most definitely not a public space or anything like it."

How is it not a "public space"? They publish publicly visible A records for their site as well as route their public IP space to transit providers in order for the public to be able to reach their site.


If I publicly distribute posters about a party in my apartment and leave the door unlocked, do I still have the right to kick out anyone I don't like?

That's a pretty disingenuous strawman.

A more accurate comparison would be that you put up an advertisement on a billboard along a busy street and then decided to tell people who passed by that they weren't allowed to take a picture of it.

And to continue with this absurdity, you feel entitled to enforce who can or cannot look at your billboard because, despite it being publicly viewable, it's your advertisement on the billboard.


That's also not accurate.

There is no "public space" on the Internet.. There's no un-owned territory or resource that is free to use or metaphorically "stand around" in to take those pictures from.

You are consuming privately-owned resources in all your online activities, and as such some will argue that they can decide to limit your consumption of those resources at their own discretion.

Again, I am not siding with either party here, just trying to dispel this notion that "public space" - in the way we understand public space to exist in the physical world - exists on the internet.

In your example, no one is controlling your right to take photos or stand around and look in any direction you choose.

When you use the Internet, a private entity is allowing you to transit through their network and access sites, a different entity is allowing you to access and receive their content, etc..

LinkedIn owns the server you are accessing when you (or others) go to their site, and they are spending resources servicing those requests, and - they claim - can decide how and when they choose to do that..


It really doesn't take much effort to detect the majority of scrapers. Usually you do so by monitoring patterns of any given IP.

Are the requested profile pages incrementing (/users/1, /users/2, etc.)?

Are there dozens of requests a minute (faster than a typical user would read)?

Is static content (particularly images and CSS) being downloaded too, or just the HTML content?

Sometimes the referrer HTTP header can give clues too - though you have to be careful there as that's as unreliable as the user agent header.

However if you're really paranoid about scrapers you can also throw in some honeypots. eg a fake user (/users/13) which is a user account that doesn't exist so that page wouldn't have any links from within your site. ie you only reach it if you're incrementing through the user IDs. Or perhaps a link within your HTML which doesn't render so it's only reachable via automated scripts that don't check what links are rendered inside the display view. Anyone that gets ensnared in your honeypot could then be put on a temporary IP blacklist. Though the danger of doing this is you accidentally blacklist good crawlers if you're not careful about setting appropriate robots rules.
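The checks above could be sketched roughly like this. It is a toy under stated assumptions, not how any real site does it: the log format, the thresholds (30 requests a minute, a run of 3 consecutive IDs, at least 5 requests before judging), and the names `ClientLog` and `scraper_signals` are all invented here. Only the `/users/13` honeypot comes from the example above.

```python
from dataclasses import dataclass, field

HONEYPOT_PATHS = {"/users/13"}  # fake user page that nothing on the site links to
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")


@dataclass
class ClientLog:
    """Requests seen from one IP: (timestamp in seconds, path) pairs, oldest first."""
    requests: list = field(default_factory=list)


def _user_ids(paths):
    """Extract numeric ids from paths shaped like /users/<id>."""
    ids = []
    for p in paths:
        parts = p.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "users" and parts[1].isdigit():
            ids.append(int(parts[1]))
    return ids


def scraper_signals(log: ClientLog, max_per_minute: int = 30) -> set:
    """Return the names of the heuristics this client trips (empty set = looks human)."""
    signals = set()
    paths = [path for _, path in log.requests]
    times = [ts for ts, _ in log.requests]

    # Honeypot: the page is unreachable by following links.
    if any(p in HONEYPOT_PATHS for p in paths):
        signals.add("honeypot")

    # Sequential ids: three or more profile ids in a row, each one higher by 1.
    ids = _user_ids(paths)
    run = 1
    for prev, cur in zip(ids, ids[1:]):
        run = run + 1 if cur == prev + 1 else 1
        if run >= 3:
            signals.add("sequential_ids")
            break

    enough_data = len(paths) >= 5  # don't judge a client on a handful of hits

    # Rate: average requests per minute over the observed span.
    if enough_data:
        span = max(times[-1] - times[0], 1.0)
        if len(times) / span * 60 > max_per_minute:
            signals.add("too_fast")

    # Static content: a browser fetches CSS/images, a bare HTML scraper does not.
    if enough_data and not any(p.endswith(STATIC_SUFFIXES) for p in paths):
        signals.add("no_static_assets")

    return signals


bot = ClientLog([(float(i), f"/users/{i + 10}") for i in range(6)])
human = ClientLog([(0.0, "/users/42"), (0.5, "/static/site.css"),
                   (1.0, "/static/logo.png"), (95.0, "/users/7"),
                   (200.0, "/users/1981")])
print(sorted(scraper_signals(bot)))
print(sorted(scraper_signals(human)))
```

The made-up `bot` walks /users/10 through /users/15 one per second and trips all four checks; the made-up `human` trips none. In practice you would want more than one signal before blacklisting anyone, for the reason given above about good crawlers.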


I've kind of always thought that we shouldn't be using UA strings. Just give the requester the data that they requested according to the current open standards. If they choose not to render it correctly, then that's their problem.

Yes, I realize that it's not that simple, but I think browsers would have tried much harder to adhere to standards if we had done it that way.


> Being a programmer not a lawyer, I like the idea of more rights for scrapers.

What rights should scrapers have that they don't right now? Keep in mind that a lot of the scraping going on is just some other private company abusing access and hoping to gather and use the information for their own private profit. How many companies are scraping StackOverflow for example and doing nothing but attempting to copy it and draw traffic to their own site? I can't stand copycat sites, they fill my search results with junk. I would assume the majority of scraping that is currently happening is not doing the public any good.

> I don't want to see the internet partitioned away and owned by a few companies, especially when that information is often called a "public profile".

This sounds like you're suggesting that LinkedIn or Facebook calling your profile a 'public profile' means that the law should treat it as a public service due to use of the word 'public', is that what you mean? The word public may be overloaded here. I can see why tax funded projects should be publicly accessible, but I have a hard time seeing why private companies should be compelled to provide access to anything at their own expense.


Consider that should anything ever happen to the sites they scrape, suddenly they become super valuable to the rest of us. Decentralized information is not a bad thing. And arguably, if monetizing those scraped sites pay for them duplicating the data to additional places, I think that's probably fine.

Arguably, if they do a better job getting that information in results to people who need it in search, they may be performing a service there as well. (A lot of decently informative sites have absolutely awful search/visibility.)


> Consider that should anything ever happen to the sites they scrape, suddenly they become super valuable to the rest of us.

That may well be true, but that value doesn't mean anyone should just be able to take that value from the company that put up the effort and investment to collect the data, and turn around and use it for their own profit. Nor does it mean that a company shouldn't be able to serve the data to whomever it wants and/or restrict access from whomever it wants. Value to the consumer is still not a reason to compel private companies to offer public services. It would be valuable to both of us if Google gave us free money, but no court is going to compel them to do so just because of the potential value to you and me.

It seems bad, btw, if we choose to rely on private companies to keep the only backups of our personal data. If a site going down has a negative effect on my life, and takes down data with it that I need, it might be an indication that I shouldn't have kept my data there.

Also true that decentralized information is not a bad thing, as a generic ideal or a data backup plan. But for a business, decentralization in this context means loss of profit, as well as possibly theft, cheating, and copyright violation.

Also, the increased value that comes from a company folding will be used against you by these private, for profit scrapers. They can and will hold their copy ransom for more money, if possible.

> Arguably, if they do a better job getting that information in results to people who need it in search, they may be performing a service there as well. (A lot of decently informative sites have absolutely awful search/visibility.)

What is the argument in favor of this being legal? It is currently not legal, and the law currently does not give any credit for 'doing it better'. Why should it?


It seems like a really dangerous precedent to interpret "unauthorized access" in CFAA to mean accessing a computer that is made available publicly without any form of access control when the owner doesn't want you to access it instead of meaning that you subverted some form of access control to access the computer.

I'd argue if the data is being made publicly available (particularly in the case of LinkedIn where we are choosing to make our own data publicly available), it is being made publicly available, and anyone should be able to take and use that data. Our data shouldn't be what a corporation's profit (or loss of profit) is based on to begin with.

> if the data is being made publicly available ... anyone should be able to take and use that data

It isn't being made publicly available in that sense. LinkedIn only offers the data to site visitors (unregistered users) under the guise of a license.

> we are choosing to make our own data publicly available

This isn't true. Putting data on LinkedIn is not making it publicly available, it's sharing a copy of your data with LinkedIn, and allowing them to do whatever they want with it. Those are the terms you agree to when you register.

> Our data shouldn't be what a corporation's profit (or loss of profit) is based on to begin with.

I agree, in an ideal world, but LinkedIn does profit on your data (as do Facebook, Google, Microsoft, etc.). And we are willingly sharing our data with them and allowing this to happen. There are all kinds of crappy trends with data and privacy happening, and lots of people raising red flags. Your choice is to not use those services. If you don't want LinkedIn to use your data for their profit, then don't share your data with LinkedIn. If you share your data with LinkedIn, then LinkedIn now has the right to use your data to their own advantage.


This case is much more significant than that. The ruling will determine whether or not breaching the terms and agreements of a website constitutes a crime under the Computer Fraud and Abuse Act (CFAA). LinkedIn is arguing that since scraping is not defined by the company itself to be a permitted use, a federal felony is being committed by anyone using a script to access its website. The language used in the CFAA is "unauthorized access" and this case will set a legal precedent for the meaning of that term.

So some company can sue me if I use `wget somecompany.com` if LinkedIn wins?

We can hope that this case will set a legal precedent, but it may, like the Craigslist case, end in a settlement (or some other disposition) before that point is reached.

The way I see it, if you want to make a 'public' profile, host it yourself, create your own website. LinkedIn has spent a lot of resources to create a service/network that helps to connect you with recruiters (or the opposite), whose information may not be easily available to you or can be found easily in the 'public' Internet. If you think from a product owner's standpoint, why should I let just anyone scrape the content collection that I built with so much effort?

Agree. However, I think there is a difference between harvesting a database's contents and observing trends, data and creating algorithms based around that observed data.

RTFA before commenting, folks. Questions and misunderstandings in this thread that are easily fixed if you just RTFA.

Does anyone know how they do this scraping from a technical standpoint? The articles allude to it being the same data that Google/Bing spiders get, and those can clearly access more data than the average internet IP for making their result summaries. I had assumed big sites whitelisted specific crawler IP ranges or User-Agents for the search giants. Do they somehow spoof this?

I don't think they do any such thing, if anything they are rotating IPs/user agents to avoid being limited or blocked.

Google requires sites to send the crawler the same content as someone clicking a link on a Google results page would see, so even if some sites get creative covering it up with blurred boxes and similar dark patterns, the data is there in the markup.


I haven't checked the markup, but if you try and hit a linkedin profile page you just get forwarded to a login page. Perhaps if you don't follow the forwarding?

Not sure how this complies with google's requirements, I suspect if you're big enough you get a custom arrangement. However, that doesn't explain how hiQ are getting the data.


Fixed the headline: U.S. judge says LinkedIn cannot block startup from public profile data; the judge will personally pay for the gazillion servers and man hours needed now that scrapers cannot be blocked.

> “We will continue to fight to protect our members’ ability to control the information they make available on LinkedIn.”

LinkedIn has full control over this; it's their site. What they are fighting for is the ability to choose who gets public access to various pieces of information, which its members do not get control over.


This is still very confusing, why didn't they completely block it from scraping without a login?

Because they won't get the traffic from search engines. Searching for a full name very often leads to a LinkedIn page in top 10 results.

Does this ruling include regular anti-scraping defenses that might stop HiQ, but don't specifically target them?

So they want to forbid a startup to scrape the personal data of their users, as if LinkedIn was the only company allowed to have access to this data.

I mean it is completely crazy, it is not LinkedIn data it is OUR data


Are you paying the hosting costs for LinkedIn's site and database? Their users gave LinkedIn the data, it's now theirs.

That's the equivalent of sending your resume to a company for a job opening and the company saying, "since this resume is now under our control, we own the information in it."

I fully support this decision. If you're offering a service that is public, with the intent to your users that such information will be available publicly, you cannot then police which users of that data count as "the public" because it serves your business interest.

LinkedIn, of course, wants to get all the benefit of the public Internet with providing as little as they can. This, coming from someone who used to work at LinkedIn.

These companies have built their fortunes on the public Internet and now that they are successful they seek to not pay homage to the platform that gave them their success. It's very clearly anti-competitive, and bad for users. LinkedIn should be forced to compete based upon the veracity and differentiation of their service, not because they have their users' public data held hostage from competitors.


This is a tricky issue that has more to do with user psychology than technology. While the data is public, most users do not understand the persistence characteristics of data, especially in the presence of 3rd parties.

In a world where there are no (persistent) copies made by third-parties, the user still is in control of the visibility of their data by updating their profile directly on LinkedIn to show/hide pieces as they see fit. With a 3rd-party in the picture, updates to user-data may or may not be respected by the 3rd party, leading to poor user-experience.

Quoting the article, "HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit."

Based on that one statement alone, as an employee, I would be uncomfortable with the use of my data to supply my employer with my future plans before I choose to disclose it myself. That choice is mine, and mine alone; not something to be monetized simply because the option exists. And while I have no control over the sharing of data, should something like this happen to me, I'd be more inclined to stop using LinkedIn, which in turn affects LinkedIn's ability to do business.


Then don't make your profile public.

You can't expect someone to forget you once had a bad haircut just because you now got a really cool one.


Actually, that happened a lot, at least before the advent of the internet. Physical photos are not always available, and people's memories fade. This is why a lot of people still think of information in the old way: as time goes by, it might fade.

The problem is you're basically saying to re-train every person who uses the internet to behave in a way completely different from what they're used to. If you think you can do that, by all means try.

But don't be surprised when you find out that your expectations and their expectations were different, and you're the one they blame, and they outnumber you by a lot.


> The problem is you're basically saying to re-train every person who uses the internet to behave in a way completely different from what they're used to.

When did we ever teach people that you can control where your data ends up on the internet?

Aren't we trending towards teaching people to not even share data on "closed" services like Facebook and Gmail precisely because they are a single source for a lot of data to be misused by the company, or hacked by a malicious actor?

Regarding data that is accessible without "friending" someone or logging in as the user themselves (e.g. Gmail), I hope people already realize that this data can easily be re-used.

> But don't be surprised when you find out that your expectations and their expectations were different, and you're the one they blame, and they outnumber you by a lot.

If the majority of people think that they can share nude photos of themselves on their own blog or twitter and that this won't be re-used elsewhere, well... I must be living in a different reality, or I misunderstand your point.


We taught people that they could control where their data ends up before the internet, because that was an accurate model of reality. I think this is a problem that we'll grow out of as more and more people grow up in the shadow of omnipresent social media.

Junk mail, telemarketing, revenge porn, and gossip predate the internet. Different media, same lessons.

Indeed, a way completely different from how anyone in the history of humanity has ever behaved.

While that is true, you can't dismiss the fact that technological advances allow for more sophisticated "memories".

This argument would hold more water if these services didn't constantly rely on dark patterns to trick people into making things public that they otherwise would have preferred to be private.

One solution for both hiQ Labs and LinkedIn is to give users, not these companies, some kind of ownership over their data.

Instead of having information about you be owned by whichever corporation collects it, have it at all times be owned by you.

While there are some clear problems with this approach, something needs to be done about companies building databases of ruin where every moment everyone lives, from the day they are born until the day they die, is cataloged into a database for review either by society in general or by algorithms looking to make predictions that impact an individual's future.

I should have some level of control over my personal information. Today, even if you actively suppress the first-party info you put out in the world, the number of third parties adding to your profile dwarfs any data you personally put out there: credit reports, government databases, credit card companies, and soon your web browsing history sold by the ISPs.

It is, IMO, out of control.


Data you own is not public, by definition. This needs to be made abundantly clear. Something like "all rights reserved" on images.

As usual this is the "copyright" debate.

So is your public profile copyrighted by you?

The actual server is controlled by you or gives you a way to take the data down. But by then it could have been republished elsewhere!


You do "own" the data. It just so happens that by registering with LI, you've granted "A worldwide, transferable and sublicensable right to use, copy, modify, distribute, publish, and process, information and content that you provide through our Services, without any further consent, notice and/or compensation to you or others".

To whom have you granted the license? I could see a legal argument that you've granted LinkedIn such a license, along with the right for LinkedIn to sublicense and transfer it, but still retain the right to press claims against third parties that scrape LinkedIn and use the information.

I don't think you've granted LinkedIn the right to sue to enforce your claims against third parties, so you'd have to sue directly.


The EU data protection law is sorta like this in theory. Users can't give up some rights.

>Based on that one statement alone, as an employee, I would be uncomfortable with the use of my data to supply my employer with my future plans before I choose to disclose it myself. That choice is mine, and mine alone; not something to be monetized simply because the option exists.

What choice? The only choice you have is whether to post public info or not. You have no choice over what others do with it. You can't police what others do with information that you freely publicize.


The issue isn't that the data is public. The issue is the extraction of underlying behavior based on the spatial and temporal characteristics of the data being posted. E.g. the fact that I updated my profile on LinkedIn is not in question. But the underlying behavior is that the average user normally updates their profile around the time that they start a job-search. While this isn't true for every user, the use of an algorithm may turn up false positives, which may have unintended results, especially when this data is presented to your manager.

Secondly, while the updated profile is public knowledge, the temporal characteristics of the update aren't a feature that is directly published by LinkedIn. Call it a product feature; it is tailored to present the qualifications of a user, not to advertise the fact that they may be looking for a job. While one can argue that the update is public knowledge, and must therefore be available for data-mining purposes, there is a subtle but potentially dangerous leak of information here that is open to interpretation.

LinkedIn's position, that this leak may be potentially harmful to its users, and by extension, its core business, is therefore a fair point.
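
To put a rough number on the false-positive worry, here is a back-of-the-envelope Bayes calculation. Every rate in it is made up for illustration; nothing here comes from hiQ or LinkedIn.

```python
def posterior_job_search(prior, p_update_if_searching, p_update_if_not):
    """P(searching | profile updated) via Bayes' rule."""
    true_pos = prior * p_update_if_searching
    false_pos = (1 - prior) * p_update_if_not
    return true_pos / (true_pos + false_pos)


# Hypothetical rates: 10% of employees are job-hunting; 80% of job-hunters
# and 20% of everyone else updated their profile recently.
p = posterior_job_search(0.10, 0.80, 0.20)
print(f"P(searching | updated) = {p:.2f}")             # 0.31
print(f"Share of flags that are wrong = {1 - p:.2f}")  # 0.69
```

Under those assumed numbers, roughly two out of three employees flagged by a "recently updated" signal would not be looking at all, which is the kind of result that lands on a manager's desk.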


Though I disagree, I think I basically understand your argument regarding why people should have to respect the wishes of a party when dealing with information that the party made public.

What I cannot understand is how such a system could reasonably be enforced. Let's say John Doe posts his resume on a job board. If I print out his resume, but he later updates it, am I now somehow in the wrong for retaining the old copy?

I am also a little puzzled by the notion that "persistence" is a new phenomenon. Of course there have been paper records and such for quite some time, but I'll put that aside for a moment. When I was younger, I was often cautioned to think carefully before acting, as a reputation decades in the making could be permanently ruined in just minutes. It seems to me that when it comes to mistakes and "bad" deeds, society's collective memory has always been rock solid.

Rather than persistence, I think the new factor is that things are less regional than before. You can't just pick up and move to a new town, because they basically have the same Internet everywhere.


>What I cannot understand is how such a system could reasonably be enforced.

That would be a 26 billion dollar question, and one I would very much love to solve one day! :)

I believe that your example is too simplistic to capture the crux of the issue here. The metadata associated with posts often contain features that are quite revealing, but not necessarily the kind of data that the source would wish to be revealed. E.g. I have noticed multiple times that the number of cold emails I have received from recruiters is higher immediately after I update my LinkedIn profile, leading me to believe that the last-update timestamp is a feature that the LinkedIn search engine may be relying on to rank results. While my evidence is purely empirical, it isn't a stretch to imagine that it would be a reasonable thing to do, given that most users normally update their profiles when they are about to start searching for new opportunities.

On the one hand, this is an excellent product that allows recruiters to reach targets that exhibit behavior associated with active job-seekers, resulting in better connections. It results in a win-win situation where the recruiter gets a pretty good return on his/her investment, and the target receives the attention/information they were looking for with their update. False positives in this situation result in a few unsolicited emails/unwanted attention.

On the flip side, this information can be repackaged to present a manager with a graph plotting the probability of an employee quitting. While this is a perfectly good product, no employee would ever use a service that might reveal their future plans ahead of a time of their own choosing. Furthermore, false positives here can have a significant impact and LinkedIn may not remain in business for long if word of this product gets out.


With regard to persistence, I don't think you can argue that the scale of what can now practically be stored hasn't changed significantly.

Given that a larger amount of information can now be stored easily, the economics of what is stored is different, and one would expect this would lead to the storage of more information over time. If you can store in a single HD what would have previously taken a large room, for a tiny fraction of the cost, the bar for storing the information vs. discarding it is much much lower.

I also think it's not only storage: searching large datasets is another reason the handling of information is changing over time. Again, if finding information in a large blob of data is easier and, as discussed above, much cheaper, then this will also lead to people storing information for later retrieval. As you say, the fact that all this storage is now networked together means that this information is easily retrievable from anywhere, only intensifying the impact.


I have no love for linkedin, but not sure of your position.

They collected the data, host it, etc, and incur costs for doing so. Just because they allow the public to access it, doesn't mean the public should have a right to re-use it.

People argue that the data is public. I say that's not the issue. While the data itself might be available elsewhere, it is raiding the _collection_ of it that is being argued, not that 'public' data is 'private'.

The _value_ that LinkedIn adds is that they've built the structure to collect and maintain the data. They are _not_ asking the court to prohibit anyone from collecting the same data on their own, at their own expense. If someone wants to start a rival LinkedIn, they are free to do so.


> Just because they allow the public to access it, doesn't mean the public should have a right to re-use it.

That is exactly what public means. Do not make it public if you don't want 'the public' to use it.


Let me replace the word "re-use" with "re-publish". Does your analysis change at all?

No.

Public means everybody can do whatever they want with it, no exceptions (except, as with all things, by law). If you want to restrict the information, then do it, but don't make it public and then when a competitor uses it claim it wasn't public 'for them'.


LinkedIn never made it "public". Use of their site is and always has been licensed. https://www.linkedin.com/legal/user-agreement

The user agreement does not cover the "public" parts of LinkedIn, like for example, the user agreement. If I want to copy and republish the user agreement I can, despite what it might say.

If the startup were republishing private information from LinkedIn I would agree with you.


> The user agreement does not cover the "public" parts of LinkedIn,

I beg to differ. Their EULA covers "accessing or using" their site in any way shape or form, and defines the term "visitor" for what you're calling "public".

...

You agree that by clicking “Join Now”, “Join LinkedIn”, “Sign Up” or similar, registering, accessing or using our services (described below), you are agreeing to enter into a legally binding contract with LinkedIn (even if you are using our Services on behalf of a company). If you do not agree to this contract (“Contract” or “User Agreement”), do not click “Join Now” (or similar) and do not access or otherwise use any of our Services.

...

When you register and join the LinkedIn Service, you become a Member. If you have chosen not to register for our Services, you may access certain features as a visitor.


I do not agree with the EULA. I definitely do not agree with the EULA just by reading it. I most definitely do not agree with the EULA just by virtue of it existing and being linked to on some corner of their site. I do not agree to any terms just by visiting a webpage. I am not bound by anything other than the actual law and the contracts I have willingly entered into in writing or the digital equivalent.

If the information is restricted, then restrict it. Do not make it publicly available then claim a webpage as the ruling contract of that information when it is used in a manner you do not agree with.


According to their view, you do agree to the contract by using their services, which includes visiting their web pages. If you don't agree, then don't use the services and don't visit their site. Or do, and argue it in court, but it's pointless to tell me you don't agree, the contract exists.

They're not restricting access to the information. HiQ is scraping their site using bots, and LinkedIn doesn't like it. This isn't a debate about anything being publicly available or not, this is a business fight between two private companies.


Well then, since what you are saying is that a contract only one party agreed to is a valid contract,

By reading (or not) this comment, you (“the reader”) concede all the points of this discussion. The reader also agrees that the arguments presented by HackerNews user “redial” ("me", "we", "us") are correct even in case of conflict with his or her own previously stated positions, and that he/she will amend all of his/her previous comments to reflect this legally binding agreement.


+1 for the lols. You seem to be arguing against me for the existence of EULAs. Maybe you're not aware that these have been around and their validity has been debated for decades? Lots of people are super bugged by them, just like you are. I think it's fairly lame too. I didn't write the EULA, and I don't care if it's a valid contract. But no matter what you say to me, no matter how much sarcasm you use, the fact is that LinkedIn's EULA says that by visiting their site, you are agreeing to their contract.

The real point I was making is that LinkedIn is establishing that they are not offering a public service. It doesn't matter whether you can be bound by their contract, the EULA is more about covering their own asses when they do things like refuse service to HiQ. They wrote the rules so that it's clear what things you can do to get banned. Regardless, they have the right to ban IPs or specific bots or whoever they want, because even though they let anyone access the site, that doesn't mean they have to let everyone access the site always. Like it or not, them's the facts.


> the fact is that LinkedIn's EULA says that by visiting their site, you are agreeing to their contract

So what? The whole point here is that they can say and think whatever they want, but it doesn't make any difference if the law disagrees.


> According to their view,

which the judge didn't agree with. So right now your argument is counterfactual and pointless.


The judge did not rule on the validity of their EULA. It was an injunction.

You don't need to sign on to view public profiles, so you don't enter any agreement with LinkedIn as a visitor.

You can choose to see it that way if you want. LinkedIn's EULA says otherwise. I have no opinion on whether LinkedIn's EULA is enforceable or legal, I'm only sharing the facts, and the facts are that according to LinkedIn's EULA, visitors do fall under the agreement.

You seem really hung up on what their EULA says, and I'm not clear on why. The question at hand is whether their EULA applies, so whatever is in it is 100% irrelevant to that question, right?

Is there any precedent for an EULA like that being enforced? Typically for a contract to be valid, acceptance has to be actively communicated. You can't be bound by a contract simply by someone saying that you have accepted it if you do something that you might have done normally.

> Is there any precedent for an EULA like that being enforced?

I don't know, I'm not a lawyer, but Wikipedia says "sometimes".

https://en.m.wikipedia.org/wiki/End-user_license_agreement#E...

> Typically for a contract to be valid, acceptance has to be actively communicated. You can't be bound by a contract simply by someone saying that you have accepted

Again, not a lawyer, but I imagine that use of a service could legally constitute your active end of the communication. You're right, you can't be bound just because someone says, but when you use a service you've gone one step past.

Honestly, I think the EULA is more of a CYA for them than a contract, in practice. But it does establish the potential legality for two things: 1- that this is a licensed service, and 2- that they can refuse service to anyone they want for reasons of business interest.


I just had a conversation with my own company's lawyers about this recently. Their assessment was that the spectrum from something like what LinkedIn is doing to something like a notarized paper document with a wet signature is a trade-off between ease/simplicity and enforceability.

These kinds of agreements ostensibly are enforceable, but harder to enforce.


For a public license to be valid, you need to be able to view the terms, to agree to them.

But the EULA says that simply by accessing the site, you agree to its terms.

To view the EULA, you must view the site.

That's just one of many problems wrong with assuming the EULA is binding.

The ruling in this case, has said it isn't, instead because some people can index and scrape (search engines), but others (startups for example) can't. Which is an anti-trust issue.


> The ruling in this case, has said it isn't

No it didn't. This injunction has temporarily prevented LinkedIn from blocking HiQ, and only HiQ, while the case is argued. The court might rule that LinkedIn can't block anyone, or they might rule that HiQ is not entitled to scrape LinkedIn's data.

> Which is an anti-trust issue.

HiQ claimed it's anti-trust using inflammatory language in their PR statement. I disagree with that assessment. LinkedIn is not preventing HiQ from collecting their own copy of the data, in any way, shape or form. HiQ is claiming they should be able to take LinkedIn's copy because the data is "public" data. Even if that's true, HiQ always has the option to get the data from the same source that LinkedIn did.

> To view the EULA, you must view the site. That's just one of the many problems wrong with assuming the EULA is binding.

Absolutely right, EULAs have all kinds of issues. In practice, the issue of having to access the site to view the license isn't a problem. You can choose after reading the EULA to not agree, and you can choose to not access any other data on LinkedIn.

But there is no reason to assume the EULA is not binding because there are no other legal documents that cover your interaction with LinkedIn, aside from any state and federal laws that might override parts of the EULA.

This is mostly irrelevant to the point I was making though, it doesn't matter if the EULA is binding. Its purpose there is to establish that LinkedIn is not providing a public service. It's communicating that there is no expectation of responsibility on the part of LinkedIn, and that doesn't really depend on whether you are specifically bound by the EULA.

It's just like a sign in a store window that says "we reserve the right to refuse service to anyone, at any time, for any reason." You could say that the sign is not a binding contract, and go into the store naked and yelling and start breaking stuff. When they kick you out, nobody will come to the defense of your right to walk into a store that everyone else is allowed to walk into.


>> The ruling in this case, has said it isn't

> No it didn't.

True. I should have said, "the ruling in this case, has said it isn't clear if the agreement should be binding".

> HiQ is claiming they should be able to take LinkedIn's copy because the data is "public" data.

Nobody is taking anybody's data. LinkedIn are providing copies of the data to anybody who views the page. You can't take something from somebody else in this context. It is not possible. Copying, and ineffective deleting are the only methods available for transfer.

> In practice, the issue of having to access the site to view the license isn't a problem. You can choose after reading the EULA to not agree, and you can choose to not access any other data on LinkedIn.

It is absolutely a problem. You don't view data. You download a copy.

You are not presented with the agreement upon visiting a public page, you first download the public page, which then links to the agreement.

Thus, when the agreement becomes enforced, you already have in your possession data from before you agreed, which is then governed by rules you were not aware of, and may not become aware of as the agreement doesn't require intervention.

If we have to come up with physical analogies for a problem that is inherently digital:

You walk into a store. The store hands you a CD, that they made just for you, saying it's yours.

You then say thank you, and only then does the store say that there are conditions attached. But you can't give the CD back. You can only agree that you will destroy it at an indeterminate time in the future. And your method of destruction is almost guaranteed to be reversible, but it's all you have.

Oh, and you might not have chosen to even walk into the store. You were stumbling around other stores, and a door led you here.

In common law, once possession is established, new conditions on the possessed item are next to impossible to apply, unless the method of possession was itself a crime.

---

> But there is no reason to assume the EULA is not binding because there are no other legal documents that cover your interaction with LinkedIn, aside from any state and federal laws that might override parts of the EULA.

A EULA, as its name suggests, is a license agreement. So far as I'm aware, most nations capable of accessing LinkedIn have a definition of a license agreement. Insofar as I'm aware, they all require a license agreement to at least be:

"A valid agreement between two parties, where both parties have read, understood and accepted responsibilities (or had ample opportunity to do so), pertaining to the use of the licensed item."

Prior knowledge is a requirement. You can't agree to something you haven't had the opportunity to comprehend.

But LinkedIn happily gives you a copy of their data before you are able to access the agreement. (Such as if your first visit was to a public profile page).

There are many laws that may invalidate the EULA.

---

> Its purpose there is to establish that LinkedIn is not providing a public service.

Its purpose is irrelevant if it is not binding.

A store can put a sign up, saying that only customers who buy a product before leaving may enter. But if someone does, the store cannot force the individual to make a purchase, because their policy was in conflict with other systems of rights.

If something is non-binding, and therefore invalid, it cannot be applied as... It has no validity.

If you have a driver's license, but it became invalid for some reason, you would not be permitted to continue driving, until such time as it became valid.

If the ownership of your house became questionable, you would be squatting.

The non-binding status of any agreement that becomes invalid, regardless of intention, is a problem in law, but it isn't a solved one.

If a license is invalid, you are not bound by it.

---

Caveat: I'm no longer a registered lawyer, as of two years ago. I may not be up-to-date on some things, and my main knowledge was in cross-border and Australian crime, specifically in the realm of IT.


> I should have said, "the ruling in this case, has said it isn't clear if the agreement should be binding".

The injunction didn't say that either. The only thing it said is that LinkedIn can't block HiQ for the time being. It is common in lawsuits for both parties to be prevented from acting until a decision is actually made. The decision has not been made yet.

> Nobody is taking anybody's data.

I think I used a poor verb, or you misunderstood me. I meant that HiQ wants to copy LinkedIn's data for their own business. In some sense that can be viewed as theft, and that is the way LinkedIn sees it. Under that view, the verb "take" is appropriate, but it doesn't mean that the original copy is transferred or destroyed, it just means that HiQ is now in possession of a copy.

> There are many laws that may invalidate the EULA.

True, and I don't claim otherwise. "No court has ruled on the validity of EULAs generally". https://en.wikipedia.org/wiki/End-user_license_agreement#Enf...

> Its purpose is irrelevant if it is not binding.

Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding. If the EULA says "we can refuse service to you", and then service is refused, then it's not a surprise.

In a legal sense, this could (but is in no way guaranteed to) reduce liability. What I'm suggesting is that even if the contract is not binding or valid, if you break the rules and get banned from a site, the EULA may still provide a defense in court from the site being sued by the person to whom service was refused. The site can say "we posted the rules, this person broke the rules" and the person may not have any legal support in favor of getting the service after they broke the site's arbitrary rules.


> Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding. If the EULA says "we can refuse service to you", and then service is refused, then it's not a surprise.

Also not a surprise when a judge orders you to restore access because your agreement is invalid.

> The site can say "we posted the rules, this person broke the rules" and the person may not have any legal support in favor of getting the service after they broke the site's arbitrary rules.

Absolutely. Sites are largely free to enforce rules arbitrarily, by modifying their HTTP responses.

However, you are not free to exclude individuals whilst including their competitors.

Google has been under the hammer for that recently, though that is under the EU's anti-trust laws [0][1][2]. Of particular interest to this case, you might find this quote telling:

> we believe that Google's behaviour denies consumers a wider choice of mobile apps and services and stands in the way of innovation by other players, in breach of EU antitrust rules.

LinkedIn are accused of standing in the way of innovation by other players, in this case, hiQ, whilst simultaneously allowing other players to innovate, such as Google. One can copy the data, the other can't.

> Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding.

I can expect the rain to move upwards, but that's irrelevant to how gravity actually acts. Unrealistic or false expectations are not taken into account with the rule of law.

A police officer might let you off with a warning for speeding, if you hadn't noticed the speed change. However, if it went to court, your false expectation of a different speed is not a mitigating factor.

If LinkedIn was wrong to prevent access in this case, their liability will not be reduced, if precedent is followed. They will still be responsible for the actions they took, in full, as Intel [3], Microsoft [4], Google and Apple [5] before them have been.

If however, LinkedIn are seen by the court as acting correctly, hiQ may be asked to pay legal costs, or counter-sued for damages.

If the EULA is non-binding, then it may as well not exist, because it has no legal relevancy.

[0] http://europa.eu/rapid/press-release_IP-17-1784_en.htm

[1] http://europa.eu/rapid/press-release_IP-16-2532_en.htm

[2] http://europa.eu/rapid/press-release_IP-16-1492_en.htm

[3] https://en.wikipedia.org/wiki/Advanced_Micro_Devices,_Inc._v....

[4] https://en.wikipedia.org/wiki/United_States_v._Microsoft_Cor....

[5] https://en.wikipedia.org/wiki/United_States_v._Apple_Inc.


> Also not a surprise when a judge orders you to restore access because your agreement is invalid.

That's not what happened here, there has been no ruling on any agreement, and the injunction order that was given only applies to HiQ, only temporarily, and nobody else. It is not a statement on the validity of EULAs or of LinkedIn's EULA, and it is not a statement on whether LinkedIn is being anti-competitive. It is an injunction and nothing else.


> That's not what happened here, there has been no ruling on any agreement, and the injunction order that was given only applies to HiQ, only temporarily, and nobody else.

I didn't say it was.

> It is not a statement on the validity of EULAs or of LinkedIn's EULA, and it is not a statement on whether LinkedIn is being anti-competitive. It is an injunction and nothing else.

An injunction is not given without merit. It has meaning.

Injunctions are regularly denied when the arguments are clearly in one direction or another.

The injunction strongly suggests that the judge finds hiQ's argument, that LinkedIn's public pages are not bound by the EULA, to "not be without merit".

No precedent has been set, but the conversation is definitively in the opening stages.


Dahart, would you please add your email to your profile? Would like to follow up on your older health-related comments. Thanks!

That only makes sense in the context of copyright. Users' personal information cannot be copyrighted by LinkedIn.

Yes, because "publish" implies breach of copyright.

If I post an essay on LinkedIn, and then someone posts it on their blog, copyright has been breached because that is my original work.

If I post the fact that I worked at Dunkin Donuts from 2007 to 2009 on LinkedIn, and then someone records it and feeds it to an employee quitting predictor algorithm, they've done nothing illegal. Me stating the fact that I was employed at a certain place for a certain amount of time is not me publishing an original work.


LinkedIn never said the data was 'public' in the sense that you are using it. You are assuming that just because it can be accessed for free, by anyone with an internet connection, that it is therefore in the public realm. That is incorrect.

As an example, is Netflix's collection "private"?

If yes, would it still be if they charged only $0.01 for it?

If yes, would it still be if the price was $0?

It seems silly to me to have the "rules" depend upon the price.


The rules depend on whether something is copyrightable or not.

Movies are copyrightable.

Compiled catalogs of personal information are not (https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R...).


> They collected the data

No they didn't. Users input most of their data.


The implication is that the company that serves public data could impose conditions on the use of that data, for example they could:

  1. ban the use of ad blockers when accessing the data
  2. ban users making an offline copy to view later
  3. ban users from disabling auto play or other features
  4. otherwise control what you do with data once you get it, which is *huge*. 
     E.g. what if they want a 1% share of any revenue you get by using the data, etc.

I think this really restricts freedom and has some scary implications for the future of the web.

Of course now, they have a technological option to try to force each of the above, but users also have a technological option to try to outsmart them. But I wouldn't want to give them a legal right to force the above.


Companies already do all those through technical means. In fact they have the full force of the law behind their efforts because they simply put some DRM on it and now it's illegal to try to circumvent.

I would LOVE it if the courts would remove the legal protections of DRM. It seems so strange that this court has gone so far in the viewer-rights direction, but hasn't bothered taking the baby steps to remove the legal protection of DRM.

Hmm, now I'm hoping LinkedIn implements some DRM so this fight can get truly interesting and maybe make some positive difference.


I'm not sure whether data like this can be copyrighted or is considered a creative work. The creative work would be things like the LinkedIn logo or graphics, and these fall under IP protections that limit what you can do with them even if they are freely available.

As a user, it's in my benefit if a competitor comes along, takes LinkedIn's data that they are freely publishing on the public Internet, and does something useful with it.

The correct analogy would be if someone took a copy of my personal resume that I put online, freely accessible on the Internet and did something useful with it.

Heck, Google does this already by indexing and providing a directory of public content. The fact that LinkedIn 'allows' them to do this is by virtue that it makes business-sense to do so and drives traffic to their site.

The rule should be plain and simple here: if you put user content online and do not make any efforts to restrict it (i.e. no passwords, no logins), call it "public information", you do not have any rights to say who can and cannot access that content, at the minimum. Unless I'm mistaken, you also cannot claim copyright infringement, as the user technically owns that content as well -- you just have a license to publish it (either to a private or public audience).

It should be up to the user --- and in fact their right --- to police their own content online. Personally, I find it offensive that LinkedIn seeks to restrict the distribution of such content that I have published through their service, where the expectation it is public. They are not acting in my interest here, they are very clearly acting in their own selfish interest, which I find odd considering LinkedIn's supposed mission has always been to empower their users to achieve professional success. How exactly are they empowering me by restricting who I have told them can access my public content? And the fact such restrictions are solely decided by LinkedIn with no input of their users -- the ultimate owners here -- is a disgrace and violation of their own mission statement.

This kind of concept is exactly what the Internet was founded on, folks. To say or think otherwise strikes at the heart of the open web and representing yourself as such is an affront against the great platform that has given rise to so many companies and provided so much opportunity in the world for the individual.

This concept is bigger and more powerful than any one company, and deserves to be defended.


> As a user, it's in my benefit if a competitor comes along, takes LinkedIn's data that they are freely publishing on the public Internet, and does something useful with it.

In this case, it's not to _your_ benefit. They're going to warn your boss that you will quit soon.


> If you're offering a service that is public, with the intent to your users that such information will be available publicly, you cannot then police what users of that data you consider to be "public" because it serves your business interest.

I wonder if the same ruling would apply to companies using Twitter's data feed? If so, it would be important in breaking open data silos.


Not sure how this view should be interpreted:

  - Websites should not have ToS
  - Websites may have ToS, users should be able to violate it without consequences if they don't like it ( = roughly current state of affairs)
  - Websites may only have ToS derived from some criminal act
  - something else?

How about "Website ToS should only be legally enforceable if users have actively indicated acceptance of them, not simply by saying that you agree to the ToS if you read the website".

The current injunction says LinkedIn can't even refuse to serve pages if they detect a ToS violation.

> If you're offering a service that is public

LinkedIn is not a public service, LinkedIn is a private, for-profit business. A public service is normally publicly funded. https://en.wikipedia.org/wiki/Public_service

> with the intent to your users that such information will be available publicly, you cannot then police what users of that data you consider to be "public" because it serves your business interest.

LinkedIn licenses their service to both free and paid users, and they can legally and do attempt to police what users can access the information. Whether it's enforceable or not, LinkedIn has a EULA and use of their service, whether you have an account or not, is presumably governed by their license.

They are well within their right to restrict requests being made by spammers and DDOS attacks, for example. How do you tell between legitimate requests and abusive ones, and how could you compel public access without enabling abusive ones?

> LinkedIn, of course, wants to get all the benefit of the public Internet with providing as little as they can.

The internet isn't a public service yet either, at least in the US. I think it should be, but it currently is not. This is conflating the knowledge that everyone with a computer and net access can currently access a LinkedIn server with the idea that everyone must be able to.


> > If you're offering a service that is public

> LinkedIn is not a public service

A service that is public(ly accessible through Internet) != Public Service.


That is correct. It's unfortunate, but such is the difficulty of language. "Public service" is a term that has precedent and legal meaning in the US, feel free to check out the article that I linked.

It's totally fine to call a service that is publicly accessible a public service, but it is going to lead to miscommunication if you are suggesting that this public service should have the same kind of legal requirements and regulations that the other kind of "Public Service" already has.


The GP did not call it a "Public service", but a service that was public, and the further context around the comment made it abundantly clear they were talking about access, not government provision.

The very first thing you did in quoting the GP was rearrange those words to suit your hobby-horse; a classic straw-man.


I don't understand what you mean.

The GP said "you cannot then police what users of that data you consider to be 'public' because it serves your business interest."

Yes, they can.

That sentence is explicitly describing the public expectations of a public service in the government sense. LinkedIn can police whatever they want because of their business interest, precisely because they are a private entity and not a public service in the government provision sense. Despite their offering information to unregistered site visitors, they are within their rights to refuse service to anyone at any time for any reason.

I did not rearrange GP's quote, and GP seemed to be conflating public access with government provision, which is why I brought up the distinction.

Are you saying that you agree with GP that LinkedIn should be compelled by law to provide public access to all?


> Yes, they can

The judge just said .. no they can't. Until the judge's ruling is overturned. Your statement is incorrect.

And you keep using the phrase "public service" which is not the issue at hand.

A store owner can not dictate who is allowed to read or take pictures of their store window. Effectively the judge was saying that if LinkedIn offers information that does not require a login - LinkedIn can not then tell someone that they can't use the info that is publicly visible.

No mention of "public service" - please stop conflating the two concepts.


> The judge just said .. no they can't. Until the judge's ruling is overturned.

This was an injunction, no ruling has been made.

> Effectively the judge was saying that ...

This is an injunction in LinkedIn vs HiQ, the judge did not share an opinion about site visitors who aren't logged in, nor make any ruling about whether publicly visible data can be restricted or not.

> A store owner can not dictate who is allowed to read or take pictures of their store window.

True, from outside the store. But to make a more complete analogy to this case, a store owner can dictate what you can read or photograph while inside the store. And the store owner can legally block the view to the outside anytime she wants.

> "public service" which is not the issue at hand.

I just explained above, and maybe you reacted quickly without understanding what I wrote. I'm not sure why I'm getting heavy pushback on this, it is both accurate and not controversial.

I reacted to @iamleppert's idea that LinkedIn can't police its users. The fact is that they can police their users (except for HiQ until the case is over). My interpretation is that he was saying they shouldn't be allowed to police this "public" data. Do you think I misunderstood? I'm not conflating the concepts, I am distinguishing between them. It may be that one is a red herring or that I misunderstood, but on 2nd and 3rd reading it still looks to me like GP is suggesting that LinkedIn should not be allowed to restrict access to some users. If that were true it would turn LinkedIn into a public service, hence the reason why I'm talking about public services.


If the store owner tells you information you can write it down and share it. They can refuse to answer questions but they cannot take back any public data they made available.

They have a right to ignore your question and put rules around who they will respond to and when.

When they respond with data that cannot be copyrighted (a name, address, title of past position, etc.) of course someone can reuse those pieces.


> They can refuse to answer questions

Which is what LinkedIn did, and what HiQ is suing them over.


> LinkedIn is not a public service

Boo. OP said it's a service offered to the public. Don't need to move the goal post.

> Whether it's enforceable or not, LinkedIn has a EULA and use of their service, whether you have an account or not, is presumably governed by their license.

Turns out I also have a EULA. And when linkedin responds to my HTTP requests, they opt into my EULA agreement.

It sounds like LinkedIn doesn't want to be accessible from the world wide web. Going offline is always an option.


> Boo. OP said it's a service offered to the public. Don't need to move the goal post.

Nothing moved. OP also said, in the same sentence, "you cannot then police what users of that data you consider to be 'public' because it serves your business interest." Yes, they can. It is clear from context that OP was suggesting that LinkedIn should be held to the legal standards of a "Public Service" in the government entity sense of the term. This is not the case, LinkedIn has no legal requirement to provide anything to the public. Booing me doesn't change that.

> Turns out I also have a EULA. And when linkedin responds to my HTTP requests, they opt into my EULA agreement.

Good luck with that!


Just like you can't fart in public and charge bystanders for the scent - you can't broadcast facts into the public and expect people not to recall them. I mean, you can. But good luck with that!

> you can't broadcast facts into the public and expect people not to recall them

That's a straw man, that is not the issue here. LinkedIn is not recalling the data, they asked to stop HiQ from scraping and collating their data and using it for their own business.

This is already in the EULA, so what HiQ is doing may already be breaking the law.

8.2 Don'ts You agree that you will not:

k. Develop, support or use software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons, or any other technology or manual work) to scrape the Services or otherwise copy profiles and other data from the Services;

m. Copy, use, disclose or distribute any information obtained from the Services, whether directly or through third parties (such as search engines), without the consent of LinkedIn;

ae. Use bots or other automated methods to access the Services, add or download contacts, send or redirect messages;


(fun fact, not strictly important here) - 8.2.k was added quite recently [1], likely triggered by the HiQ case.

[1] https://blog.linkedin.com/2017/april/10/updates-to-our-terms...

[2] Here https://github.com/tosdr/tosback2/blob/master/crawl/linkedin... is an older version of the ToS, which one can read while enjoying the irony of a crawler focused on scraping ToS


> LinkedIn is not recalling the data, they asked to stop HiQ from scraping and collating their data and using it for their own business.

It's not LinkedIn's right to tell me how to use facts. Just like I can't make LinkedIn liable for my EULA. It doesn't constitute an actual legal agreement.

That said I think the judge is wrong that LI should remove barriers from accessing the information. Information is speech. To mandate information be suppressed or produced is less than ideal.


> It's not LinkedIn's right to tell me how to use facts.

Yes, you're right about that. But it is LinkedIn's right to tell you how to use their service. Lots of people ignore what they say, and they might not be able to sue someone who breaks their rules, but they can state the rules.

They said they don't want bots scraping their site, and that's their right. They wrote software to detect specific bots & IPs, and refuse service. That's also their right, in my opinion.

Sites who want to use LinkedIn's database are free to collect their own facts instead. I don't know if this is what you were suggesting, but LinkedIn's refusal to serve HiQ's bots is not suppressing any speech, in my view.

> That said I think the judge is wrong that LI should remove barriers from accessing the information.

We totally agree. What are we arguing about?


"Just like you can't fart in public and charge bystanders for the scent - you can't broadcast facts into the public and expect people not to recall them. I mean, you can. But good luck with that! "

Do you also believe you can take the satellite tv signals beamed at your house and decrypt them? After all, they broadcasted them as widely as they possibly could! If they didn't want you to watch them, they shouldn't have sent it to you!

(This is a great in-theory argument that simply does not mesh well with our law in reality)


Personally, I do think it should be legal to decrypt them.

Same. Little did parent commentator know, there are plenty of radical "all information should be free" thinkers in these parts.

If the broadcasted stream is encrypted, and the key is not public, it is obviously not public data.

If the stream is unencrypted, the reasoning applies perfectly.


> I fully support this decision. If you're offering a service that is public, with the intent to your users that such information will be available publicly, you cannot then police what users of that data you consider to be "public" because it serves your business interest.

Obviously LinkedIn can't control the information itself. But this case isn't about the information in the abstract. It's about an HTTP request to a piece of private property, and how LinkedIn programs that private property to respond to an HTTP request. It's well-accepted that owners of private property can make it available to the general public, with whatever restrictions they please. There is no good reason to treat web servers differently than store fronts. LinkedIn should be able to control who accesses their web servers and how.


> There is no good reason to treat web servers differently than store fronts. LinkedIn should be able to control who accesses their web servers and how.

These two statements do not agree with each other. An owner of a brick-and-mortar shop can't (legally) stand out the front and bar black people from entering, for example.


I believe that store can stop you from coming inside and photographing all the displays.

I don't think they can stop you from photographing through the window.

It's an interesting case.


In this case you go in and request the price and they tell you.

They can't stop you from sharing the price; it's public now.


We are just gonna beat this analogy to death, aren't we?

No, they didn't tell HiQ. In fact, HiQ is suing LinkedIn because LinkedIn refuse to tell them.

No, they did tell, then stopped at the 5th product.

LIN gives this information (the product price) for the first few questions, then if you ask about the 5th price they say: 403, no more for you. WHILE at the same moment, if a different person (or you in a proxy-IP disguise) comes and asks for the 5th product's price, LIN happily (and publicly) gives this information.

And if the person comes wearing a Googlebot T-shirt, LIN drops to its knees and gives a ..... full-db-dump-job ;) Bing/Baidu T-shirts also fit. Source: www.linkedin.com/robots.txt

HiQ simply says: that's unfair, you can't decide who is good and who is bad. Effectively LIN bans any Google competitor (documented case).
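
The per-crawler treatment described above is the kind of thing a robots.txt file expresses, and it can be checked mechanically. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `allowed` helper, the `SomeOtherBot` name, and the example.com URL are all made up for illustration; this is not LinkedIn's actual file, just the general shape of "one named crawler allowed, everyone else disallowed".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (NOT LinkedIn's real file): one named crawler is
# allowed everywhere, every other user agent is disallowed everywhere.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = EXAMPLE_ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, allowed(agent, "https://example.com/in/some-profile"))
```

Note that robots.txt is only a published request; whether a server also enforces it (for example by answering 403) is a separate mechanism.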


No, but they can stand outside and bar pretty much everyone else from entering. Race or national origin discrimination is a very narrow exception to that.

No but they can define terms of entry that apply equally to everyone.

E.g. you may enter and look around. You may not take notes, pictures, or otherwise record what is here.


But this case is about (using your analogy) someone standing on the street, photographing your building. You can't tell them they can't do it. If you don't want them knowing your apples are $1.25/lb then you need to remove that sign from your shop window.

They don't have to apply equally. They just can't discriminate against a protected class. I can put up a "no hipsters" sign on my restaurant and enforce it.

But they can bar lots of people for other reasons. Block short people, people with black hair, people wearing red. It gets a bit murky when they block people for something that is related to a protected class but not directly because of a protected class. For example blocking anyone wearing a cross because they block everyone wearing torture devices from entering.

There are limitations to this. For example, you can take a look at other people's property from a publicly accessible place.

Yes, but they are free to build walls, gates, and windows.

Building a wall is akin to limiting the site only to registered users. They don't do this because they want google to index them, but it indexes only publicly available sites.

I.e. it's OK to build a wall, but it's not OK to skip the wall and then sue some of the people who take a look at the house.


LinkedIn is the one being sued, not the one suing.

Sure, you can do things that don't constitute trespass. But unlike what's inside a store front, you can't observe the contents of a web server without interacting with it (trespassing).

This seems really annoyingly tied to the technology of the web. If the public, non-authenticated web was built on a broadcast mechanism (like radio) instead of a request-response mechanism like HTTP, then this argument wouldn't apply. Hopefully the court considers whether it actually behaves more like the former than the latter.

Even with request-response, I don't see how this would be to LinkedIn's benefit. Their server receives the request, and sends the response. If they don't want it to send the response, they can change it accordingly. If they send the response as usual, how could the request be trespassing?

But they aren't sending the response as usual. That's why HiQ sued them, and what the court said they must do: "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers"

If the public web was broadcast at a range of radio frequencies the result would indeed be different. But it isn't. In my view, law should be tied to reality.

The web isn't an abstraction, it's a network of privately owned servers responding to requests. This order tells a company that they can't program their servers to look at who is making the request and refuse to respond on that basis.


> unlike what's inside a store front, you can't observe the contents of a web server without interacting with it (trespassing).

Then what do you consider the storefront, or public facing data of a website? Just the whois info on the domain?

The general public can't observe the locked contents of a server without hacking it. Hacking is trespassing.

I would modify your analogy to say the store front is public facing, just like some of the LI data is public facing. There is other LI data that is not public facing.

In my opinion, websites have a larger storefront, as well as multiple levels of access to internal data.


Making HTTP requests is the same thing as having rays of light reflect off the storefront, I'd say.

Actually, it's not. In your analogy, the storefront is completely passive and unaffected.

What is actually happening, is that somebody is walking into the store, asks a question about the stock or the price of the products on sale, which the store employee willingly answers.

Then, all of a sudden, the store wishes to control what you do with the answer that was willingly given to you.

This is clearly absurd - and so too is wanting to control what people do with publicly available HTTP data. If it's public, it's public.

I personally do feel that LinkedIn is within their full rights to attempt to detect and restrict content being served to screen-scraping agents, but they must then accept that screen-scraping agents must be allowed to use any means necessary to impersonate a "normal" user browsing the (public) information that they publish.

This can't be a one-sided freedom.


No, that's not what's happening. What was happening is that the store clerk noticed that an employee from a competitor was coming in and asking questions about the price, and then refused to answer the questions. The judge ordered LinkedIn to respond to the competitor's HTTP requests.

Where do you draw the line between this and a DoS flood of HTTP requests? At some point a provider has to be able to rate limit requests to maintain service for legitimate users.
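
For what rate limiting usually looks like in practice, here is a minimal token-bucket sketch. It is only an illustration under assumed numbers: the class name `TokenBucketLimiter`, the `rate` and `burst` values, and keying clients by an arbitrary string id are all hypothetical choices, not anything LinkedIn is known to use. The point is that this kind of limit applies the same rule to every client, which is different from refusing one specific requester.

```python
import time


class TokenBucketLimiter:
    """Per-client token bucket: `rate` requests per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._state = {}    # client id -> (tokens remaining, time of last update)

    def allow(self, client_id: str) -> bool:
        """Return True if this request is within the client's budget."""
        now = self.clock()
        # A client seen for the first time starts with a full bucket.
        tokens, last = self._state.get(client_id, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[client_id] = (tokens - 1.0, now)
            return True
        self._state[client_id] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = TokenBucketLimiter(rate=1.0, burst=2)
    print([limiter.allow("client-a") for _ in range(4)])
```

A server would typically answer a rejected request with HTTP 429 (Too Many Requests) rather than silently dropping it.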

I don't. I think owners of web servers should be able to selectively choose to respond to requests however they please (so long as they don't violate any e.g. civil rights laws).

And they do control access.

When you type in a url a request is made and the server responds. Linkedin controls that response and can send back whatever it likes.

McDonalds can control who they sell a burger to, but if I want to give my burger to a homeless man outside they shouldn't be allowed to stop me. In this case it's worse: the place will tell me the price of a burger but won't allow me to tell anyone else.

Once the information is public it has entered the public domain.


> Linkedin controls that response and can send back whatever it likes.

That's exactly what this injunction prohibits them from doing: "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers"


Which is why this is an anti-trust case.

LinkedIn have freely provided public data to any competitor but hiQ. If they were preventing any company from taking the data, say by putting it behind a user login with licensing, it would likely not be under consideration.


I get your point but there's a really scary implication here that if I run a website I can't just ban any IPs I want that I deem to be abusing my service.

If I spot a bot scraping my data, I should have the right to block it.
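
Mechanically, that kind of block is only a few lines. The sketch below is an assumption-laden illustration, not any real site's setup: the `BLOCKED_NETWORKS` list, the `status_for` helper, and the addresses (taken from the ranges reserved for documentation) are all hypothetical. It says nothing about whether such a block is lawful in a given case, which is the question the injunction raises.

```python
import ipaddress

# Hypothetical blocklist. These ranges are reserved for documentation and
# stand in for "IPs the operator has decided are abusing the service".
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def status_for(client_ip: str) -> int:
    """Return the HTTP status a server would send: 403 if blocked, else 200."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return 403
    return 200


if __name__ == "__main__":
    for ip in ("203.0.113.42", "192.0.2.1"):
        print(ip, status_for(ip))
```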


Whenever there is a free market, market participants do what they can to restrict the market. This is the nature of the free market.

That's why it's necessary for government to regulate free markets - to keep them as free as reasonably possible. On the surface, this sounds like a contradiction.

Unfortunately it's also a little too easy to wind up with corporatism or oligarchy when government is afforded too much power to regulate.


But isn't the data stored on LinkedIn's servers technically owned by LinkedIn? I support this decision in the interest of innovation but hiQ is using data that is physically on LinkedIn's systems and which has been legally acquired by LinkedIn from its users. The term "public data" is a very broad definition and needs to be well defined.

Is the data still LinkedIn's once it leaves the server though? Hopefully the court will find that the answer is no.

If you serve that data to a client by a simple HTTP request it's publicly accessible. My guess would be that they could claim copyright over all material published on LinkedIn, which they probably do, however the fair use of that copyright would be the question.

In this case, it's a moot discussion. LinkedIn publicly disclaims ownership of the data.

So, how can this be twisted so crawling the Google SERPS is legal?

One of the big dynamics of information I feel is still woefully under-specified is the fact that information aggregation results in power. In that light, you could imagine some framework where certain uses of public information that are inherently not aggregative are somehow legally and ethically distinct from those uses which involve aggregation. I have no idea if this distinction would be relevant for the purpose of a company enforcing use of their content, but I think on the side of data use itself, it certainly is a meaningful distinction.

Every time there is a case about whether or not information is free to use in some way, it doesn't sit well with me.

As of now deep down in my heart I don't really believe in intellectual property. I do believe in respect, and in giving credit where credit is due.

But when all is said and done, I don't believe in copyrighting a number. I don't believe that it is anyone's right to dictate how bits that enter devices I own are used.

How do you protect inventions and businesses, you ask? I say:

* If you have a secret you don't want to be distributed, don't share it with anyone you cannot trust. Trained models are a good example.

* Provide a service that only provides a limited amount of information per unit time. The user is free to use the information they obtained however they wish, but even a thousand users wouldn't have a way to copy your whole database in a reasonable amount of time. Google search, for example.

* Alternatively, if you are not sure you trust someone, take collateral from them. (e.g. "It is legal to share this information about me, but then it is also legal for me to share this other information about you [that you probably don't want shared]", or "If you share this information about me, you will be kicked out from the company/platform/etc.", or an Ethereum smart contract that causes you to be fined if someone else demonstrates that they got a piece of information that I shared with you.)

* Build businesses that have stronger value propositions than restricting how information is used. Network effects, physical hardware, good service and support, good use of massive amounts of back-end data that only you have, are all options.


> But when all is said and done, I don't believe in copyrighting a number. I don't believe that it is anyone's right to dictate how bits that enter devices I own are used.

Numbers can be and are used to represent anything and everything. This has implications far beyond DRM. For instance: I don't believe you have the right to use my photo to make a defamatory Facebook account in my name.


In this case, I say the crimes are:

- defaming

- claiming that I am you

and not in copying numbers.


If you stick a knife in someone's chest the crime is murder not wielding a knife.

To paraphrase:

> I don't believe that it is anyone's right to dictate how the knives I own are used.

And yet the law still dictates you keep your knives away from my chest. Point being: your freedom of using your bits is dictated by the same laws that dictate your freedom of using knives.


Well put!

MS sucks what can I say?

On one hand, LinkedIn is like Twitter, Craigslist and Delicious in that it has sat on a treasure trove of data without helping users mobilize it. (All of the premium services they offer are outright lame; if there was a market for premium services we might see some good ones.)

On the other hand, privacy is an issue too. LinkedIn lets you download a spreadsheet with the email addresses of all your connections, and if you have a lot of connections you will regularly get e-mail messages from life coaches, "managing directors", software development outsourcers, "SEO experts", and all kind of BS artists.


HiQ signals to your employer an elevated likelihood of you jumping ship. Depending on the situation, this could help you in a pay discussion or put you at a disadvantage.

Here's a copy of the pleading: http://www.almcms.com/contrib/content/uploads/sites/292/2017...

"In a press statement, LinkedIn says: "Our members control the information that they make available to others on LinkedIn and they trust us to honor that control. HiQ is taking member data, without their knowledge, and using it for purposes our members haven't agreed to.""

I use a text-only browser. As old-timers know, the first web browser back in the early 1990s was also text-only. Text-only still works great, believe it or not. Especially for reading information like one finds on LinkedIn.

I cannot access LinkedIn after the Microsoft acquisition.

Microsoft says members control access. Do they check a box that says

   [ ] Disallow blind users from viewing my profile as text-only.

or

   [ ] Disallow access by HiQ.

The Microsoft statement says HiQ is "taking member data" without their knowledge. This sounds as though members are unaware "they" had a decision to make about whether HiQ can access their profile, or whether a noncommercial user can access their profile with, e.g., links, w3m, lynx or the W3 Consortium's line mode browser. Indeed, I doubt members either know or care about such access.

And if Microsoft partners with a company such as HiQ, then do members get a veto on use of their profile?

I think members probably have no right to even be informed that such partnerships exist!

Did LinkedIn members have any say in whether the company could sell itself, and control over access to their profiles, to Microsoft?

Web users today generally have little ability to control how companies share their data (e.g. with advertisers and partner companies).

We saw this type of argument in the 3Taps case, where we were asked to believe Craigslist users were the ones enforcing their copyright or that they designated Craigslist to act as their agent. Anyone who uses the web knows this is utter BS.

"U.S. District Judge Edward M. Chen will preside over Thursday's hearing. At an earlier proceeding, on June 29, he appeared torn by the issues presented. On the one hand, he expressed skepticism that the federal Computer Fraud and Abuse Act -- a criminal statute -- really barred the mere use of bots to harvest public information. 'You can get it manually if you hired a hundred million people to do it,' he observed, 'but if you want to do it quickly and automatedly, you can't do it? That is a crime?'"


The vast majority of blind users do not use text-based browsers. This is a common misconception! They use normal web browsers controlled by accessibility tools like JAWS or VoiceOver.

Here's a page with what appears to be all the case filings:

https://www.hiqlabs.com/legal/

Today's Order is bad news for CFAA fans:

"In particular the Court is doubtful that the Computer Fraud and Abuse Act may be invoked by LinkedIn to punish HiQ for accessing publicly available data..."

This is the same judge who tried one of the early CFAA cases that LinkedIn cites in support of its position. He is no stranger to the statute. (Perhaps he disagreed with Breyer's ruling in 3Taps.)

In the past few years LinkedIn has updated their User Agreement and Privacy Policy and expanded permission for third parties to access member profiles. Access by third parties is not limited to only selected search engines.

They allegedly allow members to opt-out of these data sharing partnerships. Otherwise the sharing is on by default.

Whether they actually disclose the identities of these partners I am not sure.

The Court seems interested in what members actually want, instead of only what LinkedIn wants for its members.

It wants to know how LinkedIn members can control access to their own information through settings versus how LinkedIn can control it, allegedly on its members' behalf.

The transcript of the hearing for the TRO, specifically the Court's comments and questions, gives some insight on Chen's thinking about this case. After today, I think he is on the side of users. A dismantling of the CFAA as a tool to intimidate potential competitors (including users) has been a long time coming.

LinkedIn is asked why they let the HiQ scraping continue for so long before sending a cease and desist. And they are asked how they know that scraping is harming user trust. Have any users actually complained?

They are also asked what happens if a member would want to "opt-in" to the HiQ scraping.

LinkedIn counsel starts rambling about the CFAA and the court cuts him off to go back to this simple question.

"Why not give consumers an option?"

LinkedIn starts rambling about CFAA again, drawing comparisons to Nosal.

Court cuts him off. "... it seems completely different. I mean, I tried the Nosal case. That's getting into the interior mainframe of a company to steal trade secrets, not collecting data that is otherwise publicly available."

Court: "... if you think it's the same, you can think it's the same. It's not the same in my book."

Today's Order confirms this thinking. CFAA is out.

As for whether HiQ and Prof. Tribe can raise a constitutional issue (which would be great for users IMHO):

"... once you say the CFAA arms private parties and sanctions private parties to block access to information that otherwise is now public and available to the public -- at least it now raise the specter, a higher specter of constitutional analysis than if it were purely private action."

It is still a long shot, but the Court seems to recognise the constitutional question is possible if HiQ strengthens its arguments. Today's Order confirms this. The Court stated it is not satisfied with HiQ's constitutional arguments "at this juncture." There is still time to refine these arguments.

Court to LinkedIn: "... I'm not moved by your argument that, well, you use a bot to receive information, that's totally outside the ambit of the First Amendment, assuming there's any First Amendment to apply here, which is the bigger threshold question, it seems to me."

Court to HiQ: "I don't know -- you're not making any technical U.S. Constitution First Amendment argument."

LinkedIn kept trying to argue Hicks as supporting their right to ban HiQ from access in spite of any possible First Amendment protections.

Court: "Frankly, I don't find Hicks exactly very helpful and informative to what we've got to deal with here."

The other interesting comments from the Court in the TRO hearing were that LinkedIn does not have a copyright violation to assert.

After today's Order, LinkedIn needs another theory given that CFAA is out. Based on the comments in the TRO hearing copyright infringement is probably not going to work either.


I would prefer if my LinkedIn data was not public in the sense that one should have to be logged in to see it, and ideally I should be able to see who viewed it. If you have to be logged in to look, they can obviously limit the number of profiles you can view to prevent scraping.
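Roughly what I have in mind, as a sketch only: `ProfileStore`, `daily_limit` and the method names are hypothetical, nothing here reflects how LinkedIn actually works, and resetting the counters each day is left out.

```python
from collections import defaultdict


class ProfileStore:
    """Hypothetical login-gated profile access with a per-viewer cap and a view log."""

    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.views_today = defaultdict(set)   # viewer -> distinct profiles seen today
        self.viewers = defaultdict(list)      # profile id -> who viewed it, in order

    def view(self, viewer, profile_id):
        # Anonymous visitors (viewer is None) are refused outright.
        if viewer is None:
            raise PermissionError("login required")
        seen = self.views_today[viewer]
        # Re-reading a profile already seen today does not count against the cap.
        if profile_id not in seen and len(seen) >= self.daily_limit:
            raise PermissionError("daily profile view limit reached")
        seen.add(profile_id)
        self.viewers[profile_id].append(viewer)  # lets the owner see who looked
        return f"profile {profile_id}"
```

With `daily_limit=2`, a third distinct profile in the same day raises `PermissionError`, and `store.viewers[profile_id]` is the "who viewed me" list.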

Awesome! The day after this post I logged in to LinkedIn and they promoted a (new?) feature where you can restrict who sees your profile. I turned it off as much as possible for people who are not logged in. IMHO they should probably make that the default if they don't want people scraping data.

[English is not my primary language, so I'm probably wrong]

Shouldn't the title of this thread include a verb, like scraping/collecting?


Well, good, right?

This is the same outcome most of us wanted between Swartz and JSTOR, and perhaps with Malamud and PACER. No technical control can be in the right place, but we can hope for a common understanding (maybe eventually law?) that terms of service may demand or prohibit some things but not just anything.


When these decisions are made, I hope they come with technical guarantees on ease of accessibility to data.

For instance, "buried in 6 layers of obfuscated XML" and "accessible in O(N^3) time" would both be implementations that are not "blocking" the data but they would still be extremely difficult to use.


In a landmark decision that maintains you can take a picture of your community bulletin board... le sigh

Out of curiosity, does anyone know if this company is scraping just what's available when you're not logged in, or is it logging in and scraping? I believe LinkedIn shows different information depending on whether you're logged in or not.

I don’t see anyone linking to the actual ruling, so I grabbed it from PACER. Here it is:

https://drop.qoid.us/linkedin-081417.pdf


Does the startup want to scrape the public profiles which you see when logged out of LinkedIn? If yes, this profile data is of little value because it's probably 5% of the real profile data (mainly just the summary), and often there is no public profile available at all, since many of these profiles are turned off for public view.

Or do they mean the 'public' profile which you see when logged in? If yes, this would be a real case because this is awesome data I would like to scrape and which you could build interesting business cases with.


Can someone explain to me why robots.txt didn't apply to HiQ's crawler? Is crawling still legal when robots.txt disallows your crawler?

Yeah, it is. Robots.txt is just a way of politely asking people not to crawl your site. It doesn't have anything to do with the law.
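To show what "politely asking" means in practice, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleScraper` user agent and the URL are made up for illustration; this is not LinkedIn's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is asked to stay out, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return what robots.txt requests of this crawler for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

`may_fetch("ExampleScraper", "https://example.com/in/someone")` is `False` here, but that check runs inside the crawler. A crawler that never calls it gets served all the same unless the server blocks it by other means.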

Thank you. I'm not sure if this analogy is correct, but isn't that like a museum host telling people not to take pictures, and someone doing it anyway and selling the pictures he took?

I don't quite think so. There are specific laws governing the use of, say, photographic equipment within a private property.

If a web server, on the other hand, willingly serves content to both a browser being operated by a human, as well as screen-scraping software, then it shouldn't try to prescribe how the screen-scraper uses that information.

It would be the equivalent to, every time, asking somebody that works in the museum if you can take a picture, and them saying "yes", and then wanting to complain (or sue) afterwards.


> If a web server, on the other hand, willingly serves content to both a browser being operated by a human, as well as screen-scraping software, then it shouldn't try to prescribe how the screen-scraper uses that information.

That's not what was happening here. LinkedIn's server was blocking HiQ, and HiQ sued LinkedIn to prevent them from doing that.


I'm confused. Is the judge saying that LinkedIn can't use the law to ban HiQ from scraping their profiles or that they can't implement technology to block scrapers? The former seems reasonable but the latter seems like an unjustified restriction on how they operate their site.

Looks like technology... Which seems strange to me.

Less strange given that LI is a monopoly and the judge was arguing that LI was unfairly restricting competition in HiQ's business space.

How does the tinder debacle fit into this?

LinkedIn could nuke this problem with a decent EULA.

From reading the ruling, the injunction was based on a finding that HiQ raised serious questions about whether LinkedIn blocking HiQ's scrapers constituted a violation of California's unfair competition law by violating the spirit of federal antitrust law.

HiQ argued that LinkedIn has a monopoly on "the professional networking market" and is unfairly exploiting that monopoly to gain an advantage in the data analytics market. HiQ showed that LinkedIn might be developing an analytics product that competes directly with their Skill Mapper product.


Great. So in a business that makes money primarily by collecting and selling user data, they're now required to give it away for free.

Correction: user-generated data. LI get the content from their users for free.

I don't understand how you came to think that. What about the cost of hosting and serving the HTML forms (CPU and bandwidth), processing the content update requests, storing the resulting data structures? What about the marketing costs of making the brand known both by individuals and companies? The wages associated with any of the operations I have just mentioned?

What definition of "free" are you using here?


I agree there seems to be a problem with means and investments: if I invest to create a website with valuable data, and anyone can come and scrape my data and start their own or a competing business, riding on top of my investment, isn't that a disincentive to me?

I'm curious how a ruling in this might potentially impact Google (or not). Google is scraping those same profiles, but LinkedIn clearly has no issue with that because it drives traffic to their site. But Google is also making money off of those profiles.

How can LinkedIn argue that Google should be allowed to scrape but other third parties should not?


Microsoft/LinkedIn has so much more juicy and actually private data that nobody can scrape, I don't understand why they would even make a scene and tarnish their image over scraping. They know who declines connection requests from whom, when a CEO starts looking for a new job, and so on.

Interesting. Does anyone know how this compares to the legal cases around startups that would scrape Craigslist public data?

So does this ruling essentially outlaw robots.txt? If I only give access to certain users, that's... illegal now? We're calling that a monopoly? How is this reasonable to anyone else?

The most surprising aspect of this is that anyone manages to consider any of the data valuable at the tire fire of a social network that is LinkedIn.

How does this affect Craigslist's Cease and Desist request to padmapper? [0]

[0] http://blog.padmapper.com/2012/06/22/bye-bye-craigslist/


This Medium post [1] "The Birth And Death Of Privacy: 3,000 Years of History Told Through 46 Images" gives an interesting and unintuitive context about [personal] privacy, which is relevant in this debate about how to balance personal privacy with society's value in openness [and honesty].

[1] https://medium.com/the-ferenstein-wire/the-birth-and-death-o...


This type of decision might also impact Yelp or any others in similar businesses. Currently their API returns only the top few reviews per business, and they also prohibit "scraping" of data by other means.

I was going to do some experiments with larger datasets from businesses in a region, but quickly found that's not possible.


And Google search results as well. Search results are publicly accessible but if you try to crawl them, Google will block it.

If it becomes illegal to block crawlers, then Google is gonna get hammered with bot traffic.

It will also mean that Google won't have to be the front-end to search results, and anyone can build on top of it, which could kill Google ad revenue because then you could create anonymous Google searches.


Google already isn't the only frontend to their search. Look at Startpage, for example.

You might try Apifier for that, we've recently scraped more than 150k reviews for 27k restaurants in London.

Here's a community crawler you can use: https://www.apifier.com/community/crawlers/Yonny/bcYqH-api-u...


Yelp has several public data sets available for research purposes. Might not be the region you were looking at, but for academic purposes, might be useful.

I do find it odd that LinkedIn is fighting this considering they outright steal your contact list and will spam your friends and family for years.

Recently I was setting up my new phone and thought about installing their app and I thought to myself, why?

Eventually that thought came back to me when I was attempting to update my profile and simply decided to delete it entirely.


Sorry, a private company can do whatever it wants with its own property. They're paying for the power and bandwidth...

This needs to be overturned unless LinkedIn is violating FRAND principles.


Will this mean Google search can now also be scraped?

This is a preliminary injunction, which is a tool to prevent irreparable harm to a party before a case is resolved on its merits.

It shouldn't be taken as a strong indicator of what will be found legal in this case, much less a different hypothetical one.


Shouldn't LinkedIn simply introduce a CAPTCHA?

For accessing the public page of profiles?

Well, not just for accessing, but for scraping, meaning accessing a huge number of those pages in an automated way.

How would that work for Google?

If you think they shouldn't show it for Google, then that's what HiQ is suing them over :)


I have a simple question: what is "public information" with respect to websites? Anything I put on a privately owned website automatically becomes "public"?

The question of "data protection" hasn't been discussed enough here - it may well be that LinkedIn has no case against HiQ, but if HiQ is scraping people's PII in the EU they are required to have the permission of the data subjects.

How this interacts with Safe Harbour I have no idea.


IANAL but I suspect the forthcoming GDPR will make this illegal for EU customer data. It gives users greater control over their data, so I wouldn't be surprised to find that it corresponds with this judgement, i.e. if a user determines data to be publicly available, it must be made to be so.

Anyone know whether this is right?

And BTW in case you're not aware, if you hold data from any EU citizens you'll be required to comply with the GDPR regardless of where you're located.


> U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles.

That is actually dangerous. Why can some startup or some judge tell me to whom I can serve content and to whom I cannot?


The question not really answered is: what is public profile data? Is it visible to the general public, to other LinkedIn users, or partially hidden?

Here in Germany there is copyright protection for databases as a compilation separate from individual facts in the database.

They needed to make it available within 24 hours - does that mean public profiles are now scrapeable like any other page?

I tried a year ago and obviously it was impossible.


I wonder what would happen if some startup started using public Yelp reviews.

Nobody can require one to forget knowledge, nor should anyone be entitled to do so. Collection and use of data should be legitimate. Storing data on an HDD and remembering things in one's brain are identical. Infringement of privacy, reputation, or copyright is another issue.
