Hacker Read


		U.S. judge says LinkedIn cannot block startup from public profile data (www.reuters.com)
		779.0 points by techrush \| karma 356 \| avg karma 89.0 2017-08-14 21:04:18+00:00 \| 301 comments


DanBlake | karma 4129 | avg karma 6.33 2017-08-14 21:19:33+00:00 | [–] similar comments

This seems very at odds with previous rulings (specifically, those relating to Craigslist's many past dealings). It strikes me as very unlikely to stand up on appeal. Also, LinkedIn will likely modify their website's behavior (make you click to agree before you view a profile), which would create a binding 'click wrap' stopping companies from scraping them.

devrandomguy | karma 959 | avg karma 2.56 2017-08-14 21:35:56+00:00 | [–] similar comments

That click wrap contract is kind of an interesting thing on its own, for those of us who only enable JS when absolutely necessary. If I never see the agreement, and I am not specifically avoiding it, does it still apply to me?

jlgaddis | karma 11467 | avg karma 2.4 2017-08-14 21:37:20 | [–] similar comments

Did you click a button saying you agreed to it?

aneutron | karma 1447 | avg karma 4.91 2017-08-14 21:38:42 | [–] similar comments

Make the profile loadable via XHR and problem solved, for example. (Which I bet is already the case.)

CosmicShadow | karma 1301 | avg karma 3.92 2017-08-14 22:07:42+00:00 | [–] similar comments

There was a big ruling in Canada about this specifically around MLS, the big real estate monopoly we have, so that if you go to their sites to search for homes, like you'd see at Realtor.ca, you have to click through a clickwrapper to access any data, and even if you automate past that, the fact that a human would have to click it means that it's illegal to scrape since you are forced as a human to agree to a TOS before you view.

devrandomguy | karma 959 | avg karma 2.56 2017-08-14 22:26:26+00:00 | [–] similar comments

Ah yes, that. MLS compliance was a source of many tickets for me, in a previous job, in Canada. The employer didn't even want me to waste time trying to learn it all, just follow the compliance officer.

IIRC, this stuff varies quite a bit from region to region, even within a single metropolitan area. Attempting to simultaneously comply with multiple independently developed rulebooks was ... fun.

I can't wait for shipyard startups to disrupt the housing market. /s

jlgaddis | karma 11467 | avg karma 2.4 2017-08-14 21:36:45+00:00 | [–] similar comments

Appeal? The case hasn't even been heard yet. This was a preliminary injunction; it's far from over!

dragonwriter | karma 118260 | avg karma 2.17 2017-08-14 22:33:40+00:00 | [–] similar comments

A preliminary injunction can be the subject of an interlocutory appeal.

jlgaddis | karma 11467 | avg karma 2.4 2017-08-15 00:22:15 | [–] similar comments

In this instance, however, it is not (as can be gleaned by reading the article).

bkanber | karma 3693 | avg karma 5.58 2017-08-14 21:43:38+00:00 | [–] similar comments

It's just a preliminary injunction used to maintain the status quo (ie, allowing scrapers) while the case is heard. A preliminary injunction is basically "ok everyone stop what you're doing, maintain business as usual until the court rules."

isalmon | karma 1297 | avg karma 6.12 2017-08-14 22:23:34 | [–] similar comments

The biggest reason why they have not done this so far is SEO. If you introduce the 'click wrap' - other crawlers like Google won't be able to crawl it, so their traffic will decrease overnight.

dawnerd | karma 7527 | avg karma 2.46 2017-08-14 22:30:18+00:00 | [–] similar comments

They'd most certainly whitelist the Google IPs.
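As a rough illustration of the allowlisting idea above, here is a minimal sketch of a click-wrap gate that lets requests from an allowlisted crawler range through. Everything here is an assumption for illustration: `needs_clickwrap` and `CRAWLER_NETWORKS` are hypothetical names, and the range shown is a reserved documentation range, not a real Google range.

```python
import ipaddress

# Placeholder range (RFC 5737 documentation block), NOT a real crawler range.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def needs_clickwrap(client_ip, has_accepted_terms):
    """Return True if the visitor must see the click-wrap page first."""
    addr = ipaddress.ip_address(client_ip)
    # Allowlisted crawlers skip the gate entirely.
    if any(addr in net for net in CRAWLER_NETWORKS):
        return False
    # Everyone else must have accepted the terms already.
    return not has_accepted_terms
```

Note that a gate like this serves crawlers something different from what ordinary visitors see, so whether it fits a search engine's rules is a separate question from whether it works.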

fooey | karma 4151 | avg karma 7.45 2017-08-14 22:44:00+00:00 | [–] similar comments

that's against TOS for Google SERPs

Showing different results to google than you do to users is called cloaking and it's not allowed

dawnerd | karma 7527 | avg karma 2.46 2017-08-14 23:20:18 | [–] similar comments

Definitely, but someone at LinkedIn could get in touch with their Google contact and make it all work and follow the rules.

https://support.google.com/news/publisher/answer/40543?hl=en

Apparently, we had something similar at Demand Media before the Panda update.

WikipediasBad | karma 156 | avg karma 1.79 2017-08-15 08:06:22 | [–] similar comments

How does one build such google contacts if you are a seed stage startup? I understand it wouldn't be the same league as LinkedIn Reid Hoffman level contacts, but even some kind of contact at Google search?

dawnerd | karma 7527 | avg karma 2.46 2017-08-15 08:44:47 | [–] similar comments

I wish I knew. Easiest way is to make a ton of money off google ads like we did. Even then it was hard to actually talk to someone - especially at YouTube. And they had an office a few floors up...

razwall | karma 213 | avg karma 5.33 2017-08-15 02:34:43 | [–] similar comments

Neither of the Craigslist cases reached an appellate level. They were only district court decisions, so as I understand it, they only have persuasive value when applied to other cases. The judge in this case mentioned Craigslist v. 3Taps, and apparently was not persuaded by it.

DannyBee | karma 28474 | avg karma 5.43 2017-08-15 03:14:06+00:00 | [–] similar comments

Without having downloaded the order from PACER, if I had to guess, I don't think a click wrap would change anything.

The free speech argument is certainly a dud here. The argument that has likely had any weight at all would be the antitrust/unfair competition one.

A click wrap will not change that.

It's almost certainly about LinkedIn's repeated claims that they don't own these profiles, that they are public info, and that they want to make them public, and now they are turning around and saying "just kidding!" and trying to put someone who depended on that out of business, all so they can start their own analytics product.

walterbell | karma 84571 | avg karma 5.55 2017-08-14 21:19:37+00:00 | [–] similar comments

Is the reasoning in this case different from the Craigslist/3taps dispute?

SomeStupidPoint | karma 2326 | avg karma 1.61 2017-08-14 21:21:50 | [–] similar comments

Does anyone know where to view this ruling?

I'm curious how it passes free-association muster: you're not allowed to discriminate on particular tasks, but there's no reason you can't discriminate based on eg, behavior or user-agent or IP address.

It seems very strange to me that the judge would order MS to associate against their will prior to hearing the arguments.

marksomnian | karma 878 | avg karma 3.14 2017-08-14 21:27:47 | [–] similar comments

It's a preliminary injunction, not a ruling. For all we know, the ruling could be completely different.

dwynings | karma 16818 | avg karma 12.57 2017-08-14 21:28:55+00:00 | [–] similar comments

http://online.wsj.com/public/resources/documents/2017_0814_h...

SomeStupidPoint | karma 2326 | avg karma 1.61 2017-08-14 21:39:21 | [–] similar comments

Wow, that sign analogy is really faulty.

EpicDavi | karma 198 | avg karma 4.21 2017-08-14 22:00:11+00:00 | [–] similar comments

After skimming the document for a bit, hiQ's argument looks really flaky. Especially grasping at straws like "Free Speech". They argue that LinkedIn is like a public mall and denying them access to the mall is denying them "Free Speech"? I don't see how this can be the case if they had no intent to "speak" at all in this place. Their data collection via scraping seems more like people-watching in the mall, if you go along with their analogy.

razwall | karma 213 | avg karma 5.33 2017-08-15 01:47:53 | [–] similar comments

Indeed, and the court rejected that part of HiQ's argument.

"In light of the potentially sweeping implications discussed above and the lack of any more direct authority, the Court cannot conclude that hiQ has at this juncture raised 'serious questions' that LinkedIn's conduct violates its constitutional rights under the California Constitution."

bigtones | karma 2418 | avg karma 7.07 2017-08-14 21:23:57 | [–] similar comments

This is just a preliminary injunction and the court has not even heard or ruled on this case. They just allowed HiQ to access the data while they wait for the scheduled court hearing to begin. The court may eventually rule very differently once they have heard all the evidence presented and weighed up existing applicable case law.

The judge who issued this injunction, Edward Chen, is also the judge presiding over the Uber drivers as independent contractors class action case.

djsumdog | karma 25432 | avg karma 4.23 2017-08-14 21:56:51 | [–] similar comments

Wow, those are two very big issues, that affect two very large industries, not to mention the implications and precedent for both free speech and workers' rights.

schoen | karma 20422 | avg karma 3.65 2017-08-14 22:02:27+00:00 | [–] similar comments

There are not that many judges on the Northern District of California

https://www.cand.uscourts.gov/judges

and cases involving Silicon Valley companies are very often filed here, so quite a lot of the high-profile industry matters end up getting heard by the same judges!

DannyBee | karma 28474 | avg karma 5.43 2017-08-14 22:33:13+00:00 | [–] similar comments

"This is just a preliminary injunction and the court has not even heard or ruled on this case"

This is not quite right. One of the requirements to get a PI is a likelihood of success on the merits ;)

SomeStupidPoint | karma 2326 | avg karma 1.61 2017-08-15 00:29:40+00:00 | [–] similar comments

The order discusses how less is required of the merits (need only raise serious questions) if the consequences are dire (going out of business), then discusses how they depend entirely on LinkedIn.

It's also possible that the judge is giving them affordance before killing their business to prevent appeals.

DannyBee | karma 28474 | avg karma 5.43 2017-08-15 03:09:20+00:00 | [–] similar comments

FWIW: That is actually fairly rare. Usually the answer is "well, the creditors can continue the lawsuit if they think it is valuable"

cookiecaper | karma 15526 | avg karma 2.53 2017-08-14 22:51:40 | [–] similar comments

I agree that we need to be careful not to read too much into this, but in most scraping cases I know about, preliminary injunctions are granted as a matter of routine.

The fact that this judge refrained from doing so may signal that the judiciary is finally willing to bring some nuance and rationality to their interpretation of extremely broad statutes like the CFAA. It's a positive signal, even if ultimate victory remains unlikely.

/me is not a lawyer

ikeboy | karma 14321 | avg karma 2.95 2017-08-15 04:33:25+00:00 | [–] similar comments

Forcing LinkedIn to provide the data and remove any locks is pretty harsh.

Imagine I sued a bank under the theory that the locks on their vault doors are illegally preventing me from opening them. As part of the preliminary injunction, the judge rules that the bank must remove all locks until the case is decided.

Point is, it's not just "maintain the status quo", it's "give them your data for free". IANAL but I don't think preliminary injunctions should change the status quo by intruding on what is plausibly someone's private property.

kakarot | karma 1527 | avg karma 1.01 2017-08-15 09:51:17 | [–] similar comments

Misleading analogy. Physical goods such as a bank's money are both not reproducible and not normally available for the general public to just take

charlesdm | karma 4113 | avg karma 1.89 2017-08-14 21:24:15 | [–] similar comments

"U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles."

Interesting ruling

danschumann | karma 1464 | avg karma 2.26 2017-08-14 21:36:22+00:00 | [–] similar comments

I would hope there is special consideration for any anti-DDoS technology they have. It would be hard to differentiate between a DDoS'er and a scraper. Rate limiting for DDoS attacks might affect a scraper; then the question (that LinkedIn is asking) is how low can we limit them without looking like we're blocking them. I have a feeling this isn't over!
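To make the rate-limiting question concrete, here is a minimal token-bucket sketch, one common way to rate limit per client. It is not anything LinkedIn is known to use; `TokenBucket`, `rate`, and `capacity` are illustrative names, and the `now` argument exists only so the behavior can be checked without waiting on a real clock.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True if one request may proceed at time `now`."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The "how low can we limit them" question then comes down to the choice of `rate` and `capacity`: the same mechanism throttles an attack or quietly starves a scraper, depending only on the numbers.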

0xCMP | karma 2556 | avg karma 2.46 2017-08-14 21:40:24+00:00 | [–] similar comments

I wonder if this isn't such a big deal since it's not like they're gonna verify beyond "can they scrape now?"

As long as that is true then they will likely not run into issues. Other measures are not there to block them, and a case can be made that it's a separate issue. Defending against common internet attacks is an easy case to make to a judge. He can't expect LinkedIn, in this case, to kill their service so someone can scrape.

danschumann | karma 1464 | avg karma 2.26 2017-08-14 21:31:17 | [–] similar comments

Being a programmer not a lawyer, I like the idea of more rights for scrapers. I don't want to see the internet partitioned away and owned by a few companies, especially when that information is often called a "public profile".

FLUX-YOU | karma 2705 | avg karma 2.21 2017-08-14 21:56:11 | [–] similar comments

Can they claim a tax credit for supporting that bandwidth usage and handling abuse?

tomc1985 | karma 11141 | avg karma 2.76 2017-08-14 21:59:32+00:00 | [–] similar comments

Not needed. Cost of doing business.

FLUX-YOU | karma 2705 | avg karma 2.21 2017-08-14 22:41:59+00:00 | [–] similar comments

I don't trust the US government to write good rights for scrapers. They can't even do computer crime sentences well.

At best, it's a burden for no solid gain for society. At worst, there will be loopholes used to DoS businesses because they can't shut down individuals due to law-given rights, and that will lead to court fights.

These rights would do nothing but save scraper authors from learning to obfuscate their actions.

tomc1985 | karma 11141 | avg karma 2.76 2017-08-14 23:02:41 | [–] similar comments

Information wants to be free. People should stop fighting it!

If one makes information "public" but doesn't really want to share it, then the public is fully justified in taking it.

biocomputation | karma 251 | avg karma 0.98 2017-08-14 23:47:42+00:00 | [–] similar comments

<< Information wants to be free. People should stop fighting it!

That's why you routinely publish your bank account, social insurance, and credit card details online, right?

tomc1985 | karma 11141 | avg karma 2.76 2017-08-15 00:08:44+00:00 | [–] similar comments

Not like I have a choice... and how are those things explicitly declared 'public'?

djsumdog | karma 25432 | avg karma 4.23 2017-08-14 22:01:11+00:00 | [–] similar comments

It gets into the incredibly murky water of how the web works. You're just issuing a request and getting things back. Sometimes in a web browser, sometimes not. But the content itself may still be copyrighted. You can't just take it, even though for now, the publisher/server is allowing you to view it for free.

But what if you only choose to view some of the content (e.g. block ads)? What if you apply your own styles to change the way that information is displayed? You're just changing the way the browser represents that data. You're not redistributing it as your own at this point. What if you store that data, but don't republish it, and just use it in some algorithms?

There are a whole lot of interesting grey areas here, but many that already have precedents that side more with the copyright holders.

swiley | karma 10155 | avg karma 1.92 2017-08-14 22:18:24+00:00 | [–] similar comments

Maybe the whole idea of copyright is flawed and harmful?

bitJericho | karma 1832 | avg karma 1.65 2017-08-14 22:42:30+00:00 | [–] similar comments

There's nothing wrong with copyrights; 14 year copyrights.

int_19h | karma 21203 | avg karma 1.69 2017-08-15 07:34:30+00:00 | [–] similar comments

Why 14, though?

I'm well aware of the historical precedent, but that number was rather arbitrary even then - it was what a bunch of people agreed upon, based on their ideas and experience, and given the environment. It's doubly arbitrary today, considering how much the environment has changed. Is a 14-year copyright on software reasonable, for example, or too long?

Rather than making it a hard cut-off point, it would be interesting to come up with a scheme that attempts to capture the spirit of term limits.

Consider: why are copyright terms even a thing? Well, copyright is a monopoly on a thing that is not naturally restricted; it does not exist in the absence of society, and is therefore a privilege granted by that society. By itself, copyright is meant to encourage creativity in the interest of public good, and at the same time, to provide some means to derive profit from one's creative expression. So there are two conflicting interests at play here - the desire of the creator to be rewarded for the fruits of his labor, and the desire of the society to enjoy growing, constantly enriched culture. The copyright term, then, marks the point at which the latter trumps the former.

Instead, what we could do is capture the fact that the interests conflict. For as long as you hold copyright, you're effectively denying society the ability to freely enjoy the culture that you have enriched. Why, then, not tax the copyright accordingly? You could consider it a kind of intellectual property tax, but with a twist: the longer copyright is held, the more the interests of society are infringed, and the larger the compensatory payment required to maintain the copyright.

So we could start with a grace period of a couple of years that is completely free, then it starts growing steadily. For some really popular work that makes significant profits, the author could easily afford payments to maintain copyright for a decade or two (or however long - that is something that can be dialed arbitrarily). For things that are too obscure, payments would cease shortly, and they would fall to public domain. There wouldn't be such a thing as "abandonware" anymore.

What use to put the money to? Many possibilities there. Publicly sponsored arts and art education is an obvious choice. Another interesting example would be offering bulk sums of money to authors of culturally important works to surrender their copyrights sooner, so that the public can enjoy them.
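As a toy sketch of the escalating-fee idea in the comment above: the comment gives no numbers or growth curve, so the grace period, base fee, and geometric growth factor below are all made-up assumptions, and `annual_copyright_fee` is a hypothetical name.

```python
def annual_copyright_fee(years_held, grace_years=2, base_fee=100.0, growth=1.5):
    """Yearly fee to keep a copyright, free during the grace period.

    All defaults are illustrative placeholders, not proposed values.
    """
    if years_held < grace_years:
        return 0.0
    # After the grace period the fee grows by `growth` each year (assumed geometric).
    return base_fee * growth ** (years_held - grace_years)
```

With these placeholder values, an obscure work stops being worth renewing after a few years, while a profitable one can be kept for as long as its revenue outpaces the curve.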

dvdhnt | karma 2297 | avg karma 3.84 2017-08-15 03:25:18 | [–] similar comments

I'd love to see more debate about just this.

revelation | karma 11052 | avg karma 3.12 2017-08-14 22:20:03+00:00 | [–] similar comments

There is no grey area. You cannot copyright facts. If you download ("scrape") a webpage and then extract the facts, whatever you downloaded only exists in volatile memory. So there is no claim there. The only claim you can make is on the download itself, hence what LinkedIn chose.

onion2k | karma 46633 | avg karma 4.72 2017-08-14 22:25:04+00:00 | [–] similar comments

I've seen plenty of LinkedIn profiles that could be classified as fiction.

wvenable | karma 19014 | avg karma 3.37 2017-08-14 23:37:59+00:00 | [–] similar comments

But LinkedIn isn't the owner of that fiction.

tyingq | karma 59102 | avg karma 3.47 2017-08-14 22:45:52 | [–] similar comments

It's possible that HiQ is downloading and processing things that aren't facts. Long passages of text in LinkedIn posts, recommendations, and comments aren't "facts".

Though that doesn't appear to be the path LinkedIn is using to fight it.

revelation | karma 11052 | avg karma 3.12 2017-08-14 22:51:59 | [–] similar comments

Copyright isn't thoughtcrime, if they aren't redistributing it to anyone with standing to sue (very likely not LinkedIn, no matter their ToS) they can process things all day.

tyingq | karma 59102 | avg karma 3.47 2017-08-14 22:57:57+00:00 | [–] similar comments

I did not say or imply "thoughtcrime". Just noting it can be more complex. Sentiment analysis of copyrighted text passages might be claimed as being a derivative work, for example. Fair use does have limits.

The standing to sue is an issue for user generated content, yes.

cvsh | karma 535 | avg karma 4.21 2017-08-15 03:05:19+00:00 | [–] similar comments

>Sentiment analysis of copyrighted text passages might be claimed as being a derivative work for example.

So if a company releases a product that predicts likelihood of an employee quitting, you think you're going to have standing to sue because an analysis of a copyrighted passage you wrote comprised 0.000001% of the source material the algorithm was trained on?

tyingq | karma 59102 | avg karma 3.47 2017-08-15 03:40:59+00:00 | [–] similar comments

You left out my last sentence, on purpose I suppose.

The context was that scraping doesn't always get a free pass because "facts". This specific case may skirt it because of the user generated content. Doesn't mean it's not worth mentioning for the larger context that copyright isn't black and white.

moojah | karma 51 | avg karma 3.4 2017-08-14 22:44:34+00:00 | [–] similar comments

The interesting part here is that LinkedIn doesn't hold any copyright on much of the data. You cannot copyright someone's name and title.

unepipe | karma 145 | avg karma 2.46 2017-08-14 23:07:48+00:00 | [–] similar comments

But you can copyright a database.

simcop2387 | karma 5031 | avg karma 2.25 2017-08-14 23:40:24+00:00 | [–] similar comments

that one is iffy, http://www.nolo.com/legal-encyclopedia/types-databases-eligi...

I'd suspect LinkedIn could argue that the network they create is a creative work and would be covered, but the facts about each person might not be copyright-able.

ekianjo | karma 34827 | avg karma 2.44 2017-08-15 01:19:26+00:00 | [–] similar comments

What's inside the database is not owned by them.

sliverstorm | karma 24708 | avg karma 2.23 2017-08-14 22:52:51 | [–] similar comments

I think most of those "grey areas" already have plenty of precedent in the world of art & music. Have you heard of "fair use", and what it allows and does not?

jlarocco | karma 7189 | avg karma 2.49 2017-08-14 23:24:19 | [–] similar comments

> It gets into the incredibly murky water of how the web works.

There's no "murky water" in how the web works. It's very clear and precise, and anybody can learn how it works. It has to be precise and well defined, because computers can't operate any other way.

If Linkedin doesn't want "public" profile data to be accessible to everybody then they need to stop calling it public and put it behind some kind of access control.

flashman | karma 3885 | avg karma 4.95 2017-08-14 23:54:36+00:00 | [–] similar comments

Computers are also operating with absolute clarity and precision when somebody exploits a flaw in their software to execute arbitrary code. That's why the CFAA has its language around 'exceeding authorized access' (even though it turns out that definition is dangerously vague).

avip | karma 6656 | avg karma 3.68 2017-08-15 01:03:42+00:00 | [–] similar comments

But that's exactly what LI did, then came a judge and ordered them (temporarily) to remove said access controls.

smegel | karma 2278 | avg karma 1.65 2017-08-14 22:01:50 | [–] similar comments

If a website puts something on the public internet, it should not even be aware if it is being accessed by a scraper or a human.

Maybe we should just ban User Agent strings and be done with it.

rev_bird | karma 1532 | avg karma 3.37 2017-08-14 22:05:56 | [–] similar comments

I'd be willing to bet that the user-agent field isn't the problem; it's patterns that everything looks for now, right? People have been lying in the request headers for decades.

Maultasche | karma 583 | avg karma 2.58 2017-08-14 23:46:48+00:00 | [–] similar comments

Yeah, I remember a plugin in the early days of Firefox that allowed Firefox to pretend to be Internet Explorer. It would send request headers that made it appear to be IE.

There were websites at the time that would display just fine in Firefox, but would refuse to display anything if they detected a non-IE browser.
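For anyone who hasn't seen how little is involved, here is a minimal sketch of the same trick using Python's standard library. It only builds the request and sends nothing; `build_request` is a hypothetical helper, and the User-Agent value is an IE6-style string chosen for illustration.

```python
import urllib.request

# An Internet Explorer 6 style User-Agent string, used here only as an example.
IE_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_request(url, user_agent=IE_USER_AGENT):
    """Build (but do not send) a request that claims to come from IE."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The server only ever sees the header the client chooses to send, which is why User-Agent checks are easy to get around.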

sbarre | karma 10170 | avg karma 4.57 2017-08-14 22:27:34+00:00 | [–] similar comments

You call it the "public internet" but it's most definitely not a public space or anything like it.

Private entities own and operate all(most of) the servers, services and conduits, and that does need to be paid for and maintained.

I'm not saying I agree with Linkedin in this particular scenario, but this is about two commercial for-profit entities arguing over money, so let's not make it about something it's not.

smegel | karma 2278 | avg karma 1.65 2017-08-14 22:34:04+00:00 | [–] similar comments

> Private entities own and operate all(most of) the servers, services and conduits, and that does need to be paid for and maintained

And are MORE than happy to send the content of their servers to unsolicited, uninvited, anonymous guests on mere request. No-one is forcing them to do so!

sbarre | karma 10170 | avg karma 4.57 2017-08-15 19:21:20+00:00 | [–] similar comments

And that's exactly LinkedIn's position here.

No one should be forcing them to send their content to anyone.

They claim they should be allowed to discriminate at their discretion who they respond to, since they own and operate the servers.

This "no one is forcing you to send your content" goes both ways..

If one side is going to say they're entitled to receive the content on request, the other side wants to be able to say they're entitled to refuse to answer that request..

bogomipz | karma 8657 | avg karma 1.34 2017-08-14 22:43:49+00:00 | [–] similar comments

>"You call it the "public internet" but it's most definitely not a public space or anything like it."

How is it not a "public space"? They publish publicly visible A records for their site as well as route their public IP space to transit providers in order for the public to be able to reach their site.

umanwizard | karma 14757 | avg karma 2.5 2017-08-15 01:21:00+00:00 | [–] similar comments

If I publicly distribute posters about a party in my apartment and leave the door unlocked, do I still have the right to kick out anyone I don't like?

bogomipz | karma 8657 | avg karma 1.34 2017-08-15 02:52:18+00:00 | [–] similar comments

That's a pretty disingenuous strawman.

A more accurate comparison would be that you put up an advertisement on a billboard along a busy street and then decided to tell people who passed by that they weren't allowed to take a picture of it.

And to continue with this absurdity, you feel entitled to enforce who can or cannot look at your billboard because, despite it being publicly viewable, it's your advertisement on the billboard.

sbarre | karma 10170 | avg karma 4.57 2017-08-15 19:17:16+00:00 | [–] similar comments

That's also not accurate.

There is no "public space" on the Internet.. There's no un-owned territory or resource that is free to use or metaphorically "stand around" in to take those pictures from.

You are consuming privately-owned resources in all your online activities, and as such some will argue that they can decide to limit your consumption of those resources at their own discretion.

Again, I am not siding with either party here, just trying to dispel this notion that "public space" - in the way we understand public space to exist in the physical world - exists on the internet.

In your example, no one is controlling your right to take photos or stand around and look in any direction you choose.

When you use the Internet, a private entity is allowing you to transit through their network and access sites, a different entity is allowing you to access and receive their content, etc..

LinkedIn owns the server you are accessing when you (or others) go to their site, and they are spending resources servicing those requests, and - they claim - can decide how and when they choose to do that..

laumars | karma 12945 | avg karma 2.49 2017-08-14 22:38:55 | [–] similar comments

It really doesn't take much effort to detect the majority of scrapers. Usually you do so by monitoring patterns of any given IP.

Is each request an incremented profile page (/users/1, /users/2, etc.), or are there dozens of requests a minute (faster than a typical user would read)?

Is static content (particularly images and CSS) being downloaded too or just the HTML content?

Sometimes the referrer HTTP header can give clues too - though you have to be careful there as that's as unreliable as the user agent header.

However if you're really paranoid about scrapers you can also throw in some honeypots. eg a fake user (/users/13) which is a user account that doesn't exist so that page wouldn't have any links from within your site. ie you only reach it if you're incrementing through the user IDs. Or perhaps a link within your HTML which doesn't render so it's only reachable via automated scripts that don't check what links are rendered inside the display view. Anyone that gets ensnared in your honeypot could then be put on a temporary IP blacklist. Though the danger of doing this is you accidentally blacklist good crawlers if you're not careful about setting appropriate robots rules.
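The checks described above can be sketched roughly as follows. This is a toy illustration over an in-memory request log, not production code: `score_clients`, the thresholds, and the log format of `(ip, unix_timestamp, path)` tuples are all assumptions, and `/users/13` is the hypothetical honeypot from the comment.

```python
from collections import defaultdict
import re

PROFILE_RE = re.compile(r"^/users/(\d+)$")
HONEYPOT_PATHS = {"/users/13"}  # hypothetical unlinked page
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")


def score_clients(requests, max_per_minute=30, min_sequential=5):
    """requests: iterable of (ip, unix_timestamp, path). Returns {ip: reasons}."""
    by_ip = defaultdict(list)
    for ip, ts, path in requests:
        by_ip[ip].append((ts, path))

    flagged = {}
    for ip, hits in by_ip.items():
        hits.sort()
        paths = [p for _, p in hits]
        times = [t for t, _ in hits]
        reasons = set()

        # Honeypot: a page no real visitor can reach by following links.
        if any(p in HONEYPOT_PATHS for p in paths):
            reasons.add("honeypot")

        # Longest run of profile IDs that increase by exactly one.
        ids = [int(m.group(1)) for m in map(PROFILE_RE.match, paths) if m]
        run = best = 1 if ids else 0
        for prev, cur in zip(ids, ids[1:]):
            run = run + 1 if cur == prev + 1 else 1
            best = max(best, run)
        if best >= min_sequential:
            reasons.add("sequential-ids")

        # More than max_per_minute requests inside any 60-second window.
        start = 0
        for end, t in enumerate(times):
            while t - times[start] >= 60:
                start += 1
            if end - start + 1 > max_per_minute:
                reasons.add("high-rate")
                break

        # Plenty of page requests but never any images, CSS, or scripts.
        if len(paths) >= min_sequential and not any(
            p.endswith(STATIC_SUFFIXES) for p in paths
        ):
            reasons.add("no-static-assets")

        if reasons:
            flagged[ip] = reasons
    return flagged
```

Each signal is weak on its own (a cached browser may skip static assets too), so a real system would likely combine them and use a temporary block rather than a permanent one.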

CaptSpify | karma 5342 | avg karma 2.15 2017-08-14 22:48:38+00:00 | [–] similar comments

I've kind of always thought that we shouldn't be using UA strings. Just give the requester the data that they requested according to the current open standards. If they choose not to render it correctly, then that's their problem.

Yes, I realize that it's not that simple, but I think browsers would have tried much harder to adhere to standards if we had done it that way.

dahart | karma 17200 | avg karma 2.75 2017-08-14 23:00:39+00:00 | [–] similar comments

> Being a programmer not a lawyer, I like the idea of more rights for scrapers.

What rights should scrapers have that they don't right now? Keep in mind that a lot of the scraping going on is just some other private company abusing access and hoping to gather and use the information for their own private profit. How many companies are scraping StackOverflow for example and doing nothing but attempting to copy it and draw traffic to their own site? I can't stand copycat sites, they fill my search results with junk. I would assume the majority of scraping that is currently happening is not doing the public any good.

> I don't want to see the internet partitioned away and owned by a few companies, especially when that information is often called a "public profile".

This sounds like you're suggesting that LinkedIn or Facebook calling your profile a 'public profile' means that the law should treat it as a public service due to use of the word 'public', is that what you mean? The word public may be overloaded here. I can see why tax funded projects should be publicly accessible, but I have a hard time seeing why private companies should be compelled to provide access to anything at their own expense.

ocdtrekkie | karma 24767 | avg karma 2.6 2017-08-14 23:06:14+00:00 | [–] similar comments

Consider that should anything ever happen to the sites they scrape, suddenly they become super valuable to the rest of us. Decentralized information is not a bad thing. And arguably, if monetizing those scraped sites pay for them duplicating the data to additional places, I think that's probably fine.

Arguably, if they do a better job getting that information in results to people who need it in search, they may be performing a service there as well. (A lot of decently informative sites have absolutely awful search/visibility.)

dahart | karma 17200 | avg karma 2.75 2017-08-14 23:41:40 | [–] similar comments

> Consider that should anything ever happen to the sites they scrape, suddenly they become super valuable to the rest of us.

That may well be true, but that value doesn't mean anyone should just be able to take that value from the company that put up the effort and investment to collect the data, and turn around and use it for their own profit. Nor does it mean that a company shouldn't be able to serve the data to whomever it wants and/or restrict access from whomever it wants. Value to the consumer is still not a reason to compel private companies to offer public services. It would be valuable to both of us if Google gave us free money, but no court is going to compel them to do so just because of the potential value to you and me.

It seems bad, btw, if we choose to rely on private companies to keep the only backups of our personal data. If a site going down has a negative effect on my life, and takes down data with it that I need, it might be an indication that I shouldn't have kept my data there.

Also true that decentralized information is not a bad thing, as a generic ideal or a data backup plan. But for a business, decentralization in this context means loss of profit, as well as possibly theft, cheating, and copyright violation.

Also, the increased value that comes from a company folding will be used against you by these private, for profit scrapers. They can and will hold their copy ransom for more money, if possible.

> Arguably, if they do a better job getting that information in results to people who need it in search, they may be performing a service there as well. (A lot of decently informative sites have absolutely awful search/visibility.)

What is the argument in favor of this being legal? It is currently not legal, and the law currently does not give any credit for 'doing it better'. Why should it?

TheCoelacanth | karma 7988 | avg karma 1.71 2017-08-15 01:44:57 | [–] similar comments

It seems like a really dangerous precedent to interpret "unauthorized access" in CFAA to mean accessing a computer that is made available publicly without any form of access control when the owner doesn't want you to access it instead of meaning that you subverted some form of access control to access the computer.

ocdtrekkie | karma 24767 | avg karma 2.6 2017-08-15 07:09:14 | [–] similar comments

I'd argue if the data is being made publicly available (particularly in the case of LinkedIn where we are choosing to make our own data publicly available), it is being made publicly available, and anyone should be able to take and use that data. Our data shouldn't be what a corporation's profit (or loss of profit) is based on to begin with.

dahart | karma 17200 | avg karma 2.75 2017-08-15 14:46:17+00:00 | [–] similar comments

> if the data is being made publicly available ... anyone should be able to take and use that data

It isn't being made publicly available in that sense. LinkedIn only offers the data to site visitors (unregistered users) under the guise of a license.

> we are choosing to make our own data publicly available

This isn't true. Putting data on LinkedIn is not making it publicly available, it's sharing a copy of your data with LinkedIn, and allowing them to do whatever they want with it. Those are the terms you agree to when you register.

> Our data shouldn't be what a corporation's profit (or loss of profit) is based on to begin with.

I agree, in an ideal world, but LinkedIn does profit on your data (as do Facebook, Google, Microsoft, etc.). And we are willingly sharing our data with them and allowing this to happen. There are all kinds of crappy trends with data and privacy happening, and lots of people raising red flags. Your choice is to not use those services. If you don't want LinkedIn to use your data for their profit, then don't share your data with LinkedIn. If you share your data with LinkedIn, then LinkedIn now has the right to use your data to their own advantage.

ipnon | karma 6748 | avg karma 4.23 2017-08-14 23:37:48+00:00 | [–] similar comments

This case is much more significant than that. The ruling will determine whether or not breaching the terms and agreements of a website constitutes a crime under the Computer Fraud and Abuse Act (CFAA). LinkedIn is arguing that since scraping is not defined by the company itself to be a permitted use, a federal felony is being committed by anyone using a script to access its website. The language used in the CFAA is "unauthorized access" and this case will set a legal precedent for the meaning of that term.

qarioz | karma 162 | avg karma 3.12 2017-08-14 23:59:53 | [–] similar comments

So some company can sue me if I use `wget somecompany.com` if LinkedIn wins?

razwall | karma 213 | avg karma 5.33 2017-08-15 02:29:45 | [–] similar comments

We can hope that this case will set a legal precedent, but it may, like the Craigslist case, end in a settlement (or some other disposition) before that point is reached.

harrygeez | karma 638 | avg karma 2.06 2017-08-15 01:36:40 | [–] similar comments

The way I see it, if you want to make a 'public' profile, host it yourself, create your own website. LinkedIn has spent a lot of resources to create a service/network that helps to connect you with recruiters (or the opposite), whose information may not be easily available to you or can be found easily in the 'public' Internet. If you think from a product owner's standpoint, why should I let just anyone scrape the content collection that I built with so much effort?

RHSman2 | karma 339 | avg karma 0.66 2017-08-15 05:30:56 | [–] similar comments

Agree. However, I think there is a difference between harvesting a database's contents and observing trends, data and creating algorithms based around that observed data.

jlgaddis | karma 11467 | avg karma 2.4 2017-08-14 21:39:51 | [–] similar comments

RTFA before commenting, folks. Questions and misunderstandings in this thread that are easily fixed if you just RTFA.

opaque | karma 554 | avg karma 6.52 2017-08-14 21:46:30+00:00 | [–] similar comments

Does anyone know how they do this scraping from a technical standpoint? The articles allude to it being the same data that the Google/Bing spiders see, and those can clearly access more data than the average internet IP for making their result summaries. I had assumed big sites whitelisted specific crawler IP ranges or User-Agents for the search giants. Do they somehow spoof this?
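To make the assumption concrete, here is a rough server-side sketch of that kind of allowlisting. It is purely illustrative: the network range is a documentation-only block (RFC 5737), `examplebot` is a made-up crawler name, and nothing here reflects what LinkedIn or any search engine actually does.

```python
import ipaddress

# Hypothetical allowlist. 192.0.2.0/24 is a documentation-only range
# (RFC 5737), not a real crawler network.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Hypothetical crawler token, matched case-insensitively in the User-Agent.
ALLOWED_AGENT_TOKENS = ("examplebot",)


def is_allowlisted_crawler(remote_ip: str, user_agent: str) -> bool:
    """Return True only if both the source IP and the User-Agent match."""
    ip = ipaddress.ip_address(remote_ip)
    in_range = any(ip in net for net in ALLOWED_NETWORKS)
    agent_ok = any(tok in user_agent.lower() for tok in ALLOWED_AGENT_TOKENS)
    # The User-Agent header alone is trivially spoofable, so the IP check
    # is what would carry the weight in a scheme like this.
    return in_range and agent_ok
```

Under a scheme like this, spoofing the User-Agent by itself would not be enough; a scraper would also need to send requests from an allowlisted address.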

revelation | karma 11052 | avg karma 3.12 2017-08-14 22:27:50 | [–] similar comments

I don't think they do any such thing, if anything they are rotating IPs/user agents to avoid being limited or blocked.

Google requires sites to send the crawler the same content as someone clicking a link on a Google results page would see, so even if some sites get creative covering it up with blurred boxes and similar dark patterns, the data is there in the markup.

opaque | karma 554 | avg karma 6.52 2017-08-15 08:55:28+00:00 | [–] similar comments

I haven't checked the markup, but if you try and hit a linkedin profile page you just get forwarded to a login page. Perhaps if you don't follow the forwarding?

Not sure how this complies with Google's requirements; I suspect if you're big enough you get a custom arrangement. However, that doesn't explain how hiQ are getting the data.
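For anyone who wants to check the "don't follow the forwarding" idea against a site they are allowed to test, a minimal Python sketch is below. The function name and the default User-Agent string are my own placeholders, and I have not run this against LinkedIn, so whether a profile's markup is actually present in the redirect response is an open question.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response
        # instead of requesting the new URL.
        return None


def fetch_without_redirect(url, user_agent="example-client/0.1"):
    """Return (status, Location header or None, body bytes) for one request."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status, None, resp.read()
    except urllib.error.HTTPError as err:
        # A 3xx (or any other error status) lands here; the response body
        # and headers are still readable from the exception object.
        return err.code, err.headers.get("Location"), err.read()
```

If the returned body is empty or only contains the login redirect, the data simply isn't being sent to anonymous clients on that URL.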

tryingagainbro | karma 460 | avg karma 1.06 2017-08-14 21:56:43 | [–] similar comments

Fixed the headline: U.S. judge says LinkedIn cannot block startup from public profile data; the judge will personally pay for the gazillion servers and man hours needed now that scrapers cannot be blocked.

PatrickAuld | karma 90 | avg karma 7.5 2017-08-14 21:56:43 | [–] similar comments

> “We will continue to fight to protect our members’ ability to control the information they make available on LinkedIn.”

LinkedIn has full control over this, it's their site. What they are fighting for is the ability to choose who gets public access to various pieces of information, which its members do not get control over.

sabujp | karma 691 | avg karma 0.89 2017-08-14 23:24:58 | [–] similar comments

This is still very confusing, why didn't they completely block it from scraping without a login?

gizmodo59 | karma 720 | avg karma 3.56 2017-08-14 23:48:36 | [–] similar comments

Because they won't get the traffic from search engines. Searching for a full name very often leads to a LinkedIn page in top 10 results.

fav_collector | karma 66 | avg karma 1.25 2017-08-14 22:08:18+00:00 | [–] similar comments

Does this ruling include regular anti-scraping defenses that might stop HiQ, but don't specifically target them?

polote | karma 4682 | avg karma 4.51 2017-08-14 22:31:58+00:00 | [–] similar comments

So they want to forbid a startup from scraping the personal data of their users, as if LinkedIn were the only company allowed to have access to this data.

I mean, it is completely crazy. It is not LinkedIn's data, it is OUR data.

lightbyte | karma 1727 | avg karma 3.73 2017-08-15 12:58:14+00:00 | [–] similar comments

Are you paying the hosting costs for LinkedIn's site and database? Their users gave LinkedIn the data, it's now theirs.

JonathanAgosto | karma 14 | avg karma 2.33 2017-08-15 14:20:40 | [–] similar comments

That's the equivalent of sending your resume to a company for a job opening and the company saying, "since this resume is now under our control, we own the information in it."

iamleppert | karma 5686 | avg karma 4.75 2017-08-14 22:44:04+00:00 | [–] similar comments

I fully support this decision. If you're offering a service that is public, with the intent to your users that such information will be available publicly, you cannot then police which users of that data count as "public" because it serves your business interest.

LinkedIn, of course, wants to get all the benefit of the public Internet while providing as little as they can. This, coming from someone who used to work at LinkedIn.

These companies have built their fortunes on the public Internet and now that they are successful they seek to not pay homage to the platform that gave them their success. It's very clearly anti-competitive, and bad for users. LinkedIn should be forced to compete based upon the veracity and differentiation of their service, not because they have their users' public data held hostage from competitors.

anandsuresh | karma 361 | avg karma 7.08 2017-08-14 23:11:55+00:00 | [–] similar comments

This is a tricky issue that has more to do with user psychology than technology. While the data is public, most users do not understand the persistence characteristics of data, especially in the presence of 3rd parties.

In a world where there are no (persistent) copies made by third-parties, the user still is in control of the visibility of their data by updating their profile directly on LinkedIn to show/hide pieces as they see fit. With a 3rd-party in the picture, updates to user-data may or may not be respected by the 3rd party, leading to poor user-experience.

Quoting the article, "HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit."

Based on that one statement alone, as an employee, I would be uncomfortable with the use of my data to supply my employer with my future plans before I choose to disclose them myself. That choice is mine, and mine alone; not something to be monetized simply because the option exists. And while I have no control over the sharing of data, should something like this happen to me, I'd be more inclined to stop using LinkedIn, which in turn affects LinkedIn's ability to do business.

redial | karma 1984 | avg karma 3.88 2017-08-14 23:31:00+00:00 | [–] similar comments

Then don't make your profile public.

You can't expect someone to forget you once had a bad haircut just because you now got a really cool one.

chii | karma 16512 | avg karma 1.96 2017-08-15 00:40:42 | [–] similar comments

Actually, that happens a lot, or at least it did before the advent of the internet. Physical photos are not always available, and people's memories fade. This is why a lot of people still think of information in the old way: as time goes by, it might fade.

ubernostrum | karma 12994 | avg karma 2.82 2017-08-15 01:28:45+00:00 | [–] similar comments

The problem is you're basically saying to re-train every person who uses the internet to behave in a way completely different from what they're used to. If you think you can do that, by all means try.

But don't be surprised when you find out that your expectations and their expectations were different, and you're the one they blame, and they outnumber you by a lot.

studentrob | karma 2602 | avg karma 2.0 2017-08-15 03:41:11+00:00 | [–] similar comments

> The problem is you're basically saying to re-train every person who uses the internet to behave in a way completely different from what they're used to.

When did we ever teach people that you can control where your data ends up on the internet?

Aren't we trending towards teaching people to not even share data on "closed" services like Facebook and Gmail precisely because they are a single source for a lot of data to be misused by the company, or hacked by a malicious actor?

Regarding data that is accessible without "friending" someone or logging in as the user themselves (e.g. Gmail), I hope people already realize that this data can easily be re-used.

> But don't be surprised when you find out that your expectations and their expectations were different, and you're the one they blame, and they outnumber you by a lot.

If the majority of people think that they can share nude photos of themselves on their own blog or twitter and that this won't be re-used elsewhere, well... I must be living in a different reality, or I misunderstand your point.

azernik | karma 12752 | avg karma 3.49 2017-08-15 04:12:36+00:00 | [–] similar comments

We taught people that they could control where their data ends up before the internet, because that was an accurate model of reality. I think this is a problem that we'll grow out of as more and more people grow up in the shadow of omnipresent social media.

sowbug | karma 5945 | avg karma 3.41 2017-08-15 04:59:53+00:00 | [–] similar comments

Junk mail, telemarketing, revenge porn, and gossip predate the internet. Different media, same lessons.

bryanrasmussen | karma 35788 | avg karma 2.92 2017-08-15 11:06:22+00:00 | [–] similar comments

indeed a way completely different than anyone in the history of humanity has ever behaved.

paulie_a | karma 1667 | avg karma 0.68 2017-08-15 05:19:51+00:00 | [–] similar comments

While that is true, you can't be dismissive that technological advances allow for more sophisticated "memories"

p49k | karma 2577 | avg karma 4.54 2017-08-15 12:16:03 | [–] similar comments

This argument would hold more water if these services didn't constantly rely on dark patterns to trick people into making things public that they otherwise would have preferred to be private.

syshum | karma 2960 | avg karma 0.71 2017-08-14 23:31:14+00:00 | [–] similar comments

One solution to both HiQ Labs and LinkedIn is to give users, not these companies, some kind of ownership over their data.

Instead of having information about you be owned by whichever corporation collects it, have it at all times be owned by you.

While there are some clear problems with this approach, something needs to be done about companies building databases of ruin, where every moment everyone lives from the day they are born until the day they die is cataloged into a database for review either by society in general or by algorithms looking to make predictions that impact an individual's future.

I should have some level of control over my personal information. Today, even if you actively suppress the 1st party info you put out in the world, the number of 3rd parties adding to your profile dwarfs any data you personally put out there: from credit reports, to government databases, to credit card companies, to, soon, your web browsing history sold by the ISPs.

It is, IMO, out of control.

nine_k | karma 29426 | avg karma 2.95 2017-08-14 23:39:03+00:00 | [–] similar comments

Data you own is not public, by definition. This needs to be made abundantly clear. Something like "all rights reserved" on images.

EGreg | karma 7296 | avg karma 0.72 2017-08-14 23:50:41+00:00 | [–] similar comments

As usual this is the "copyright" debate.

So is your public profile copyrighted by you?

The actual server is controlled by you or gives you a way to take the data down. But by then it could have been republished elsewhere!

avip | karma 6656 | avg karma 3.68 2017-08-15 00:12:35+00:00 | [–] similar comments

You do "own" the data. It just so happens that by registering to LI, you've granted "A worldwide, transferable and sublicensable right to use, copy, modify, distribute, publish, and process, information and content that you provide through our Services, without any further consent, notice and/or compensation to you or others".

ThrustVectoring | karma 9303 | avg karma 3.4 2017-08-15 01:36:53+00:00 | [–] similar comments

To whom have you granted the license? I could see a legal argument that you've granted LinkedIn such a license, along with the right for LinkedIn to sublicense and transfer it, but still retain the right to press claims against third parties that scrape LinkedIn and use the information.

I don't think you've granted LinkedIn the right to sue to enforce your claims against third parties, so you'd have to sue directly.

rmc | karma 15660 | avg karma 2.36 2017-08-15 08:02:50 | [–] similar comments

The EU data protection law is sorta like this in theory. Users can't give up some rights.

cvsh | karma 535 | avg karma 4.21 2017-08-15 01:52:06+00:00 | [–] similar comments

>Based on that one statement alone, as an employee, I would be uncomfortable with the use of my data to supply my employer with my future plans before I choose to disclose it myself. That choice is mine, and mine alone; not something to be monetized simply because the option exists.

What choice? The only choice you have is whether to post public info or not. You have no choice over what others do with it. You can't police what others do with information that you freely publicize.

anandsuresh | karma 361 | avg karma 7.08 2017-08-15 06:20:49+00:00 | [–] similar comments

The issue isn't that the data is public. The issue is the extraction of underlying behavior based on the spatial and temporal characteristics of the data being posted. E.g. the fact that I updated my profile on LinkedIn is not in question. But the underlying behavior is that the average user normally updates their profile around the time that they start a job-search. While this isn't true for every user, the use of an algorithm may turn up false positives, which may have unintended results, especially when this data is presented to your manager.

Secondly, while the updated profile is public knowledge, the temporal characteristics of the update aren't a feature that is directly published by LinkedIn. Call it a product feature; it is tailored to present the qualifications of a user, not to advertise the fact that they may be looking for a job. While one can argue that the update is public knowledge, and must therefore be available for data-mining purposes, there is a subtle but potentially dangerous leak of information here that is open to interpretation.

LinkedIn's position, that this leak may be potentially harmful to its users, and by extension, its core business, is therefore a fair point.

jaredklewis | karma 3143 | avg karma 4.18 2017-08-15 05:15:44+00:00 | [–] similar comments

Though I disagree, I think I basically understand your argument regarding why people should have to respect the wishes of a party when dealing with information that the party made public.

What I cannot understand is how such a system could reasonably be enforced. Let's say John Doe posts his resume on a job board. If I print out his resume, but he later updates it, am I now somehow in the wrong for retaining the old copy?

I am also a little puzzled by the notion that "persistence" is a new phenomenon. Of course there have been paper records and such for quite some time, but I'll put that aside for a moment. When I was younger, I was often cautioned to think carefully before acting, as a reputation decades in the making could be permanently ruined in just minutes. It seems to me that when it comes to mistakes and "bad" deeds, society's collective memory has always been rock solid.

Rather than persistence, I think the new factor is that things are less regional than before. You can't just pick up and move to a new town, because they basically have the same Internet everywhere.

anandsuresh | karma 361 | avg karma 7.08 2017-08-15 07:11:56 | [–] similar comments

>What I cannot understand is how such a system could reasonably be enforced.

That would be a 26 billion dollar question, and one I would very much love to solve one day! :)

I believe that your example is too simplistic to capture the crux of the issue here. The metadata associated with posts often contains features that are quite revealing, but not necessarily the kind of data that the source would wish to be revealed. E.g. I have noticed multiple times that the number of cold emails I have received from recruiters is higher immediately after I update my LinkedIn profile, leading me to believe that the last-update timestamp is a feature that the LinkedIn search engine may be relying on to rank results. While my evidence is purely empirical, it isn't a stretch to imagine that it would be a reasonable thing to do, given that most users normally update their profiles when they are about to start searching for new opportunities.

On the one hand, this is an excellent product that allows recruiters to reach targets that exhibit behavior associated with active job-seekers, resulting in better connections. It results in a win-win situation where the recruiter gets a pretty good return on his/her investment, and the target receives the attention/information they were looking for with their update. False positives in this situation result in a few unsolicited emails/unwanted attention.

On the flip side, this information can be repackaged to present a manager with a graph plotting the probability of an employee quitting. While this is a perfectly good product, no employee would ever use a service that might reveal their future plans ahead of a time of their own choosing. Furthermore, false positives here can have a significant impact and LinkedIn may not remain in business for long if word of this product gets out.

joncrocks | karma 1066 | avg karma 2.35 2017-08-15 09:30:41+00:00 | [–] similar comments

With regard to persistence, I don't think you can argue that the scale of what can now practically be stored hasn't changed significantly.

Given that a larger amount of information can now be stored easily, the economics of what is stored is different, and one would expect this would lead to the storage of more information over time. If you can store in a single HD what would have previously taken a large room, for a tiny fraction of the cost, the bar for storing the information vs. discarding it is much much lower.

I also think it's not only storage, but searching of large datasets that's another reason that information is changing over time. Again, if finding information in a large blob of data is easier and, as discussed above, much cheaper, then this will also lead to people storing information for later retrieval. As you say, the fact that all this storage is now networked together means that this information is now easily retrievable from anywhere, only intensifying the impact.

rickpmg | karma 66 | avg karma 2.0 2017-08-14 23:16:22+00:00 | [–] similar comments

I have no love for linkedin, but not sure of your position.

They collected the data, host it, etc, and incur costs for doing so. Just because they allow the public to access it, doesn't mean the public should have a right to re-use it.

People argue that the data is public. I say that's not the issue. While the data itself might be available elsewhere, it is raiding the _collection_ of it that is being argued, not that 'public' data is 'private'.

The _value_ that LinkedIn adds is that they've built the structure to collect and maintain the data. They are _not_ asking the court to prohibit anyone from collecting the same data on their own, at their own expense. If someone wants to start a rival LinkedIn, they are free to do so.

redial | karma 1984 | avg karma 3.88 2017-08-14 23:33:20+00:00 | [–] similar comments

> Just because they allow the public to access it, doesn't mean the public should have a right to re-use it.

That is exactly what public means. Do not make it public if you don't want 'the public' to use it.

EGreg | karma 7296 | avg karma 0.72 2017-08-14 23:51:24+00:00 | [–] similar comments

Let me replace the word "re-use" with "re-publish". Does your analysis change at all?

redial | karma 1984 | avg karma 3.88 2017-08-15 00:07:34 | [–] similar comments

No.

Public means everybody can do whatever they want with it, no exceptions (except, as with all things, by law). If you want to restrict the information, then do it, but don't make it public and then when a competitor uses it claim it wasn't public 'for them'.

dahart | karma 17200 | avg karma 2.75 2017-08-15 00:16:37+00:00 | [–] similar comments

LinkedIn never made it "public". Use of their site is and always has been licensed. https://www.linkedin.com/legal/user-agreement

redial | karma 1984 | avg karma 3.88 2017-08-15 00:30:26+00:00 | [–] similar comments

The user agreement does not cover the "public" parts of Linked In, like for example, the user agreement. If I want to copy and republish the user agreement I can, despite what it might say.

If the startup were republishing private information from Linked In I would agree with you.

dahart | karma 17200 | avg karma 2.75 2017-08-15 00:40:28+00:00 | [–] similar comments

> The user agreement does not cover the "public" parts of Linked In,

I beg to differ. Their EULA covers "accessing or using" their site in any way shape or form, and defines the term "visitor" for what you're calling "public".

...

You agree that by clicking “Join Now”, “Join LinkedIn”, “Sign Up” or similar, registering, accessing or using our services (described below), you are agreeing to enter into a legally binding contract with LinkedIn (even if you are using our Services on behalf of a company). If you do not agree to this contract (“Contract” or “User Agreement”), do not click “Join Now” (or similar) and do not access or otherwise use any of our Services.

...

When you register and join the LinkedIn Service, you become a Member. If you have chosen not to register for our Services, you may access certain features as a visitor.

redial | karma 1984 | avg karma 3.88 2017-08-15 01:11:23+00:00 | [–] similar comments

I do not agree with the EULA. I definitely do not agree with the EULA just by reading it. I most definitely do not agree with the EULA just by virtue of it existing and being linked to on some corner of their site. I do not agree to any terms just by visiting a webpage. I am not bound by anything other than the actual law and the contracts I have willingly entered into in writing or the digital equivalent.

If the information is restricted, then restrict it. Do not make it publicly available then claim a webpage as the ruling contract of that information when it is used in a manner you do not agree with.

dahart | karma 17200 | avg karma 2.75 2017-08-15 01:49:42 | [–] similar comments

According to their view, you do agree to the contract by using their services, which includes visiting their web pages. If you don't agree, then don't use the services and don't visit their site. Or do, and argue it in court, but it's pointless to tell me you don't agree, the contract exists.

They're not restricting access to the information. HiQ is scraping their site using bots, and LinkedIn doesn't like it. This isn't a debate about anything being publicly available or not, this is a business fight between two private companies.

redial | karma 1984 | avg karma 3.88 2017-08-15 02:13:06 | [–] similar comments

Well then, since what you are saying is that a contract only one party agreed to is a valid contract,

By reading (or not) this comment, you (“the reader") concede all the points of this discussion. The reader also agrees that the arguments presented by HackerNews user “redial” ("me", "we", "us") are correct even in case of conflict with his or her own previously stated positions, and that he/she will amend all of his/her previous comments to reflect this legally binding agreement.

dahart | karma 17200 | avg karma 2.75 2017-08-15 02:26:00 | [–] similar comments

+1 for the lols. You seem to be arguing against me for the existence of EULAs. Maybe you're not aware that these have been around and their validity has been debated for decades? Lots of people are super bugged by them, just like you are. I think it's fairly lame too. I didn't write the EULA, and I don't care if it's a valid contract. But no matter what you say to me, no matter how much sarcasm you use, the fact is that LinkedIn's EULA says that by visiting their site, you are agreeing to their contract.

The real point I was making is that LinkedIn is establishing that they are not offering a public service. It doesn't matter whether you can be bound by their contract, the EULA is more about covering their own asses when they do things like refuse service to HiQ. They wrote the rules so that it's clear what things you can do to get banned. Regardless, they have the right to ban IPs or specific bots or whoever they want, because even though they let anyone access the site, that doesn't mean they have to let everyone access the site always. Like it or not, them's the facts.

ryanwaggoner | karma 20530 | avg karma 4.73 2017-08-15 03:06:38+00:00 | [–] similar comments

> the fact is that LinkedIn's EULA says that by visiting their site, you are agreeing to their contract

So what? The whole point here is that they can say and think whatever they want, but it doesn't make any difference if the law disagrees.

pm24601 | karma 1813 | avg karma 1.81 2017-08-15 02:35:00 | [–] similar comments

> According to their view,

which the judge didn't agree with. So right now your argument is counterfactual and pointless.

dahart | karma 17200 | avg karma 2.75 2017-08-15 02:52:26+00:00 | [–] similar comments

The judge did not rule on the validity of their EULA. It was an injunction.

ekianjo | karma 34827 | avg karma 2.44 2017-08-15 01:14:41+00:00 | [–] similar comments

You don't need to sign on to view public profiles, so you don't enter any agreement with LinkedIn as a visitor.

dahart | karma 17200 | avg karma 2.75 2017-08-15 01:51:15+00:00 | [–] similar comments

You can choose to see it that way if you want. LinkedIn's EULA says otherwise. I have no opinion on whether LinkedIn's EULA is enforceable or legal, I'm only sharing the facts, and the facts are that according to LinkedIn's EULA, visitors do fall under the agreement.

ryanwaggoner | karma 20530 | avg karma 4.73 2017-08-15 03:08:42 | [–] similar comments

You seem really hung up on what their EULA says, and I'm not clear on why. The question at hand is whether their EULA applies, so whatever is in it is 100% irrelevant to that question, right?

TheCoelacanth | karma 7988 | avg karma 1.71 2017-08-15 01:14:46+00:00 | [–] similar comments

Is there any precedent for an EULA like that being enforced? Typically for a contract to be valid, acceptance has to be actively communicated. You can't be bound by a contract simply by someone saying that you have accepted it if you do something that you might have done normally.

dahart | karma 17200 | avg karma 2.75 2017-08-15 01:46:02 | [–] similar comments

> Is there any precedent for an EULA like that being enforced?

I don't know, I'm not a lawyer, but Wikipedia says "sometimes".

https://en.m.wikipedia.org/wiki/End-user_license_agreement#E...

> Typically for a contract to be valid, acceptance has to be actively communicated. You can't be bound by a contract simply by someone saying that you have accepted

Again, not a lawyer, but I imagine that use of a service could legally constitute your active end of the communication. You're right, you can't be bound just because someone says, but when you use a service you've gone one step past.

Honestly, I think the EULA is more of a CYA for them than a contract, in practice. But it does establish the potential legality for two things: 1- that this is a licensed service, and 2- that they can refuse service to anyone they want for reasons of business interest.

staticautomatic | karma 3422 | avg karma 2.18 2017-08-15 15:32:26+00:00 | [–] similar comments

I just had a conversation with my own company's lawyers about this recently. Their assessment was that the spectrum from something like what LinkedIn is doing to something like a notarized paper document with a wet signature is a trade-off between ease/simplicity and enforceability.

These kinds of agreement ostensibly are enforceable, but harder to enforce.

shakna | karma 12286 | avg karma 3.3 2017-08-15 04:28:39+00:00 | [–] similar comments

For a public license to be valid, you need to be able to view the terms, to agree to them.

But the EULA says that simply by accessing the site, you agree to its terms.

To view the EULA, you must view the site.

That's just one of many problems wrong with assuming the EULA is binding.

The ruling in this case, has said it isn't, instead because some people can index and scrape (search engines), but others (startups for example) can't. Which is an anti-trust issue.

dahart | karma 17200 | avg karma 2.75 2017-08-15 15:05:33+00:00 | [–] similar comments

> The ruling in this case, has said it isn't

No it didn't. This injunction has temporarily prevented LinkedIn from blocking HiQ, and only HiQ, while the case is argued. The court might rule that LinkedIn can't block anyone, or they might rule that HiQ is not entitled to scrape LinkedIn's data.

> Which is an anti-trust issue.

HiQ claimed it's anti-trust using inflammatory language in their PR statement. I disagree with that assessment. LinkedIn is not preventing HiQ from collecting their own copy of the data, in any way, shape or form. HiQ is claiming they should be able to take LinkedIn's copy because the data is "public" data. Even if that's true, HiQ always has the option to get the data from the same source that LinkedIn did.

> To view the EULA, you must view the site. That's just one of the many problems wrong with assuming the EULA is binding.

Absolutely right, EULAs have all kinds of issues. In practice, the issue of having to access the site to view the license isn't a problem. You can choose after reading the EULA to not agree, and you can choose to not access any other data on LinkedIn.

But there is no reason to assume the EULA is not binding because there are no other legal documents that cover your interaction with LinkedIn, aside from any state and federal laws that might override parts of the EULA.

This is mostly irrelevant to the point I was making though, it doesn't matter if the EULA is binding. Its purpose there is to establish that LinkedIn is not providing a public service. It's communicating that there is no expectation of responsibility on the part of LinkedIn, and that doesn't really depend on whether you are specifically bound by the EULA.

It's just like a sign in a store window that says "we reserve the right to refuse service to anyone, at any time, for any reason." You could say that the sign is not a binding contract, and go into the store naked and yelling and start breaking stuff. When they kick you out, nobody will come to the defense of your right to walk into a store that everyone else is allowed to walk into.

shakna | karma 12286 | avg karma 3.3 2017-08-15 16:26:59+00:00 | [–] similar comments

>> The ruling in this case, has said it isn't

> No it didn't.

True. I should have said, "the ruling in this case, has said it isn't clear if the agreement should be binding".

> HiQ is claiming they should be able to take LinkedIn's copy because the data is "public" data.

Nobody is taking anybody's data. LinkedIn are providing copies of the data to anybody who views the page. You can't take something from somebody else in this context. It is not possible. Copying, and ineffective deleting are the only methods available for transfer.

> In practice, the issue of having to access the site to view the license isn't a problem. You can choose after reading the EULA to not agree, and you can choose to not access any other data on LinkedIn.

It is absolutely a problem. You don't view data. You download a copy.

You are not presented with the agreement upon visiting a public page, you first download the public page, which then links to the agreement.

Thus, when the agreement becomes enforced, you already have in your possession data from before you agreed, which is then governed by rules you were not aware of, and may not become aware of as the agreement doesn't require intervention.
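To make the ordering concrete, here is a minimal Python sketch, assuming a made-up page body and a made-up agreement path (neither is LinkedIn's real markup or URL). It shows only that a client finds the agreement link by parsing a body it has already received:

```python
from html.parser import HTMLParser

# Hypothetical body of a public profile page, as already received by the client.
# The markup and the URL path are invented for illustration.
PAGE_BODY = """
<html><body>
  <h1>Jane Doe</h1>
  <p>Barista, 2007 to 2009</p>
  <footer><a href="/legal/user-agreement">User Agreement</a></footer>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def find_agreement_links(body):
    """Return the links in an already-downloaded body that mention an agreement."""
    collector = LinkCollector()
    collector.feed(body)
    return [link for link in collector.links if "agreement" in link]


# The profile data and the agreement link arrive together in the same body.
agreement_links = find_agreement_links(PAGE_BODY)
print(agreement_links)
```

By the time `find_agreement_links` can return anything, the profile text is already in the client's memory. That is the ordering problem described above.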

If we have to come up with physical analogies for a problem that is inherently digital:

You walk into a store. The store hands you a CD, that they made just for you, saying it's yours.

You then say thank you, and only then does the store say that there are conditions attached. But you can't give the CD back. You can only agree that you will destroy it at an indeterminate time in the future. And your method of destruction is almost guaranteed to be reversible, but it's all you have.

Oh, and you might not have chosen to even walk into the store. You were stumbling around other stores, and a door led you here.

In common law, once possession is established, new conditions on the possessed item are next to impossible to apply, unless the method of possession was itself a crime.

---

> But there is no reason to assume the EULA is not binding because there are no other legal documents that cover your interaction with LinkedIn, aside from any state and federal laws that might override parts of the EULA.

A EULA, as its name suggests, is a license agreement. So far as I'm aware, most nations capable of accessing LinkedIn have a definition of a license agreement. Insofar as I'm aware, they all require a license agreement to at least be:

"A valid agreement between two parties, where both parties have read, understood and accepted responsibilities (or had ample opportunity to do so), pertaining to the use of the licensed item."

Prior knowledge is a requirement. You can't agree to something you haven't had the opportunity to comprehend.

But LinkedIn happily gives you a copy of their data before you are able to access the agreement. (Such as if your first visit was to a public profile page).

There are many laws that may invalidate the EULA.

---

> Its purpose there is to establish that LinkedIn is not providing a public service.

Its purpose is irrelevant if it is not binding.

A store can put a sign up, saying that only customers who buy a product before leaving may enter. But if someone does, the store cannot force the individual to make a purchase, because their policy was in conflict with other systems of rights.

If something is non-binding, and therefore invalid, it cannot be applied as... It has no validity.

If you have a driver's license, but it became invalid for some reason, you would not be permitted to continue driving, until such time as it became valid.

If the ownership of your house became questionable, you would be squatting.

The non-binding status of any agreement that becomes invalid, regardless of intention, is a problem in law, but it isn't a solved one.

If a license is invalid, you are not bound by it.

---

Caveat: I'm no longer a registered lawyer, as of two years ago. I may not be up-to-date on some things, and my main knowledge was in cross-border and Australian crime, specifically in the realm of IT.

dahart | karma 17200 | avg karma 2.75 2017-08-15 16:54:46 | [–] similar comments

> I should have said, "the ruling in this case, has said it isn't clear if the agreement should be binding".

The injunction didn't say that either. The only thing it said is that LinkedIn can't block HiQ for the time being. It is common in lawsuits for both parties to be prevented from acting until a decision is actually made. The decision has not been made yet.

> Nobody is taking anybody's data.

I think I used a poor verb, or you misunderstood me. I meant that HiQ wants to copy LinkedIn's data for their own business. In some sense that can be viewed as theft, and that is the way LinkedIn sees it. Under that view, the verb "take" is appropriate, but it doesn't mean that the original copy is transferred or destroyed, it just means that HiQ is now in possession of a copy.

> There are many laws that may invalidate the EULA.

True, and I don't claim otherwise. "No court has ruled on the validity of EULAs generally". https://en.wikipedia.org/wiki/End-user_license_agreement#Enf...

> Its purpose is irrelevant if it is not binding.

Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding. If the EULA says "we can refuse service to you", and then service is refused, then it's not a surprise.

In a legal sense, this could (but is in no way guaranteed to) reduce liability. What I'm suggesting is that even if the contract is not binding or valid, if you break the rules and get banned from a site, the EULA may still provide a defense in court from the site being sued by the person to whom service was refused. The site can say "we posted the rules, this person broke the rules" and the person may not have any legal support in favor of getting the service after they broke the site's arbitrary rules.

shakna | karma 12286 | avg karma 3.3 2017-08-15 17:26:04 | [–] similar comments

> Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding. If the EULA says "we can refuse service to you", and then service is refused, then it's not a surprise.

Also not a surprise when a judge orders you to restore access because your agreement is invalid.

> The site can say "we posted the rules, this person broke the rules" and the person may not have any legal support in favor of getting the service after they broke the site's arbitrary rules.

Absolutely. Sites are largely free to enforce rules arbitrarily, by modifying their HTTP responses.

However, you are not free to exclude individuals whilst including their competitors.

Google has been under the hammer for that recently, though that is the EU's anti-trust laws. [0][1][2]. Of particular interest to this case, you might find this quote telling:

> we believe that Google's behaviour denies consumers a wider choice of mobile apps and services and stands in the way of innovation by other players, in breach of EU antitrust rules.

LinkedIn are accused of standing in the way of innovation by other players, in this case, hiQ, whilst simultaneously allowing other players to innovate, such as Google. One can copy the data, the other can't.

> Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding.

I can expect the rain to move upwards, but that's irrelevant to how gravity actually acts. Unrealistic or false expectations are not taken into account with the rule of law.

A police officer might let you off with a warning for speeding, if you hadn't noticed the speed change. However, if it went to court, your false expectation of a different speed is not a mitigating factor.

If LinkedIn was wrong to prevent access in this case, their liability will not be reduced, if precedent is followed. They will still be responsible for the actions they took, in full, as Intel [3], Microsoft [4], Google and Apple [5] before them have been.

If however, LinkedIn are seen by the court as acting correctly, hiQ may be asked to pay legal costs, or counter-sued for damages.

If the EULA is non-binding, then it may as well not exist, because it has no legal relevancy.

[0] http://europa.eu/rapid/press-release_IP-17-1784_en.htm

[1] http://europa.eu/rapid/press-release_IP-16-2532_en.htm

[2] http://europa.eu/rapid/press-release_IP-16-1492_en.htm

[3] https://en.wikipedia.org/wiki/Advanced_Micro_Devices,_Inc._v....

[4] https://en.wikipedia.org/wiki/United_States_v._Microsoft_Cor....

[5] https://en.wikipedia.org/wiki/United_States_v._Apple_Inc.

dahart | karma 17200 | avg karma 2.75 2017-08-15 17:55:16+00:00 | [–] similar comments

> Also not a surprise when a judge orders you to restore access because your agreement is invalid.

That's not what happened here, there has been no ruling on any agreement, and the injunction order that was given only applies to HiQ, only temporarily, and nobody else. It is not a statement on the validity of EULAs or of LinkedIn's EULA, and it is not a statement on whether LinkedIn is being anti-competitive. It is an injunction and nothing else.

shakna | karma 12286 | avg karma 3.3 2017-08-15 18:30:16 | [–] similar comments

> That's not what happened here, there has been no ruling on any agreement, and the injunction order that was given only applies to HiQ, only temporarily, and nobody else.

I didn't say it was.

> It is not a statement on the validity of EULAs or of LinkedIn's EULA, and it is not a statement on whether LinkedIn is being anti-competitive. It is an injunction and nothing else.

An injunction is not given without merit. It has meaning.

Injunctions are regularly denied when the arguments are clearly in one direction or another.

The injunction strongly suggests that the judge finds hiQ's argument, that LinkedIn's public pages are not bound by the EULA, to "not be without merit".

No precedent has been set, but the conversation is definitively in the opening stages.

mariushn | karma 711 | avg karma 1.54 2017-08-17 14:13:28 | [–] similar comments

Dahart, would you please add your email to your profile? I would like to follow up on your older health-related comments. Thanks!

chongli | karma 21313 | avg karma 3.6 2017-08-15 00:09:31 | [–] similar comments

That only makes sense in the context of copyright. Users' personal information cannot be copyrighted by LinkedIn.

cvsh | karma 535 | avg karma 4.21 2017-08-15 01:55:52 | [–] similar comments

Yes, because "publish" implies breach of copyright.

If I post an essay on LinkedIn, and then someone posts it on their blog, copyright has been breached because that is my original work.

If I post the fact that I worked at Dunkin Donuts from 2007 to 2009 on LinkedIn, and then someone records it and feeds it to an employee quitting predictor algorithm, they've done nothing illegal. Me stating the fact that I was employed at a certain place for a certain amount of time is not me publishing an original work.

rickpmg | karma 66 | avg karma 2.0 2017-08-15 01:09:38+00:00 | [–] similar comments

LinkedIn never said the data was 'public' in the sense that you are using it. You are assuming that just because it can be accessed for free, by anyone with an internet connection, that it is therefore in the public realm. That is incorrect.

jfoster | karma 5234 | avg karma 2.13 2017-08-15 01:37:13+00:00 | [–] similar comments

As an example, is Netflix's collection "private"?

If yes, would it still be if they charged only $0.01 for it?

If yes, would it still be if the price was $0?

It seems silly to me to have the "rules" depend upon the price.

int_19h | karma 21203 | avg karma 1.69 2017-08-15 07:12:18 | [–] similar comments

The rules depend on whether something is copyrightable or not.

Movies are copyrightable.

Compiled catalogs of personal information are not (https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R...).

driverdan | karma 15181 | avg karma 3.36 2017-08-14 23:48:45 | [–] similar comments

> They collected the data

No they didn't. Users input most of their data.

rsj_hn | karma 9643 | avg karma 3.21 2017-08-15 03:29:30 | [–] similar comments

The implication is that the company that serves public data could impose conditions on the use of that data, for example they could:

  1. ban the use of ad blockers when accessing the data
  2. ban users making an offline copy to view later
  3. ban users from disabling auto play or other features
  4. otherwise control what you do with data once you get it, which is *huge*. 
     E.g. what if they want a 1% share of any revenue you get by using the data, etc.

I think this really restricts freedom and has some scary implications for the future of the web.

Of course now, they have a technological option to try to force each of the above, but users also have a technological option to try to outsmart them. But I wouldn't want to give them a legal right to force the above.

Buge | karma 3961 | avg karma 2.44 2017-08-15 06:00:52+00:00 | [–] similar comments

Companies already do all those through technical means. In fact they have the full force of the law behind their efforts because they simply put some DRM on it and now it's illegal to try to circumvent.

I would LOVE it if the courts would remove the legal protections of DRM. It seems so strange that this court has gone so far in the viewer-rights direction, but hasn't bothered taking the baby steps to remove the legal protection of DRM.

Hmm, now I'm hoping LinkedIn implements some DRM so this fight can get truly interesting and maybe make some positive difference.

rsj_hn | karma 9643 | avg karma 3.21 2017-08-15 07:57:26 | [–] similar comments

I'm not sure whether data like this can be copyrighted or is considered a creative work. The creative work would be things like the LinkedIn logo or graphics, and these fall under IP protections that limit what you can do with them even if they are freely available.

iamleppert | karma 5686 | avg karma 4.75 2017-08-15 03:41:58+00:00 | [–] similar comments

As a user, it's in my benefit if a competitor comes along, takes LinkedIn's data that they are freely publishing on the public Internet, and does something useful with it.

The correct analogy would be if someone took a copy of my personal resume that I put online, freely accessible on the Internet and did something useful with it.

Heck, Google does this already by indexing and providing a directory of public content. The fact that LinkedIn 'allows' them to do this is by virtue that it makes business-sense to do so and drives traffic to their site.

The rule should be plain and simple here: if you put user content online and do not make any efforts to restrict it (i.e. no passwords, no logins), call it "public information", you do not have any rights to say who can and cannot access that content, at the minimum. Unless I'm mistaken, you also cannot claim copyright infringement, as the user technically owns that content as well -- you just have a license to publish it (either to a private or public audience).

It should be up to the user --- and in fact their right --- to police their own content online. Personally, I find it offensive that LinkedIn seeks to restrict the distribution of such content that I have published through their service, where the expectation is that it is public. They are not acting in my interest here, they are very clearly acting in their own selfish interest, which I find odd considering LinkedIn's supposed mission has always been to empower their users to achieve professional success. How exactly are they empowering me by restricting who I have told them can access my public content? And the fact such restrictions are solely decided by LinkedIn with no input of their users -- the ultimate owners here -- is a disgrace and violation of their own mission statement.

This kind of concept is exactly what the Internet was founded on, folks. To say or think otherwise strikes at the heart of the open web and representing yourself as such is an affront against the great platform that has given rise to so many companies and provided so much opportunity in the world for the individual.

This concept is bigger and more powerful than any one company, and deserves to be defended.

hasenj | karma 4763 | avg karma 2.56 2017-08-15 06:09:43+00:00 | [–] similar comments

> As a user, it's in my benefit if a competitor comes along, takes LinkedIn's data that they are freely publishing on the public Internet, and does something useful with it.

In this case, it's not to _your_ benefit. They're going to warn your boss that you will quit soon.

cabalamat | karma 6928 | avg karma 2.63 2017-08-14 23:50:35+00:00 | [–] similar comments

> If you're offering a service that is public, with the intent to your users that such information will be available publicly, you cannot then police what users of that data you consider to be "public" because it serves your business interest.

I wonder if the same ruling would apply to companies using Twitter's data feed? If so, it would be important in breaking open data silos.

avip | karma 6656 | avg karma 3.68 2017-08-14 23:59:24+00:00 | [–] similar comments

Not sure how this view should be interpreted:

  - Websites should not have ToS
  - Websites may have ToS, users should be able to violate it without consequences if they don't like it ( = roughly current state of affairs)
  - Websites may only have ToS derived from some criminal act
  - something else?

TheCoelacanth | karma 7988 | avg karma 1.71 2017-08-15 01:19:04+00:00 | [–] similar comments

How about "Website ToS should only be legally enforceable if users have actively indicated acceptance of them, not simply by saying that you agree to the ToS if you read the website".

ikeboy | karma 14321 | avg karma 2.95 2017-08-15 04:29:39+00:00 | [–] similar comments

The current injunction says LinkedIn can't even refuse to serve pages if they detect a ToS violation.

dahart | karma 17200 | avg karma 2.75 2017-08-15 00:01:26+00:00 | [–] similar comments

> If you're offering a service that is public

LinkedIn is not a public service, LinkedIn is a private, for-profit business. A public service is normally publicly funded. https://en.wikipedia.org/wiki/Public_service

> with the intent to your users that such information will be available publicly, you cannot then police what users of that data you consider to be "public" because it serves your business interest.

LinkedIn licenses their service to both free and paid users, and they can legally and do attempt to police what users can access the information. Whether it's enforceable or not, LinkedIn has a EULA and use of their service, whether you have an account or not, is presumably governed by their license.

They are well within their right to restrict requests being made by spammers and DDOS attacks, for example. How do you tell between legitimate requests and abusive ones, and how could you compel public access without enabling abusive ones?

> LinkedIn, of course, wants to get all the benefit of the public Internet with providing as little as they can.

The internet isn't a public service yet either, at least in the US. I think it should be, but it currently is not. This is conflating the knowledge that everyone with a computer and net access can currently access a LinkedIn server with the idea that everyone must be able to.

smilekzs | karma 1163 | avg karma 2.38 2017-08-15 00:06:29 | [–] similar comments

> > If you're offering a service that is public

> LinkedIn is not a public service

A service that is public(ly accessible through Internet) != Public Service.

dahart | karma 17200 | avg karma 2.75 2017-08-15 00:09:59 | [–] similar comments

That is correct. It's unfortunate, but such is the difficulty of language. "Public service" is a term that has precedent and legal meaning in the US, feel free to check out the article that I linked.

It's totally fine to call a service that is publicly accessible a public service, but it is going to lead to miscommunication if you are suggesting that this public service should have the same kind of legal requirements and regulations that the other kind of "Public Service" already has.

vacri | karma 17701 | avg karma 1.83 2017-08-15 01:06:31 | [–] similar comments

The GP did not call it a "Public service", but a service that was public, and the further context around the comment made it abundantly clear they were talking about access, not government provision.

The very first thing you did in quoting the GP was rearrange those words to suit your hobby-horse; a classic straw-man.

dahart | karma 17200 | avg karma 2.75 2017-08-15 02:12:10 | [–] similar comments

I don't understand what you mean.

The GP said "you cannot then police what users of that data you consider to be 'public' because it serves your business interest."

Yes, they can.

That sentence is explicitly describing the public expectations of a public service in the government sense. LinkedIn can police whatever they want because of their business interest, precisely because they are a private entity and not a public service in the government provision sense. Despite their offering information to unregistered site visitors, they are within their rights to refuse service to anyone at any time for any reason.

I did not rearrange GP's quote, and GP seemed to be conflating public access with government provision, which is why I brought up the distinction.

Are you saying that you agree with GP that LinkedIn should be compelled by law to provide public access to all?

pm24601 | karma 1813 | avg karma 1.81 2017-08-15 02:29:18 | [–] similar comments

> Yes, they can

The judge just said .. no they can't. Until the judge's ruling is overturned. Your statement is incorrect.

And you keep using the phrase "public service" which is not the issue at hand.

A store owner can not dictate who is allowed to read or take pictures of their store window. Effectively the judge was saying that if LinkedIn offers information that does not require a login - LinkedIn can not then tell someone that they can't use the info that is publicly visible.

No mention of "public service" - please stop conflating the two concepts.

dahart | karma 17200 | avg karma 2.75 2017-08-15 02:50:36 | [–] similar comments

> The judge just said .. no they can't. Until the judge's ruling is overturned.

This was an injunction, no ruling has been made.

> Effectively the judge was saying that ...

This is an injunction in LinkedIn vs HiQ, the judge did not share an opinion about site visitors who aren't logged in, nor make any ruling about whether publicly visible data can be restricted or not.

> A store owner can not dictate who is allowed to read or take pictures of their store window.

True, from outside the store. But to make a more complete analogy to this case, a store owner can dictate what you can read or photograph while inside the store. And the store owner can legally block the view to the outside anytime she wants.

> "public service" which is not the issue at hand.

I just explained above, and maybe you reacted quickly without understanding what I wrote. I'm not sure why I'm getting heavy pushback on this, it is both accurate and not controversial.

I reacted to @iamleppert's idea that LinkedIn can't police its users. The fact is that they can police their users (except for HiQ until the case is over). My interpretation is that he was saying they shouldn't be allowed to police this "public" data. Do you think I misunderstood? I'm not conflating the concepts, I am distinguishing between them. It may be that one is a red herring or that I misunderstood, but on 2nd and 3rd reading it still looks to me like GP is suggesting that LinkedIn should not be allowed to restrict access to some users. If that were true it would turn LinkedIn into a public service, hence the reason why I'm talking about public services.

wolco | karma 4805 | avg karma 1.33 2017-08-15 05:43:50+00:00 | [–] similar comments

If the store owner tells you information you can write it down and share it. They can refuse to answer questions but they cannot take back any public data they made available.

They have a right to ignore your question and put rules around who they will respond to and when.

When they respond with data that cannot be copyrighted (a name, address, title of past position, etc.) of course someone can reuse those pieces.

icebraining | karma 48925 | avg karma 2.14 2017-08-15 11:43:00+00:00 | [–] similar comments

> They can refuse to answer questions

Which is what LinkedIn did, and what HiQ is suing them over.

smokeyj | karma 2838 | avg karma 2.25 2017-08-15 00:24:17 | [–] similar comments

> LinkedIn is not a public service

Boo. OP said it's a service offered to the public. Don't need to move the goal post.

> Whether it's enforceable or not, LinkedIn has a EULA and use of their service, whether you have an account or not, is presumably governed by their license.

Turns out I also have a EULA. And when linkedin responds to my HTTP requests, they opt into my EULA agreement.

It sounds like LinkedIn doesn't want to be accessible from the world wide web. Going offline is always an option.

dahart | karma 17200 | avg karma 2.75 2017-08-15 00:32:21 | [–] similar comments

> Boo. OP said it's a service offered to the public. Don't need to move the goal post.

Nothing moved. OP also said, in the same sentence, "you cannot then police what users of that data you consider to be 'public' because it serves your business interest." Yes, they can. It is clear from context that OP was suggesting that LinkedIn should be held to the legal standards of a "Public Service" in the government entity sense of the term. This is not the case, LinkedIn has no legal requirement to provide anything to the public. Booing me doesn't change that.

> Turns out I also have a EULA. And when linkedin responds to my HTTP requests, they opt into my EULA agreement.

Good luck with that!

smokeyj | karma 2838 | avg karma 2.25 2017-08-15 00:37:38 | [–] similar comments

Just like you can't fart in public and charge bystanders for the scent - you can't broadcast facts into the public and expect people not to recall them. I mean, you can. But good luck with that!

dahart | karma 17200 | avg karma 2.75 2017-08-15 00:45:54 | [–] similar comments

> you can't broadcast facts into the public and expect people not to recall them

That's a straw man, that is not the issue here. LinkedIn is not recalling the data, they asked to stop HiQ from scraping and collating their data and using it for their own business.

This is already in the EULA, so what HiQ is doing may already be breaking the law.

8.2 Don'ts You agree that you will not:

k. Develop, support or use software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons, or any other technology or manual work) to scrape the Services or otherwise copy profiles and other data from the Services;

m. Copy, use, disclose or distribute any information obtained from the Services, whether directly or through third parties (such as search engines), without the consent of LinkedIn;

ae. Use bots or other automated methods to access the Services, add or download contacts, send or redirect messages;

avip | karma 6656 | avg karma 3.68 2017-08-15 00:55:53 | [–] similar comments

(fun fact, not strictly important here) - 8.2.k was added quite recently [1], likely triggered by the HiQ case.

[1] https://blog.linkedin.com/2017/april/10/updates-to-our-terms...

[2] Here https://github.com/tosdr/tosback2/blob/master/crawl/linkedin... is an older version of the ToS, which one can read while enjoying the irony of a crawler focused on scraping ToS

smokeyj | karma 2838 | avg karma 2.25 2017-08-15 01:03:28+00:00 | [–] similar comments

> LinkedIn is not recalling the data, they asked to stop HiQ from scraping and collating their data and using it for their own business.

It's not LinkedIn's right to tell me how to use facts. Just like I can't make LinkedIn liable for my EULA. It doesn't constitute an actual legal agreement.

That said I think the judge is wrong that LI should remove barriers from accessing the information. Information is speech. To mandate information be suppressed or produced is less than ideal.

dahart | karma 17200 | avg karma 2.75 2017-08-15 02:01:11+00:00 | [–] similar comments

> It's not LinkedIn's right to tell me how to use facts.

Yes, you're right about that. But it is LinkedIn's right to tell you how to use their service. Lots of people ignore what they say, and they might not be able to sue someone who breaks their rules, but they can state the rules.

They said they don't want bots scraping their site, and that's their right. They wrote software to detect specific bots & IPs, and refuse service. That's also their right, in my opinion.

Sites who want to use LinkedIn's database are free to collect their own facts instead. I don't know if this is what you were suggesting, but LinkedIn's refusal to serve HiQ's bots is not suppressing any speech, in my view.

> That said I think the judge is wrong that LI should remove barriers from accessing the information.

We totally agree. What are we arguing about?

DannyBee | karma 28474 | avg karma 5.43 2017-08-15 03:19:38+00:00 | [–] similar comments

"Just like you can't fart in public and charge bystanders for the scent - you can't broadcast facts into the public and expect people not to recall them. I mean, you can. But good luck with that! "

Do you also believe you can take the satellite tv signals beamed at your house and decrypt them? After all, they broadcast them as widely as they possibly could! If they didn't want you to watch them, they shouldn't have sent them to you!

(This is a great in-theory argument that simply does not mesh well with our law in reality)

icebraining | karma 48925 | avg karma 2.14 2017-08-15 11:44:32+00:00 | [–] similar comments

Personally, I do think it should be legal to decrypt them.

graphitezepp | karma 783 | avg karma 2.78 2017-08-15 13:28:02+00:00 | [–] similar comments

Same. Little did parent commentator know, there are plenty of radical "all information should be free" thinkers in these parts.

wsy | karma 474 | avg karma 1.52 2017-08-15 15:07:31+00:00 | [–] similar comments

If the broadcast stream is encrypted, and the key is not public, it is obviously not public data.

If the stream is unencrypted, the reasoning applies perfectly.

rayiner | karma 121493 | avg karma 4.24 2017-08-15 00:43:01 | [–] similar comments

> I fully support this decision. If you're offering a service that is public, with the intent to your users that such information will be available publicly, you cannot then police what users of that data you consider to be "public" because it serves your business interest.

Obviously LinkedIn can't control the information itself. But this case isn't about the information in the abstract. It's about an HTTP request to a piece of private property, and how LinkedIn programs that private property to respond to an HTTP request. It's well-accepted that owners of private property can make it available to the general public, with whatever restrictions they please. There is no good reason to treat web servers differently than store fronts. LinkedIn should be able to control who accesses their web servers and how.

vacri | karma 17701 | avg karma 1.83 2017-08-15 01:09:21+00:00 | [–] similar comments

> There is no good reason to treat web servers differently than store fronts. LinkedIn should be able to control who accesses their web servers and how.

These two statements do not agree with each other. An owner of a brick-and-mortar shop can't (legally) stand out the front and bar black people from entering, for example.

KGIII | karma 4317 | avg karma 2.25 2017-08-15 01:28:26+00:00 | [–] similar comments

I believe that store can stop you from coming inside and photographing all the displays.

I don't think they can stop you from photographing through the window.

It's an interesting case.

wolco | karma 4805 | avg karma 1.33 2017-08-15 05:18:51 | [–] similar comments

In this case you go in and request the price and they tell you.

They can't stop you from sharing the price; it's public now.

KGIII | karma 4317 | avg karma 2.25 2017-08-15 05:31:39 | [–] similar comments

We are just gonna beat this analogy to death, aren't we?

icebraining | karma 48925 | avg karma 2.14 2017-08-15 11:41:05 | [–] similar comments

No, they didn't tell HiQ. In fact, HiQ is suing LinkedIn because LinkedIn refuse to tell them.

rdslw | karma 2050 | avg karma 6.57 2017-08-15 12:51:46+00:00 | [–] similar comments

No, they did tell, then stopped on the 5th product.

LIN gives this information (product price) for the first few questions, then if you ask about the 5th price they say: 403, no more for you. WHILE at the same moment, if a different person (or you in proxy-IP disguise) comes and asks for the 5th product price, LIN happily (and publicly) gives this information.

And if the person comes wearing a Googlebot t-shirt, LIN drops to its knees and gives a ..... full-db-dump-job ;) Bing/Baidu t-shirts also fit. Source: www.linkedin.com/robots.txt

HiQ simply says: that's unfair, you can't decide who is good and who is bad. Effectively LIN bans any Google competitor (documented case).
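To make the behaviour described above concrete, here is a toy sketch of a server that answers the first few requests per IP, then returns 403, while waving through anything that identifies as a well-known search crawler. Everything in it is an assumption for illustration: `ProfileGate`, the quota of 4, and the allowlisted agent names are made up, and this is not a claim about how LinkedIn's servers actually work.

```python
from collections import defaultdict

# Hypothetical values, chosen only to mirror the "5th product" story above.
ALLOWLISTED_AGENTS = ("googlebot", "bingbot", "baiduspider")
FREE_REQUESTS_PER_CLIENT = 4


class ProfileGate:
    """Decides an HTTP status per request from the client IP and User-Agent."""

    def __init__(self, quota: int = FREE_REQUESTS_PER_CLIENT) -> None:
        self.quota = quota
        self.seen = defaultdict(int)  # requests counted per client IP

    def status_for(self, client_ip: str, user_agent: str) -> int:
        agent = user_agent.lower()
        # Allowlisted crawlers are never counted or refused.
        if any(name in agent for name in ALLOWLISTED_AGENTS):
            return 200
        self.seen[client_ip] += 1
        # Within quota: serve the page. Past quota: refuse.
        return 200 if self.seen[client_ip] <= self.quota else 403


gate = ProfileGate()
scraper_statuses = [
    gate.status_for("203.0.113.7", "ExampleScraper/1.0") for _ in range(5)
]
other_ip_status = gate.status_for("198.51.100.9", "ExampleScraper/1.0")
crawler_status = gate.status_for("203.0.113.7", "Googlebot/2.1")
```

The same page is refused to one IP on its fifth request and served to a fresh IP or an allowlisted agent, which is the asymmetry being argued about.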

rayiner | karma 121493 | avg karma 4.24 2017-08-15 01:51:07 | [–] similar comments

No, but they can stand outside and bar pretty much everyone else from entering. Race or national origin discrimination is a very narrow exception to that.

ams6110 | karma 28057 | avg karma 2.52 2017-08-15 02:18:45 | [–] similar comments

No but they can define terms of entry that apply equally to everyone.

E.g. you may enter and look around. You may not take notes, pictures, or otherwise record what is here.

briffle | karma 4531 | avg karma 4.12 2017-08-15 03:58:17 | [–] similar comments

But this case is about (using your analogy) someone standing on the street, photographing your building. You can't tell them they can't do it. If you don't want them knowing your apples are $1.25/lb then you need to remove that sign from your shop window.

rayiner | karma 121493 | avg karma 4.24 2017-08-15 11:40:33 | [–] similar comments

They don't have to apply equally. They just can't discriminate against a protected class. I can put up a "no hipsters" sign on my restaurant and enforce it.

BearGoesChirp | karma 958 | avg karma 2.58 2017-08-15 13:26:21 | [–] similar comments

But they can bar lots of people for other reasons. Block short people, people with black hair, people wearing red. It gets a bit murky when they block people for something that is related to a protected class but not directly because of a protected class. For example blocking anyone wearing a cross because they block everyone wearing torture devices from entering.

solomatov | karma 1465 | avg karma 1.81 2017-08-15 01:16:51+00:00 | [–] similar comments

There are limitations to this. For example, you can take a look at other people's property from a publicly accessible place.

pdabbadabba | karma 4191 | avg karma 3.87 2017-08-15 01:52:48+00:00 | [–] similar comments

Yes, but they are free to build walls, gates, and windows.

solomatov | karma 1465 | avg karma 1.81 2017-08-15 03:26:27+00:00 | [–] similar comments

Building a wall is akin to limiting the site only to registered users. They don't do this because they want google to index them, but it indexes only publicly available sites.

I.e. it's OK to build a wall, but it's not OK to skip the wall and then sue some of the people who take a look at the house.

icebraining | karma 48925 | avg karma 2.14 2017-08-15 11:39:58+00:00 | [–] similar comments

LinkedIn is the one being sued, not the one suing.

rayiner | karma 121493 | avg karma 4.24 2017-08-15 01:55:58+00:00 | [–] similar comments

Sure, you can do things that don't constitute trespass. But unlike what's inside a store front, you can't observe the contents of a web server without interacting with it (trespassing).

ryanwaggoner | karma 20530 | avg karma 4.73 2017-08-15 03:11:11+00:00 | [–] similar comments

This seems really annoyingly tied to the technology of the web. If the public, non-authenticated web was built on a broadcast mechanism (like radio) instead of a request-response mechanism like HTTP, then this argument wouldn't apply. Hopefully the court considers whether it actually behaves more like the former than the latter.

int_19h | karma 21203 | avg karma 1.69 2017-08-15 07:15:42 | [–] similar comments

Even with request-response, I don't see how this would be to LinkedIn's benefit. Their server receives the request, and sends the response. If they don't want it to send the response, they can change it accordingly. If they send the response as usual, how could the request be trespassing?

icebraining | karma 48925 | avg karma 2.14 2017-08-15 11:38:10+00:00 | [–] similar comments

But they aren't sending the response as usual. That's why HiQ sued them, and what the court said they must do: "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers"

rayiner | karma 121493 | avg karma 4.24 2017-08-15 11:34:19+00:00 | [–] similar comments

If the public web was broadcast at a range of radio frequencies the result would indeed be different. But it isn't. In my view, law should be tied to reality.

The web isn't an abstraction, it's a network of privately owned servers responding to requests. This order tells a company that they can't program their servers to look at who is making the request and refuse to respond on that basis.

studentrob | karma 2602 | avg karma 2.0 2017-08-15 03:20:58+00:00 | [–] similar comments

> unlike what's inside a store front, you can't observe the contents of a web server without interacting with it (trespassing).

Then what do you consider the storefront, or public facing data of a website? Just the whois info on the domain?

The general public can't observe the locked contents of a server without hacking it. Hacking is trespassing.

I would modify your analogy to say the store front is public facing, just like some of the LI data is public facing. There is other LI data that is not public facing.

In my opinion, websites have a larger storefront, as well as multiple levels of access to internal data.

mrkgnao | karma 1913 | avg karma 3.07 2017-08-15 03:53:18 | [–] similar comments

Making HTTP requests is the same thing as having rays of light reflect off the storefront, I'd say.

dawidloubser | karma 157 | avg karma 3.57 2017-08-15 08:50:50+00:00 | [–] similar comments

Actually, it's not. In your analogy, the storefront is completely passive and unaffected.

What is actually happening, is that somebody is walking into the store, asks a question about the stock or the price of the products on sale, which the store employee willingly answers.

Then, all of a sudden, the store wishes to control what you do with the answer that was willingly given to you.

This is clearly absurd - and so too is wanting to control what people do with publicly available HTTP data. If it's public, it's public.

I personally do feel that LinkedIn is within their full rights to attempt to detect and restrict content being served to screen-scraping agents, but they must then accept that screen-scraping agents must be allowed to use any means necessary to impersonate a "normal" user browsing the (public) information that they publish.

This can't be a one-sided freedom.

rayiner | karma 121493 | avg karma 4.24 2017-08-15 11:38:41+00:00 | [–] similar comments

No, that's not what's happening. What was happening is that the store clerk was noticing that an employee from a competitor was coming in and asking questions about the price, and then refused to answer the questions. The judge ordered LinkedIn to respond to the competitor's HTTP requests.

kevin_thibedeau | karma 19088 | avg karma 2.16 2017-08-15 14:03:43+00:00 | [–] similar comments

Where do you draw the line between this and a DoS flood of HTTP requests? At some point a provider has to be able to rate limit requests to maintain service for legitimate users.

rayiner | karma 121493 | avg karma 4.24 2017-08-15 14:47:43+00:00 | [–] similar comments

I don't. I think owners of web servers should be able to selectively choose to respond to requests however they please (so long as they don't violate any e.g. civil rights laws).

wolco | karma 4805 | avg karma 1.33 2017-08-15 05:16:40+00:00 | [–] similar comments

And they do control access.

When you type in a url a request is made and the server responds. Linkedin controls that response and can send back whatever it likes.

McDonalds can control who they sell a burger to, but if I want to give my burger to a homeless man outside they shouldn't be allowed to stop me. In this case it's worse: the place will tell me the price of a burger but won't allow me to tell anyone else.

Once the information is public it has entered the public domain.

icebraining | karma 48925 | avg karma 2.14 2017-08-15 11:36:19+00:00 | [–] similar comments

> Linkedin controls that response and can send back whatever it likes.

That's exactly what this injunction prohibits them from doing: "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers"

shakna | karma 12286 | avg karma 3.3 2017-08-15 16:39:05+00:00 | [–] similar comments

Which is why this is an anti-trust case.

LinkedIn have freely provided public data to any competitor but hiQ. If they were preventing any company from taking the data, say by putting it behind a user login with licensing, it would likely not be under consideration.

ebbv | karma 4567 | avg karma 2.34 2017-08-15 02:09:53+00:00 | [–] similar comments

I get your point but there's a really scary implication here that if I run a website I can't just ban any IPs I want that I deem to be abusing my service.

If I spot a bot scraping my data, I should have the right to block it.

cmdlinerambo | karma 31 | avg karma 2.07 2017-08-15 02:23:56 | [–] similar comments

Whenever there is a free market, market participants do what they can to restrict the market. This is the nature of the free market.

hyperdunc | karma 564 | avg karma 1.42 2017-08-15 02:37:23 | [–] similar comments

That's why it's necessary for government to regulate free markets - to keep them as free as reasonably possible. On the surface, this sounds like a contradiction.

Unfortunately it's also a little too easy to wind up with corporatism or oligarchy when government is afforded too much power to regulate.

jitix | karma 782 | avg karma 2.96 2017-08-14 21:30:33 | [–] similar comments

But isn't the data stored on LinkedIn's servers technically owned by LinkedIn? I support this decision in the interest of innovation but hiQ is using data that is physically on LinkedIn's systems and which has been legally acquired by LinkedIn from its users. The term "public data" is a very broad definition and needs to be well defined.

morecoffee | karma 490 | avg karma 4.71 2017-08-14 21:35:18 | [–] similar comments

Is the data still Linked In's once it leaves the server though? Hopefully the court will find that the answer is no.

AndrewKemendo | karma 22331 | avg karma 3.59 2017-08-15 02:37:21+00:00 | [–] similar comments

If you serve that data to a client by a simple HTTP request it's publicly accessible. My guess would be that they could claim copyright over all material published on LinkedIn, which they probably do, however the fair use of that copyright would be the question.

DannyBee | karma 28474 | avg karma 5.43 2017-08-15 03:18:29+00:00 | [–] similar comments

In this case, it's a moot discussion. LinkedIn publicly disclaims ownership of the data.

jacquesm | karma 227883 | avg karma 3.76 2017-08-15 02:30:42+00:00 | [–] similar comments

So, how can this be twisted so crawling the Google SERPS is legal?

gfodor | karma 18859 | avg karma 3.92 2017-08-15 04:30:25+00:00 | [–] similar comments

One of the big dynamics of information I feel is still woefully under-specified is the fact that information aggregation results in power. In that light, you could imagine some framework where certain uses of public information that are inherently not aggregative are somehow legally and ethically distinct from those uses which involve aggregation. I have no idea if this distinction would be relevant for the purpose of a company enforcing use of their content, but I think on the side of data use itself, it certainly is a meaningful distinction.

dheera | karma 9101 | avg karma 1.67 2017-08-15 05:10:51 | [–] similar comments

Every time there is a case about whether or not information is free to use in some way, it doesn't sit well with me.

As of now deep down in my heart I don't really believe in intellectual property. I do believe in respect, and in giving credit where credit is due.

But when all is said and done, I don't believe in copyrighting a number. I don't believe that it is anyone's right to dictate how bits that enter devices I own are used.

How do you protect inventions and businesses, you ask? I say:

* If you have a secret you don't want to be distributed, don't share it with anyone you cannot trust. Trained models are a good example.

* Provide a service that only provides a limited amount of information per unit time. The user is free to use the information they obtained however they wish, but even a thousand users wouldn't have a way to copy your whole database in a reasonable amount of time. Google search, for example.

* Alternatively, if you are not sure you trust someone, take collateral from them. (e.g. "It is legal to share this information about me, but then it is also legal for me to share this other information about you [that you probably don't want shared]", or "If you share this information about me, you will be kicked out from the company/platform/etc.", or an Ethereum smart contract that causes you to be fined if someone else demonstrates that they got a piece of information that I shared with you.)

* Build businesses that have stronger value propositions than restricting how information is used. Network effects, physical hardware, good service and support, good use of massive amounts of back-end data that only you have, are all options.
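The rate-limiting bullet above comes down to simple arithmetic, sketched here. The function name `days_to_copy` and every figure in the example (database size, page size, daily request cap, number of colluding users) are hypothetical and not taken from any real service.

```python
def days_to_copy(
    total_records: int,
    records_per_request: int,
    requests_per_day_per_user: int,
    users: int,
) -> float:
    """Days needed for `users` accounts to copy every record under a daily cap."""
    records_per_day = records_per_request * requests_per_day_per_user * users
    if records_per_day <= 0:
        raise ValueError("throughput must be positive")
    return total_records / records_per_day


# Made-up example: 10 billion records, 10 results per request,
# 1,000 requests per user per day, 1,000 cooperating users.
estimate = days_to_copy(10_000_000_000, 10, 1_000, 1_000)  # 1000.0 days
```

Under these assumed numbers even a thousand users would need roughly three years, and the operator can stretch that further by lowering the cap.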

rootlocus | karma 2736 | avg karma 2.61 2017-08-15 11:03:44+00:00 | [–] similar comments

> But when all is said and done, I don't believe in copyrighting a number. I don't believe that it is anyone's right to dictate how bits that enter devices I own are used.

Numbers can and are used to represent anything and everything. This has implications far beyond DRM. For instance: I don't believe you have the right to use my photo to make a defamatory Facebook account in my name.

dheera | karma 9101 | avg karma 1.67 2017-08-17 17:35:52+00:00 | [–] similar comments

In this case, I say the crimes are:

- defaming

- claiming that I am you

and not in copying numbers.

rootlocus | karma 2736 | avg karma 2.61 2017-08-18 15:21:10 | [–] similar comments

If you stick a knife in someone's chest the crime is murder not wielding a knife.

To paraphrase:

> I don't believe that it is anyone's right to dictate how the knives I own are used.

And yet the law still dictates you keep your knives away from my chest. Point being: your freedom of using your bits is dictated by the same laws that dictate your freedom of using knives.

bjd2385 | karma 131 | avg karma 1.51 2017-08-15 05:27:11+00:00 | [–] similar comments

Well put!

a_lifters_life | karma 136 | avg karma 0.34 2017-08-15 11:03:02+00:00 | [–] similar comments

MS sucks what can I say?

PaulHoule | karma 78160 | avg karma 2.48 2017-08-14 22:45:07+00:00 | [–] similar comments

On one hand, LinkedIn is like Twitter, Craigslist and Delicious in that it has sat on a treasure trove of data without helping users mobilize it. (All of the premium services they offer are outright lame; if there was a market for premium services we might see some good ones.)

On the other hand, privacy is an issue too. LinkedIn lets you download a spreadsheet with the email addresses of all your connections, and if you have a lot of connections you will regularly get e-mail messages from life coaches, "managing directors", software development outsourcers, "SEO experts", and all kind of BS artists.

flas9sd | karma 299 | avg karma 1.33 2017-08-14 23:07:23+00:00 | [–] similar comments

HiQ signals your employer elevated likelihood of you jumping ship. Depending on situation, this could favor a pay discussion or put you at a disadvantage.

feelin_googley | karma 1160 | avg karma 2.55 2017-08-14 23:19:03+00:00 | [–] similar comments

Here's a copy of the pleading: http://www.almcms.com/contrib/content/uploads/sites/292/2017...

"In a press statement, LinkedIn says: "Our members control the information that they make available to others on LinkedIn and they trust us to honor that control. HiQ is taking member data, without their knowledge, and using it for purposes our members haven't agreed to.""

I use a text-only browser. As old-timers know, the first web browser back in the early 1990s was also text-only. Text-only still works great, believe it or not. Especially for reading information like one finds on LinkedIn.

I cannot access LinkedIn after the Microsoft acquisition.

Microsoft says members control access. Do they check a box that says

   [ ] Disallow blind users from viewing my profile as text-only.

or

   [ ] Disallow access by HiQ.

The Microsoft statement says HiQ is "taking member data" without their knowledge. This sounds as though members are unaware "they" had a decision to make whether HiQ can access their profile, or whether a noncommercial user can access their profile with, e.g, links, w3m, lynx or the W3 Consortium's line mode browser. Indeed I doubt members either know or care about such access.

And if Microsoft partners with a company such as HiQ, then do members get a veto on use of their profile?

I think members probably have no right to even be informed that such partnerships exist!

Did LinkedIn members have any say in whether the company could sell itself, and control over access to their profiles, to Microsoft?

Web users today generally have little ability to control how companies share their data (e.g. with advertisers and partner companies).

We saw this type of argument in the 3Taps case. Where we were asked to believe Craigslist users were the ones enforcing their copyright or that they designated Craigslist to act as their agent. Anyone who uses the web knows this is utter BS.

"U.S. District Judge Edward M. Chen will preside over Thursday's hearing. At an earlier proceeding, on June 29, he appeared torn by the issues presented. On the one hand, he expressed skepticism that the federal Computer Fraud and Abuse Act-a criminal statute-really barred the mere use of bots to harvest public information. "You can get it manually if you hired a hundred million people to do it," he observed, "but if you want to do it quickly and automatedly, you can't do it? That is a crime?"

duskwuff | karma 18781 | avg karma 3.23 2017-08-14 23:32:28+00:00 | [–] similar comments

The vast majority of blind users do not use text-based browsers. This is a common misconception! They use normal web browsers controlled by accessibility tools like JAWS or VoiceOver.

feelin_googley | karma 1160 | avg karma 2.55 2017-08-15 05:42:36+00:00 | [–] similar comments

Here's a page with what appears to be all the case filings:

https://www.hiqlabs.com/legal/

Today's Order is bad news for CFAA fans:

"In particular the Court is doubtful that the Computer Fraud and Abuse Act may be invoked by LinkedIn to punish HiQ for accessing publicly available data..."

This is the same judge who tried one of the early CFAA cases that LinkedIn cites in support of its position. He is no stranger to the statute. (Perhaps he disagreed with Breyer's ruling in 3Taps.)

In the past few years LinkedIn has updated their User Agreement and Privacy Policy and expanded permission for third parties to access member profiles. Access by third parties is not limited to only selected search engines.

They allegedly allow members to opt-out of these data sharing partnerships. Otherwise the sharing is on by default.

Whether they actually disclose the identities of these partners I am not sure.

The Court seems interested in what members actually want, instead of only what LinkedIn wants for its members.

It wants to know how LinkedIn members can control access to their own information through settings versus how LinkedIn can control it, allegedly on its members' behalf.

The transcript of the hearing for the TRO, specifically the Court's comments and questions, gives some insight on Chen's thinking about this case. After today, I think he is on the side of users. A dismantling of the CFAA as a tool to intimidate potential competitors (including users) has been a long time coming.

LinkedIn is asked why they let the HiQ scraping continue for so long before sending a cease and desist. And they are asked how they know that scraping is harming user trust. Have any users actually complained?

They are also asked what happens if a member would want to "opt-in" to the HiQ scraping.

LinkedIn counsel starts rambling about the CFAA and the court cuts him off to go back this simple question.

"Why not give consumers an option?"

LinkedIn starts rambling about CFAA again, drawing comparisons to Nosal.

Court cuts him off. "... it seems completely different. I mean, I tried the Nosal case. That's getting into the interior mainframe of a company to steal trade secrets, not collecting data that is otherwise publicly available."

Court: "... if you think it's the same, you can think it's the same. It's not the same in my book."

Today's Order confirms this thinking. CFAA is out.

As for whether HiQ and Prof. Tribe can raise a constitutional issue (which would be great for users IMHO):

"... once you say the CFAA arms private parties and sanctions private parties to block access to information that otherwise is now public and available to the public -- at least it now raise the specter, a higher specter of constitutional analysis than if it were purely private action."

It is still a longshot but the Court seems to recognise the constitutional question is possible if HiQ strengthens its arguments. Today's Order confirms this. Court stated it is not satisfied with HiQ's constitutional arguments "at this juncture." There is still time to refine these arguments.

Court to LinkedIn: "... I'm not moved by your argument that, well, you use a bot to receive information, that's totally outside the ambit of the First Amendment, assuming there's any First Amendment to apply here, which is the bigger threshold question, it seems to me."

Court to HiQ: "I don't know -- you're not making any technical U.S. Constitution First Amendment argument."

LinkedIn kept trying to argue Hicks as supporting their right to ban HiQ from access in spite of any possible First Amendment protections.

Court: "Frankly, I don't find Hicks exactly very helpful and informative to what we've got to deal with here."

The other interesting comments from the Court in the TRO hearing were that LinkedIn does not have a copyright violation to assert.

After today's Order, LinkedIn needs another theory given that CFAA is out. Based on the comments in the TRO hearing copyright infringement is probably not going to work either.

phkahler | karma 20899 | avg karma 2.69 2017-08-14 23:21:23 | [–] similar comments

I would prefer if my LinkedIn data was not public in the sense that one should have to be logged in to see it, and ideally I should be able to see who viewed it. If you have to be logged in to look, they can obviously limit the number of profiles you can view to prevent scraping.

phkahler | karma 20899 | avg karma 2.69 2017-08-15 13:40:20 | [–] similar comments

Awesome! The day after this post I logged in to LinkedIn and they promoted a (new?) feature where you can restrict who sees your profile. I turned it off as much as possible for people who are not logged in. IMHO they should probably make that the default if they don't want people scraping data.

J_cst | karma 404 | avg karma 2.2 2017-08-14 23:30:32 | [–] similar comments

[English is not my primary language, so I'm probably wrong]

Shouldn't the title of this thread include a verb, like scraping/collecting?

brians | karma 2310 | avg karma 3.94 2017-08-15 00:04:02+00:00 | [–] similar comments

Well, good, right?

This is the same outcome most of us wanted between Swartz and JSTOR, and perhaps with Malamud and PACER. No technical control can be in the right place, but we can hope for a common understanding (maybe eventually law?) that terms of service may demand or prohibit some things but not anything.

makecheck | karma 12740 | avg karma 4.39 2017-08-15 00:11:24 | [–] similar comments

When these decisions are made, I hope they come with technical guarantees on ease of accessibility to data.

For instance, "buried in 6 layers of obfuscated XML" and "accessible in O(N^3) time" would both be implementations that are not "blocking" the data but they would still be extremely difficult to use.

sova | karma 2369 | avg karma 1.19 2017-08-15 00:26:13+00:00 | [–] similar comments

In a landmark decision that maintains you can take a picture of your community bulletin board... le sigh

angryasian | karma 2733 | avg karma 2.18 2017-08-15 00:26:23+00:00 | [–] similar comments

Out of curiosity, does anyone know if this company is scraping just what's available if you're not logged in, or is it logging in and scraping? I believe LinkedIn shows different information based on whether you're logged in or not.

comex | karma 19146 | avg karma 4.18 2017-08-15 00:31:02+00:00 | [–] similar comments

I don’t see anyone linking to the actual ruling, so I grabbed it from PACER. Here it is:

https://drop.qoid.us/linkedin-081417.pdf

thinbeige | karma 792 | avg karma 3.52 2017-08-15 00:34:13+00:00 | [–] similar comments

Does the startup want to scrape the public profiles which you see when logged out of LinkedIn? If yes, this profile data is of little value because it's probably 5% of the real profile data (mainly just the summary), and often there is no public profile available at all; many times these profiles are turned off for public view.

Or do they mean the 'public' profile which you see when logged in? If yes, this would be a real case because this is awesome data I would like to scrape and which you could build interesting business cases with.

4684499 | karma 686 | avg karma 3.81 2017-08-14 20:26:35 | [–] similar comments

Can someone explain to me why didn't the robots.txt apply to HiQ's crawler? Is crawling still legal while the robots.txt disallows your crawler?

drngdds | karma 1481 | avg karma 3.64 2017-08-15 01:29:03+00:00 | [–] similar comments

Yeah, it is. Robots.txt is just a way of politely asking people not to crawl your site. It doesn't have anything to do with the law.
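To illustrate the "politely asking" part: the check happens entirely on the crawler's side, and nothing on the server enforces it. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt text, the `ExampleScraper` agent name and the example.com URL are invented for the example and are not LinkedIn's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is welcome, everyone else is not.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that was already fetched (no network access here)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
profile_url = "https://example.com/in/someone"

# A well-behaved crawler asks before fetching; a rude one simply skips this step.
googlebot_allowed = parser.can_fetch("Googlebot", profile_url)
scraper_allowed = parser.can_fetch("ExampleScraper", profile_url)
```

A crawler that never calls `can_fetch` will still get a response unless the server blocks it by other means, which is why the legal question is separate from the file.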

4684499 | karma 686 | avg karma 3.81 2017-08-14 20:45:12 | [–] similar comments

Thank you. I'm not sure if this analogy is correct, but isn't that like the host of a museum telling people not to take pictures, yet someone did it anyway and sold the pictures he took?

dawidloubser | karma 157 | avg karma 3.57 2017-08-15 11:29:19+00:00 | [–] similar comments

I don't quite think so. There are specific laws governing the use of, say, photographic equipment within a private property.

If a web server, on the other hand, willingly serves content to both a browser being operated by a human, as well as screen-scraping software, then it shouldn't try to prescribe how the screen-scraper uses that information.

It would be the equivalent to, every time, asking somebody that works in the museum if you can take a picture, and them saying "yes", and then wanting to complain (or sue) afterwards.

icebraining | karma 48925 | avg karma 2.14 2017-08-15 12:01:21+00:00 | [–] similar comments

> If a web server, on the other hand, willingly serves content to both a browser being operated by a human, as well as screen-scraping software, then it shouldn't try to prescribe how the screen-scraper uses that information.

That's not what was happening here. LinkedIn's server was blocking HiQ, and HiQ sued LinkedIn to prevent them from doing that.

drngdds | karma 1481 | avg karma 3.64 2017-08-15 01:27:17 | [–] similar comments

I'm confused. Is the judge saying that LinkedIn can't use the law to ban HiQ from scraping their profiles or that they can't implement technology to block scrapers? The former seems reasonable but the latter seems like an unjustified restriction on how they operate their site.

ViViDboarder | karma 1893 | avg karma 2.16 2017-08-15 02:01:10+00:00 | [–] similar comments

Looks like technology... Which seems strange to me.

chalst | karma 4793 | avg karma 2.85 2017-08-15 07:31:52+00:00 | [–] similar comments

Less strange given that LI is a monopoly and the judge was arguing that LI was unfairly restricting competition in HiQ's business space.

binarysaurus | karma None | avg karma None 2017-08-15 01:44:47+00:00 | [–] similar comments

How does the tinder debacle fit into this?

MechEStudent | karma 36 | avg karma 0.67 2017-08-15 01:45:03 | [–] similar comments

LinkedIn could nuke this problem with a decent EULA.

razwall | karma 213 | avg karma 5.33 2017-08-15 01:46:50 | [–] similar comments

From reading the ruling, the injunction was based on a finding that HiQ raised serious questions about whether LinkedIn blocking HiQ's scrapers constituted a violation of California's unfair competition law by violating the spirit of federal antitrust law.

HiQ argued that LinkedIn has a monopoly on "the professional networking market" and is unfairly exploiting that monopoly to gain an advantage in the data analytics market. HiQ showed that LinkedIn might be developing an analytics product that competes directly with their Skill Mapper product.

c3534l | karma 6371 | avg karma 4.35 2017-08-15 02:30:13 | [–] similar comments

Great. So in a business that makes money primarily by collecting and selling user data, they're now required to give it away for free.

chalst | karma 4793 | avg karma 2.85 2017-08-15 07:28:24+00:00 | [–] similar comments

Correction: user-generated data. LI get the content from their users for free.

vntok | karma 424 | avg karma 0.59 2017-08-15 09:07:55+00:00 | [–] similar comments

I don't understand how you came to think that. What about the cost of hosting and serving the HTML forms (CPU and bandwidth), processing the content update requests, and storing the resulting data structures? What about the marketing costs of making the brand known to both individuals and companies? The wages associated with any of the operations I have just mentioned?

What definition of "free" are you using here?

fastier | karma 4 | avg karma 0.4 2017-08-15 10:14:06 | [–] similar comments

I agree there seems to be a problem with means and investments: if I invest to create a website with valuable data, and anyone can come and scrape my data and start their own or a competing business, riding on top of my investment, isn't that a disincentive to me?

inthewoods | karma 1420 | avg karma 2.07 2017-08-15 02:04:56+00:00 | [–] similar comments

I'm curious how a ruling in this might potentially impact Google (or not). Google is scraping those same profiles, but LinkedIn clearly has no issue with that because it drives traffic to their site. But Google is also making money off of those profiles.

How can LinkedIn argue that Google be allowed to scrape but other third parties cannot?

mcbits | karma 2217 | avg karma 2.51 2017-08-15 02:14:42+00:00 | [–] similar comments

Microsoft/LinkedIn has so much more juicy and actually private data that nobody can scrape, I don't understand why they would even make a scene and tarnish their image over scraping. They know who declines connection requests from whom, when a CEO starts looking for a new job, and so on.

mpcovcd | karma 48 | avg karma 2.53 2017-08-15 02:18:00+00:00 | [–] similar comments

Interesting. Does anyone know how this compares to the legal cases around startups that would scrape Craigslist public data?

c3534l | karma 6371 | avg karma 4.35 2017-08-15 02:28:03+00:00 | [–] similar comments

So does this ruling essentially outlaw robots.txt? If I only give access to certain users, that's... illegal now? We're calling that a monopoly? How is this reasonable to anyone else?

ahmeni | karma 234 | avg karma 5.57 2017-08-15 02:56:26 | [–] similar comments

The most surprising aspect of this is having anyone manage to consider any of the data valuable at the tire-fire of a social network LinkedIn.

flyGuyOnTheSly | karma 1479 | avg karma 2.57 2017-08-15 03:50:33 | [–] similar comments

How does this affect Craigslist's Cease and Desist request to padmapper? [0]

[0] http://blog.padmapper.com/2012/06/22/bye-bye-craigslist/

wyldfire | karma 20828 | avg karma 4.03 2017-08-15 03:54:03+00:00 | [–] similar comments

This Medium post [1], "The Birth And Death Of Privacy: 3,000 Years of History Told Through 46 Images", gives an interesting and unintuitive context about [personal] privacy, which is relevant in this debate about how to balance personal privacy w/society's value in openness [and honesty].

[1] https://medium.com/the-ferenstein-wire/the-birth-and-death-o...

mtokunaga | karma 3 | avg karma 0.43 2017-08-15 04:36:13+00:00 | [–] similar comments

This type of decision might also impact Yelp or any others in similar businesses. Currently their API limits you to the top few reviews per business, and they also prohibit "scraping" of data by other means.

I was going to do some experiments with larger datasets from businesses in a region, but quickly found that's not possible.

devmunchies | karma 3556 | avg karma 2.27 2017-08-15 04:48:57+00:00 | [–] similar comments

And Google search results as well. Search results are publicly accessible but if you try to crawl them, Google will block it.

If it becomes illegal to block crawlers, then Google is gonna get hammered with bot traffic.

It will also mean that Google won't have to be the front-end to search results, and anyone can build on top of it, which could kill Google ad revenue because then you could create anonymous Google searches.

tortasaur | karma 342 | avg karma 2.69 2017-08-15 17:43:36 | [–] similar comments

Google already isn't the only frontend to their search. Look at Startpage, for example.

jakubbalada | karma 11 | avg karma 0.69 2017-08-15 09:45:53 | [–] similar comments

You might try Apifier for that, we've recently scraped more than 150k reviews for 27k restaurants in London.

Here's a community crawler you can use: https://www.apifier.com/community/crawlers/Yonny/bcYqH-api-u...

ViViDboarder | karma 1893 | avg karma 2.16 2017-08-15 14:28:45 | [–] similar comments

Yelp has several public data sets available for research purposes. Might not be the region you were looking at, but for academic purposes, might be useful.

paulie_a | karma 1667 | avg karma 0.68 2017-08-15 05:21:32 | [–] similar comments

I do find it odd that LinkedIn is fighting this considering they outright steal your contact list and will spam your friends and family for years.

Recently I was setting up my new phone and thought about installing their app and I thought to myself, why?

Eventually that thought came back to me when I was attempting to update my profile and simply decided to delete it entirely.

exabrial | karma 16024 | avg karma 2.67 2017-08-15 05:45:09 | [–] similar comments

Sorry a private company can do whatever it wants with its own property. They're paying for the power and bandwidth...

This needs to be overturned unless LinkedIn is violating FRAND principles.

Throaway786 | karma 4 | avg karma 0.8 2017-08-15 06:22:27 | [–] similar comments

Will this mean Google search can now also be scraped?

dragonwriter | karma 118260 | avg karma 2.17 2017-08-15 06:28:02+00:00 | [–] similar comments

This is a preliminary injunction, which is a tool to prevent irreparable harm to a party before a case is resolved on its merits.

It shouldn't be taken as a strong indicator of what will be found legal in this case, much less a different hypothetical one.

pavlakoos | karma 16 | avg karma 0.16 2017-08-15 07:24:16+00:00 | [–] similar comments

Shouldn't LinkedIn simply introduce captcha?

icebraining | karma 48925 | avg karma 2.14 2017-08-15 11:58:49 | [–] similar comments

For accessing the public page of profiles?

pavlakoos | karma 16 | avg karma 0.16 2017-08-15 13:14:42+00:00 | [–] similar comments

Well, not just for accessing, but for scraping, meaning accessing a huge number of those pages in an automated way.

icebraining | karma 48925 | avg karma 2.14 2017-08-15 13:15:58+00:00 | [–] similar comments

How would that work for Google?

If you think they shouldn't show it for Google, then that's what HiQ is suing them over :)

fastier | karma 4 | avg karma 0.4 2017-08-15 07:53:46+00:00 | [–] similar comments

I have a simple question: what is "public information" with respect to websites? Anything I put on a privately owned website automatically becomes "public"?

pjc50 | karma 93685 | avg karma 3.72 2017-08-15 09:04:57+00:00 | [–] similar comments

The question of "data protection" hasn't been discussed enough here - it may well be that LinkedIn has no case against HiQ, but if HiQ is scraping people's PII in the EU they are required to have the permission of the data subjects.

How this interacts with Safe Harbour I have no idea.

brango | karma 297 | avg karma 1.81 2017-08-15 09:13:41+00:00 | [–] similar comments

IANAL but I suspect the forthcoming GDPR will make this illegal for EU customer data. It gives users greater control over their data, so I wouldn't be surprised to find that it corresponds with this judgement, i.e. if a user determines data to be publicly available, it must be made to be so.

Anyone know whether this is right?

And BTW in case you're not aware, if you hold data from any EU citizens you'll be required to comply with the GDPR regardless of where you're located.

codedokode | karma 6872 | avg karma 1.97 2017-08-15 09:38:08 | [–] similar comments

> U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles.

That is actually dangerous. Why can some startup or some judge tell me to whom I can serve content and to whom I cannot?

coldcode | karma 11915 | avg karma 3.19 2017-08-15 12:14:53 | [–] similar comments

The question not really answered is, what is public profile data? Is it visible to the general public, to other LinkedIn users, or partially hidden?

heisenbit | karma 2880 | avg karma 2.59 2017-08-15 13:01:49+00:00 | [–] similar comments

Here in Germany there is copyright protection for databases as a compilation separate from individual facts in the database.

dizzydes | karma 284 | avg karma 2.68 2017-08-15 16:10:20+00:00 | [–] similar comments

They needed to make it available within 24 hours - does that mean public profiles are now scrapeable like any other page?

I tried a year ago and obviously it was impossible.

tejas1mehta | karma 38 | avg karma 0.86 2017-08-16 04:11:51+00:00 | [–] similar comments

I wonder what would happen if some startup started using public Yelp reviews.

joyneop | karma 1 | avg karma 0.5 2017-08-22 11:32:54 | [–] similar comments

Nobody can require one to forget what one knows, nor should anyone be entitled to do so. Collection and use of data should be legitimate. Storing data on an HDD and remembering things in the brain are identical. Infringement of privacy, reputation, or copyright is another issue.
