U.S. Supreme Court revives LinkedIn bid to shield personal data

austincheney · on June 15, 2021

It is because I am a strong advocate for privacy that I believe Microsoft is wrong. If Microsoft really wanted to protect user privacy they would restrict access to user data on their platform. This is basic entry level security: apply access control.

Microsoft wants to split hairs in order to both maximize revenue and prevent competitor access, which is nothing to do with privacy. If Microsoft wins companies will be able to use this verdict to destroy privacy for revenue gain.

jakelazaroff · on June 15, 2021

Is hiQ a competitor to LinkedIn? They scrape people’s profiles so they can snitch to employers about employees whom they think are job hunting. AFAIK, that’s not a service LinkedIn provides.

Moreover, I would like it both ways. Access control would be great, but it sidesteps the underlying issue. I’m fine with my profile being public — in fact, I want it to be — but I don’t want my data to be vacuumed up and weaponized against me.

I agree that anti-hacking laws are not appropriate here. But what hiQ is doing is sleazy as heck, and we should absolutely try to throw out as much bathwater as possible without throwing out the baby too.

danuker · on June 15, 2021

> I’m fine with my profile being public — in fact, I want it to be — but I don’t want my data to be vacuumed up and weaponized against me.

LinkedIn is an authwalled garden only available to those with accounts. "We may or may not have a Jake Lazaroff. Who's asking?".

Sounds like you need a personal website.

jjk166 · on June 15, 2021

When most people say they want things to be public, they typically mean "accessible to those who have a legitimate reason to access it, which may include some people I have not explicitly authorized."

Also LinkedIn does have public profile settings which allow some or all of your profile to be visible without an account.

https://www.linkedin.com/help/linkedin/answer/83/linkedin-pu...

mrwww · on June 15, 2021

I believe LinkedIn started providing some services similar to HiQ, which is one of their arguments.

How can your profile be public but not vacuumed up? Google vacuums it to index of course. Or perhaps you mean only public on LinkedIn?

What the Russians or any other bad actor you care to think about already has done with your public profile is probably way way sleazier.

bingidingi · on June 15, 2021

Yes, important to note that Microsoft is going after a competitor here; there are a bunch of data brokers that have been scraping LinkedIn for years and this is the case they're going after. I hope the Supreme Court can see through this BS and tell them to go pound sand.

(If Microsoft wanted to go after data brokers I'd be on board for that tho)

XenophileJKO · on June 15, 2021

LinkedIn pretty much tries or at least tried to stop ANYONE from scraping other than search engines, etc.

nojito · on June 15, 2021

It was hiQ that sued LinkedIn.

mrastro · on June 15, 2021

After being threatened by LinkedIn: "LinkedIn told hiQ in 2017 to stop scraping LinkedIn's public profiles or face liability under the anti-hacking law."

I think it's fair to say LinkedIn is the instigator in this case.

Causality1 · on June 15, 2021

That seems like it would have been a breathtakingly easy problem to solve using technical means instead of legal threats. They could've required logins to view profile information. They could have blacklisted any IP address exhibiting automated behavior.

onion2k · on June 15, 2021

That sort of 'solution' is just a game of technical whack-a-mole, though, as one party finds a new way to scrape the data and the other finds a new way to stop them. If you want to permanently stop scraping you need a non-technical solution that stops someone even trying.

mixologic · on June 15, 2021

But that's exactly the issue here. If they find a way to limit the data such that they can split who is "authorized" and who is not, then they have found something that falls under the now new reading of the CFAA.

It's not going to stop scrapers from entities outside of the jurisdiction of the US legal system, but it's going to end up icing companies in the US who do this.

But I think Microsoft/Linked In is going to have a very difficult time redefining authorized when the data is publicly available to anonymous access.

bingidingi · on June 15, 2021

...after LinkedIn sent them a cease and desist and started causing problems with hiQ's third-party contracts. They went after hiQ's business with a dubious application of existing laws (CFAA).

jjk166 · on June 15, 2021

It's not a dubious application. The CFAA says that it's illegal to access a computer system you don't have authorization to access, regardless of whether it's technically possible. It's no different than if a manager banned someone from a restaurant - if they return, even though the restaurant is easily accessible, they are nevertheless trespassing.

The recent ruling on the CFAA introduces some confusion in that if a user has authorization to access a system, they are no longer in violation of the CFAA even if they access it for disallowed reasons (ie if you're allowed in the krusty krab, you're not trespassing even if you only went there to steal the krabby patty secret formula). But that's irrelevant here because hiQ was explicitly told they did not have authorization to access (ie Plankton isn't allowed in the krusty krab regardless of what he intends to do there).

carbocation · on June 15, 2021

I agree. I want to add that the headline is infuriatingly credulous of Microsoft's claims.

TeeMassive · on June 15, 2021

And let's not forget that Microsoft probably is using that data when it comes to their own hiring decisions and recruiting efforts.

username90 · on June 15, 2021

They already sell the data to anyone who wants to pay, they have no moral high ground here at all, the only extra "protection" you'd get is that companies would have to pay Microsoft more money to abuse your data.

rurabe · on June 15, 2021

This is a pretty bad headline. I don't know that I would characterize this as revived.

The same 9th circuit who held last year that LinkedIn could not block hiQ from scraping public data, just got asked to reconsider the same case, except now there is additional precedent that SCOTUS says if you had permission to access the computer then it's not a violation of the CFAA (even if you are a shady corrupt cop).

Hard to see this turning out any other way than the 9th circuit reaffirming their decision (or even strengthening it), and then it's up to LinkedIn to try SCOTUS again.

curryst · on June 15, 2021

The 9th Circuit never reached a resolution; that case was over a preliminary injunction, and HiQ was only required to demonstrate that they had raised "serious doubts" about LinkedIn's behavior. The court decided to stop prior to the actual case until Van Buren was resolved.

Also, the injunction preventing LinkedIn from blocking HiQ has nothing to do with the CFAA. LinkedIn can't block HiQ because HiQ is alleging that doing so constitutes tortious interference under California law. Again, whether it is or not hasn't been decided, it was only ruled that HiQ raised "serious doubts" as to whether that's the case. Were HiQ and LinkedIn not competitors, LinkedIn would be free to continue blocking HiQ.

The CFAA bit has to do with whether LinkedIn can sue HiQ under the CFAA; it's just an alternative to try to kill their business in the event they lose the tortious interference part. It's a federal law, so it may supersede the state level tortious interference laws. The issue at hand in that case is whether a user can be considered "unauthorized" without providing an affirmative form of authentication. I.e. does IP blocking someone and sending a cease-and-desist make them unauthorized, and does ignoring that cease-and-desist and circumventing IP blocks constitute "unauthorized access"? Or, more generally, does the CFAA protect systems that aim to keep specific people out, or only ones designed to only allow specific people in?

So at this point, it's "revived" in the sense that SCOTUS made a ruling, and the actual case can move forward to resolution. I expect it to end up in the Supreme Court.

I like the outcome of the 9th Circuit's decision, but their reasoning is horrid. The difference between a system that only allows 3 people in and a system that stops everyone except those 3 from logging in is purely semantic. The former is far, far more common, but the difference is largely one of practicality. It's drastically easier to build a system that only allows 3 people in than one that keeps everyone else out. However, in their ruling it's perfectly legal to circumvent the banlist solution. It's only illegal to circumvent allowlist solutions.
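To make the allowlist/banlist point concrete, here is a minimal sketch. Every name in it is made up for illustration; it is not anything from the ruling or from LinkedIn's systems.

```python
# All names below are hypothetical, made up for illustration.
KNOWN_USERS = {"alice", "bob", "carol", "dave", "erin", "mallory"}

ALLOWLIST = {"alice", "bob", "carol"}   # let only these three in
BANLIST = KNOWN_USERS - ALLOWLIST       # keep everyone else out


def allowlist_permits(user: str) -> bool:
    # Default deny: you get in only if you are named.
    return user in ALLOWLIST


def banlist_permits(user: str) -> bool:
    # Default allow: you get in unless you are named.
    return user not in BANLIST


# Over the users the banlist knows about, the two policies agree exactly.
same = all(allowlist_permits(u) == banlist_permits(u) for u in KNOWN_USERS)
print(same)

# They diverge only for someone the banlist has never heard of.
print(allowlist_permits("stranger"), banlist_permits("stranger"))
```

The two functions make identical decisions for every known user. The only gap is the unknown visitor, which is why the banlist version has to enumerate everyone else to be equivalent.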

It also seems incoherent with regards to DDoS attacks. Their stance is that sites that don't require authorization are open to the public (they are "entitled to access by a computer"), regardless of the method in which the public chooses to consume the information as long as it is via computer. A DDoS is a form of access, and their opinion is that companies cannot set terms around how you access their computers; therefore it would follow that since they can't "unauthorize" me, I am implicitly authorized to DDoS them. And if I'm not, where's the line between DDoS and not? Accessing public data can't be a crime; is accessing it in whatever the most expensive way for them to serve it to me a problem? I can make a scraper that pulls competitors prices from their site using their search bar and do it in the most inefficient way possible by iterating through all the character combinations to overwhelm their search infrastructure. Is their only recourse really to put that behind a login?

I don't see any way to read the CFAA under their opinion that makes any kind of sense. I agree, public data should be public, but it really should be addressed in another piece of legislation. This is just going to be an awful can of worms to open.

Goety · on June 15, 2021

I disagree with that SCOTUS decision. It completely obliterated CFAA. Imagine if they said nurses/doctors could do that with their terminals and it didn't violate HIPAA.

I will say there is a ridiculous amount of red tape around law enforcement using data. Loopholes with third-party access already exist. So if it's above board, monitoring would be easier... But I'm not sure we have adequate monitoring, let alone enforcement, now.

I feel this is a weak case to attack 3rd party data scrapers/brokers. The public generally recognizes the monster we created by having life changing data accessible to anyone with $50 and a bank account.

I want to side with LinkedIn, but realistically I'm becoming more and more jaded on the concept of an open internet and the IoT of everything. I hate the alternative to an open internet even more. I would love to restrict data scrapers, but at the same time should we restrict who has that data? I'd rather we shift how we use the internet and socially enforce boundaries on companies.

I can't even open my fridge or use my microwave or stove without it being logged, either by the electric company or by the software company behind a Bluetooth-enabled appliance with TV and wifi temperature control, where you hope an update doesn't brick the appliance.

There is no way in my mind that data helps the consumer. It might help companies maximize profit but at what energy consumption/cost to the environment?

cortesoft · on June 15, 2021

> I disagree with that SCOTUS decision. It completely obliterated CFAA. Imagine if they said nurses/doctors could do that with their terminals and it didn't violate HIPAA.

The court was absolutely correct in their ruling. If you don't want cops using that data for their own purposes, it should be against the law.... it doesn't make sense to use the CFAA as a catch all for stopping people from misusing data they were given access to. If we do, it gives every private company the ability to make breaking their EULA a criminal offense. That is ridiculous.

HIPAA is a good example of how the law should work. You make what you want illegal; it has nothing to do with computers.

Why would the cop using a computer to access the information be against the law but not a cop going and reading a paper file?

dragonwriter · on June 15, 2021

> HIP[A]A is a good example of how the law should work. You make what you want illegal; it has nothing to do with computers.

HIPAA has lots of rules that apply only with computers (or, specifically, a very interesting definition of “electronic transaction”), which is a big reason fax is still a thing in healthcare, because transactions conducted by fax are not considered “electronic” under HIPAA, so a variety of rules that apply when transactions are conducted electronically do not apply.

Goety · on June 16, 2021

> use the CFAA as a catch all for stopping people from misusing data they were given access to. If we do, it gives every private company the ability to make breaking their EULA a criminal offense. That is ridiculous.

That is a stretch. This case was specifically applied to the public sector and 'not completely unauthorized' makes CFAA almost inapplicable to public sector databases.

nulbyte · on June 15, 2021

> I disagree with that SCOTUS decision. It completely obliterated CFAA. Imagine if they said nurses/doctors could do that with their terminals and it didn't violate HIPAA.

I don't see the similarity between CFAA and HIPAA, here, and SCOTUS didn't obliterate the CFAA. They simply said, if you are authorized to use a system, your use of the system isn't unauthorized. That's fairly straightforward.

HIPAA, on the other hand, regulates disclosure of specific data. You can violate HIPAA even if you are authorized to use a system that holds covered data.

anfilt · on June 15, 2021

And HIPAA covers more than just computers. It includes paper records. Using just the CFAA as a crutch for data that should not be misused still allows misuse of paper documents or overhearing a conversation, etc.

Goety · on June 16, 2021

They made it unenforceable in the public sector. Some people make parallels that both law enforcement and healthcare are somewhat a public good.

cletus · on June 15, 2021

The Supreme Court curtailed the broad scope of the CFAA recently in the van Buren case. I personally agree with that decision (and also find it funny as it continues the long trend of Clarence Thomas being on the wrong side of history). While van Buren's actions were obviously problematic he was an authorized user of the computer system.

The issue is the vagueness of the CFAA has been a prosecutor's wet dream and a predictable source of overreach. Case in point: Aaron Swartz.

So here's what's interesting: by taking up this case SCOTUS is potentially going in the other direction. The Appeals Court held that the CFAA didn't apply, allowing hiQ to continue. And here's where (in my layman's view) SCOTUS may choose to act: The Appeals Court stopped LinkedIn taking action to impede hiQ's access.

I personally view this as overreach. There's a difference between not "hacking" (in CFAA terms) a website and blocking the information provider from impeding bots.

As much as Microsoft/LinkedIn is a nightmare of data misuse and dark patterns (eg to obtain contacts), I think it would be a bad decision to let the current hiQ ruling stand. Disallowing sites from taking action to block scrapers from systematically taking all that information and building a competitor seems like a bad idea.

But given the van Buren ruling, it seems unlikely SCOTUS will reverse course and declare hiQ's actions "hacking" (in CFAA terms). I suspect they'll curtail the remedy.

dannyw · on June 15, 2021

My counterpoint: companies shouldn't get to have their cake and eat it too, as in offering public content, but then preventing competitors from using that public content.

To use another example, what if Google+ was a success, and Google blocks every other competing search engine from indexing even public posts on Google+? Is that a good outcome?

There are already copyright protections to stop Shutterstock from ripping, say, all of Getty Image's photos.

What we are talking about here is information so banal, so unoriginal, so uncreative, that copyright doesn't even apply.

jjk166 · on June 15, 2021

Let's say you own a restaurant. It's a public place, people are welcome to come on in and take a seat. But there's a guy across the street who likes to come in, take pictures of everyone eating there, piss off your customers, and then leave without even ordering anything. After he's done this several times, you tell him he's not allowed in your restaurant anymore. He comes back anyways. Is he trespassing?

This isn't a matter of stealing information. HiQ is more than welcome to acquire the information by other means. If hiQ scraped the same information from an archive site hosted by someone else it would be fine. But they've been explicitly told they're not authorized to access linkedin, and the CFAA is unambiguous that if you don't have authorization to access a system it's illegal to access it anyways, which is very much in keeping with how we'd treat access to a brick and mortar public business.

gwoplock · on June 15, 2021

> My counterpoint: companies shouldn't get to have their cake and eat it too, as in offering public content, but then preventing competitors from using that public content.

I agree this case is taking an anti-competitive action and spinning it into a pro-privacy case. However, what's the line? Personally I believe if you apply the rules equally to competitors and non-competitors alike you can restrict programmatic access to public data. Since LinkedIn isn't doing that, HiQ shouldn't be restricted.

In any case I think this is a poor use of the CFAA.

jrochkind1 · on June 15, 2021

> by taking up this case SCOTUS is potentially going in the other direction.

They didn't actually take it up, but sent it back to the 9th for a re-hearing in light of Van Buren v. US. But that ruling seems to make LinkedIn's case even weaker. I too am somewhat confused about what's going on, what options were available to the Supreme Court and what signal this one sends.

Anyone have a better article?

cletus · on June 15, 2021

Thanks for highlighting this.

Honestly I missed this subtlety but it does make it even clearer: SCOTUS is asking the Appeals Court to revisit if bots constitute "unauthorized" use (under CFAA).

Van Buren decided that if an authorized user used a computer system for what were effectively policy violations and didn't bypass restrictions on that use then it's not "unauthorized" in the CFAA sense.

Consider a phone book (ie the white pages). It has names of individuals who haven't opted out in alphabetical order. This makes it incredibly easy to find a mapping of name -> phone number (in O(log n) technically). You could do a lookup of phone number -> name but it's inefficient (O(n)) and the volume of names is so large that it's not feasible for a person to do it.
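A rough sketch of that asymmetry, with made-up entries (the names, numbers, and function names are all hypothetical): the forward lookup can binary-search the sorted names, while the reverse lookup has nothing sorted to lean on and has to scan.

```python
from bisect import bisect_left

# Hypothetical entries, sorted by name like a printed white pages.
PHONE_BOOK = [
    ("Adams, Pat", "555-0101"),
    ("Baker, Sam", "555-0147"),
    ("Chen, Lee", "555-0122"),
    ("Diaz, Ana", "555-0199"),
]
NAMES = [name for name, _ in PHONE_BOOK]


def number_for(name):
    """Forward lookup: binary search over the sorted names, O(log n)."""
    i = bisect_left(NAMES, name)
    if i < len(NAMES) and NAMES[i] == name:
        return PHONE_BOOK[i][1]
    return None


def name_for(number):
    """Reverse lookup: numbers are not sorted, so check every entry, O(n)."""
    for name, candidate in PHONE_BOOK:
        if candidate == number:
            return name
    return None
```

With four entries the difference is invisible; with millions of entries the scan is what makes the reverse direction impractical by hand.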

There are good reasons to prevent this reverse mapping, privacy among them.

Now imagine that phone book is online. Obviously someone could scrape this data and build that reverse index. So the website provider does things like rate-limit your queries, occasionally CAPTCHA you (FWIW) and so on.
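For what "rate-limit your queries" might look like, here is a minimal token-bucket sketch. This is one common technique, not a claim about what any real site runs; the class name, the parameters, and keying by IP address are all assumptions.

```python
import time


class TokenBucket:
    """Allow short bursts, then roughly `rate` requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client key (an IP address here, as an assumption).
buckets = {}


def allow_request(client_ip, rate=1.0, capacity=5):
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(rate, capacity)
    return bucket.allow()
```

A person browsing a handful of listings never notices this. A scraper trying to walk the whole book runs out of tokens almost immediately, unless it spreads the work across many client keys.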

If the LinkedIn decision stands, such impediments may be blocked by courts. That's probably not a good outcome.

Additionally, doing that lookup yourself is infeasible but totally feasible with a bunch of automated lookalikes (ie bots).

So I would say SCOTUS is hinting that the Appeals Court ruling is too broad and some limits on website access can apply.

That's my guess anyway.

jrochkind1 · on June 15, 2021

> If the LinkedIn decision stands, such impediments may be blocked by courts. That's probably not a good outcome.

Nope, I think you have it totally incorrect. The issue isn't whether it can be illegal to put technology impediments in place -- that isn't a claim in this case.

The issue is potentially whether you can charge someone with a felony for working around the impediments you put in place. (i'm not totally sure if that is relevant in this case?) Or even more so, if you can charge someone with a felony for hacking for just doing something you politely asked them not to do.

But there is nothing at stake here that could possibly make it illegal to put technological impediments like rate-limiting or CAPTCHA's in place. If I'm wrong, please direct me to the details of where/why. But that was definitely not at stake in Van Buren v. US, which is what the Supreme Court says prompted the remit for a re-hearing. The CFAA is about a felony crime of unauthorized access; nothing in it, decided either way, can make it a crime (or "blocked by courts") to put technological impediments in place.

But overall, OP article is doing a very poor job of explaining either the facts or law of this case, so there's a lot I'm not sure about from just this write-up and what else I've been able to lazily google alone.

If anyone has a better article explaining the relevant facts and law in dispute here, I'm still interested.

lizdax · on June 15, 2021

Curious to see which direction the Supreme Court goes with this -- given the last CFAA ruling.

>LinkedIn told the Supreme Court that hiQ's software "bots" can harvest data on a massive scale, far beyond what any individual person could do when viewing public profiles.

So can anyone with a couple thousand dollars to burn. By barring US companies from doing it all you're going to do is move the work to another country that doesn't care about US Law. The data will still be scrapped nonetheless.

NaOH · on June 15, 2021

>LinkedIn told the Supreme Court that hiQ's software "bots" can harvest data on a massive scale, far beyond what any individual person could do when viewing public profiles.

That doesn't seem any different than when I registered a new business in my state. I did that online, and before I received the printed, mailed confirmation from the state I was getting credit card offers in the mail addressed to the business. I'm not glad how that played out, but I don't see any difference with hiQ collecting public information.

fncypants · on June 15, 2021

Secretary of State / corporate information is often sold in bulk without any need for scraping without authorization.

> Daily Filing Update – a file that includes all of the database updates for a specific day. Customers frequently begin by purchasing a Master Unload in order to create a database, and then subscribe to the Daily Filing Update so that they can download and update their databases on a daily basis to keep their data current with that maintained on the SOS BEST database.[0]

[0] https://direct.sos.state.tx.us/help/help-corp.asp?pg=bulk

cwkoss · on June 15, 2021

Typo quibble: the data will still be scraped - if it was scrapped it wouldn't be an issue!

rurabe · on June 15, 2021

Note that LinkedIn could make this whole issue moot by putting their data behind a login tomorrow. The 9th circuit decision pretty clearly stated that.

The only reason the data is public is for marketing purposes, to sign up more users.

ID1452319 · on June 15, 2021

I want some of my LinkedIn profile to be public. I DON'T want any random company scraping that PII data, storing it, processing it and making money from it.

Is that too much to ask?

xbar · on June 15, 2021

Of whom?

ID1452319 · on June 17, 2021

In this case it would be the US courts. In Britain it would be the ICO; despite claims from companies like hiQ to be GDPR "compliant", they clearly aren't.

bryan_w · on June 15, 2021

I thought they were prevented from doing that (at least at some point, by injunction)

mcny · on June 15, 2021

Each Linked In user has the option to make as much or as little of their information public. I don’t see how any injunction can stop that. None of this data belongs to LinkedIn. It nudges us to make more of our data public.

I don’t even want to get into what goes on in the mind of a Linked In “influencer”.

I am convinced salvation cannot come from the courts and at the same time Congress has ceded all authority to the White House. The goal for everyone in our industry must be complete abolition of the CFAA but that can’t happen until we can get Congress to act.

rurabe · on June 15, 2021

Yeah each user gets the choice right now as a consequence of LinkedIn's product design.

As far as I know, there's no reason it couldn't be "Log in to see this profile".

I believe the injunction bars LinkedIn from blocking hiQ specifically from accessing the publicly available information (which again is public by LinkedIn's choice). They made a point to draw a distinction between the public stuff and stuff behind a login/password.

kizer · on June 15, 2021

Scraping is inevitable. It seems like it exists in this grey area of a “hacky” accepted practice. I think it should be expected today. If it’s public, it’s public. Of course copyright applies, but day after tomorrow people will just run copy through a GPT-3-like scrambler or rephrasher or something and publish it anyways.

The reality is that anything public on the web will be able to be harvested and made use of, sort of how it’s been and really how it should be.

TechBro8615 · on June 15, 2021

Any ideas how Microsoft wants to frame this so it becomes a partisan issue? Microsoft Legal can’t change the facts of the case, but presumably Microsoft PR wants to shape the narrative into one agreeable to the conservative majority on the court.

I have a feeling that regardless of the specifics, it will be a narrative where the little people lose and the big people get to keep playing. I would love to be proven wrong though.

If I’m right, it’s a stark reminder that Microsoft need only sustain its open-source friendly, developer-empowering persona as long as it’s convenient and rational. When there’s territory at stake, that veneer of friendliness begins to crack.

jb775 · on June 15, 2021

What a joke that the courts within our society are so obviously controlled by big business. LinkedIn is essentially asking for legal protection so they can box out competition as they pimp out user data they don't even really own in the first place.

The argument our society should be discussing is that individuals own the underlying data. All previous "user agreements" are null and void. All new user accounts should be bundled like non-fungible tokens, then every piece of data associated with that account can be tracked to compensate that user based on actual impressions. Companies can take their cut, but if a user isn't happy with the setup, they can revoke all access to their own data. Enough of this "you agreed to our service agreement" smoke and mirror bullshit...that's like saying the water company owns the water itself.

pverghese · on June 15, 2021

By that same token, all use of free services should be charged from the time of use of said services, since previous agreements are null and void.

By using said services you have provided your information given the understanding that your information will be used by the companies that provide the services

6gvONxR4sf7o · on June 15, 2021

On the other hand, I want to use linkedin without my data being taken by other companies. It’s my data, not linkedin’s and certainly not hiq’s.

gip · on June 15, 2021

The headline is misleading. What is happening is that LinkedIn is using user data for its own benefit only and blocking innovation. As a LinkedIn user I want everyone to have access to the data I'm sharing publicly through the service.

I can't wait for entrepreneurs to take on LinkedIn. There are a lot of opportunities there.

mLuby · on June 15, 2021

I'd rather entrepreneurs build on top of LinkedIn.

There's so much untapped potential to build upon aggregated markets (of professionals, of jobs, of housing, of singles, of restaurants, etc) rather than building yet another aggregator. The existing aggregator cements its dominant position, can charge rent to connected apps, and attracts rather than repels killer apps.

The main problem I see is fears that connected apps will siphon away the aggregator's data, and/or combine that data with external data; both are attempts to supplant the aggregator. Maybe this fear can be assuaged by technical or legal means.

jb775 · on June 15, 2021

What do you do when you build an entire business on top of LinkedIn, then they decide to completely revoke your access?

We need DE-centralization, so no-one in the world has this type of power over anyone.

gip · on June 15, 2021

Totally agree. About 10 years ago I was the first hire of a startup built partially on top of LinkedIn. One Monday we got a cease and desist letter. We had to immediately stop using their API and had to start crawling the data instead. That almost killed us.

altdataseller · on June 15, 2021

What was this startup?

ineedasername · on June 15, 2021

Doesn't Google pretty much already scrape all of LinkedIn and partially republish in search results (and cached pages)? Microsoft doesn't have a problem with that.

durnygbur · on June 15, 2021

You've got to be in the cartel.

cj · on June 15, 2021

2 questions I asked myself after reading this:

1) Do I side with LinkedIn or HiQ?

2) What case law precedent do I think we ought to set?

I’m personally conflicted on which way to lean. I see pros and cons to both - this is an interesting case.

cwkoss · on June 15, 2021

I feel strongly that non-commercial scraping should be protected.

I think commercial scraping should probably be allowed, but there should be some sort of mandatory maximum retention limit, so deleted (or updated access controls on) content eventually ages out (or gets transferred to a non-commercial custodian for archival purposes).

username90 · on June 15, 2021

I don't see why LinkedIn should be allowed to monetize it if other companies aren't. If they just monetized the website via ads it would be fine, but since they sell your aggregated data to people I don't see why other data brokers shouldn't be able to also sell your publicly available data. LinkedIn as a data broker and LinkedIn as a website are two unrelated businesses, and extending dominance in one to the other is cause for anti-competitive action from the government. HiQ already won in lower courts, which is why this has gone this far.

cwkoss · on June 15, 2021

Well I'd like to see some regulation on commercial data brokers too.

I think individuals should have the right to know whether they are in datasets and the right to be deleted from that dataset upon request if they never gave direct consent to the entity holding their data. It seems extreme from where we are now, but maybe data brokers should be required to notify people when they're added to a dataset without direct consent, including instructions on how to be removed if they don't want to be included.

wslack · on June 15, 2021

Scraping for non-profit purposes feels fine. Scraping for commercial purposes feels dirty. I don't know if there's a way for the law to square that circle.

fncypants · on June 15, 2021

If what you are feeling is that scraping for commercial purposes feels "unfair," that is because if it is not allowed it traditionally falls within a legal cause of action commonly called "unfair competition." In the same vein as trade secret theft. So the law does square that circle, but the line drawn is a fuzzy gray one. It's unfair to free ride off a competitor, but yet some kinds of actions that look like free riding are in fact allowed.

postalrat · on June 15, 2021

How about a browser extension that scrapes only the profile you are visiting and adds that information to an ATS? Just to prevent a lot of copy paste.

jrochkind1 · on June 15, 2021

When Google does it, it's commercial, right? Should you need a license from all indexed content to make a search engine?

Isn't indexing for search what HiQ is doing too?

gnicholas · on June 15, 2021

I had a strange experience with LinkedIn today: I had to restart my computer, and when my browser started it re-opened all my tabs. This included about 50 LinkedIn tabs, which I opened sometime over the last 3 months (most of them were opened yesterday).

LinkedIn logged me out and accused me of unauthorized use or some such thing, and then locked me out for a period. When I submitted a help request and explained what had happened (browser reopened tabs), they gave a boilerplate answer that was totally nonresponsive.

I understand they want to defeat scrapers (whether I think they should or not), but outcomes like this are a pretty annoying way to do it. Can they really not tell when a browser is restarting and opening a bunch of tabs that the user has already viewed (and which are all 1st or 2nd degree contacts)?

edit: added context around when tabs were opened, in response to a comment.

bpodgursky · on June 15, 2021

It seems possible, but it also seems like a sorta painful problem to solve if locking an account requires dredging through 3+ months of a user's interactions.

At a lot of companies that kind of historical data ends up getting batched and stored in some kind of bulk object store (e.g. S3) and then batch-operated on via Spark, MapReduce, Flink, etc. in hourly or daily jobs. So instead of implementing account locking and scraping detection at the realtime layer looking at current requests, you have to write some ugly daily aggregate & backfill job, which inevitably ends up super expensive and slow.

Anyway, yes, but I can definitely sympathize with _not_ doing it this way.
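For contrast, here is a rough sketch of what detection "at the realtime layer looking at current requests" might look like: a per-account sliding-window counter. Everything in it is an assumption for illustration (the class name, the threshold, the window length); it is not a description of what LinkedIn actually runs.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flags an account that makes too many requests within a short window.

    Hypothetical helper for illustration only; the limits are made up.
    """

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # account id -> timestamps of that account's recent requests
        self._recent = defaultdict(deque)

    def record(self, account_id: str, timestamp: float) -> bool:
        """Record one request; return True if the account is now over the limit."""
        window = self._recent[account_id]
        window.append(timestamp)
        # Evict requests that have aged out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests


# Made-up limits: more than 3 profile views within 10 seconds gets flagged.
detector = SlidingWindowDetector(max_requests=3, window_seconds=10.0)
burst = [detector.record("some-user", t) for t in (0.0, 1.0, 2.0, 3.0)]
```

A check like this only sees the current burst. It has no idea whether those pages were already viewed weeks ago, which is why a browser restoring dozens of tabs at once could look the same as a scraper to it.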

gnicholas · on June 15, 2021

Fair enough, although 30 of the tabs were opened yesterday, as I was researching people who will be attending an upcoming presentation I'm giving.

Judgmentality · on June 15, 2021

Just curious, how many tabs do you normally have open?

gnicholas · on June 15, 2021

Maybe 100? I use a Chromium-compatible version of the Tree-Style-Tabs extension, which makes it easy to organize and navigate dozens of tabs.

MattGaiser · on June 15, 2021

> LinkedIn told the Supreme Court that hiQ's software "bots" can harvest data on a massive scale, far beyond what any individual person could do when viewing public profiles.

So would it be legal if I funnelled the profiles into Mechanical Turk and had it manually entered?

failwhaleshark · on June 15, 2021

LinkedIn had a well-known profile viewing crack for years. Add this magic to the URL and poof, you don't need to login or be connected to them. It wasn't even close to a "hack" how trivial it was to bypass LinkedIn's (not) security.

durnygbur · on June 15, 2021

> hiQ Labs

So the recent LinkedIn "data leak" is public profiles scraped by hiQ from LinkedIn and then "leaked" again from hiQ? Or is hiQ selling the data dumps undercover?

Either way hard to pity MS and LI, the website is a Potemkin village...

rhacker · on June 15, 2021

It seems absolutely nuts to me that this case ended where it did:

https://en.wikipedia.org/wiki/Spokeo,_Inc._v._Robins

And we're having a discussion about whether LinkedIn might possibly lose. In the Spokeo case it was data that Robins said was untrue about him, and Spokeo was allowed to keep sharing it, simply because he couldn't prove harm.

Which is laughable in TODAY's political climate, where harm is waking up in the morning, because of all the data sharing happening everywhere and being unable to stop it.

fibers · on June 15, 2021

What would happen if they ban scraping altogether? Is this the end of google search?

pionar · on June 15, 2021

I doubt it. Google's crawling bots (like most search engine bots) follow a published standard for access rules for bots (robots.txt).
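As a small illustration of how a well-behaved bot consults those rules, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` and `SomeCrawler` names, and the example.com URLs are all made up for the example; it does not reflect any real site's rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Ask whether the rules permit this user agent to fetch this URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
generic_public = is_allowed(parser, "SomeCrawler", "https://example.com/profile/1")
generic_private = is_allowed(parser, "SomeCrawler", "https://example.com/private/x")
named_bot = is_allowed(parser, "ExampleBot", "https://example.com/profile/1")
```

Note that this check is voluntary: robots.txt tells a cooperative crawler what the site would like, but nothing in it technically stops a bot that ignores it.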

MR4D · on June 15, 2021

To me, this is simple:

If the pages are not behind a login, then they are PUBLIC, and open to any sort of data gathering.

If the pages are behind a login, then they are PRIVATE, and have all the protections alongside it.

Frankly, this bullcrap about "it's not public, but we want to leverage the usage of the open web as it suits us" to build our monopoly crap is getting old.[0]

[0] - I don't care who the company is, whether Microsoft, Apple, Amazon, Twitter, Facebook, Google, or some no-name company

wly_cdgr · on June 15, 2021

I gotta say, making scraping illegal would be a great mercy towards web developers. It's just the worst

altdataseller · on June 15, 2021

If they do rule HiQ labs unable to scrape, can they also make sure ZoomInfo stops scraping LinkedIn profiles too?

I have seen people’s profiles scraped word for word and posted on ZoomInfo, and sold to salespeople who cold call you. Let's put an end to this.

jollybean · on June 15, 2021

You can walk around the store and 'see' whatever you want, but if you start recording prices and taking pictures, they may ask you to leave.

I wonder if there's some relevant precedent there.

ramoq · on June 15, 2021

You have to watch the video of the actual court proceedings. A core issue is that LinkedIn knew hiQ was scraping their website and allowed them to do this until they started launching similar features. hiQ has a strong argument around this point and evidence to back it up.

video: https://youtu.be/tvLdJujOp8k

ggggtez · on June 15, 2021

> public

The entire case is just about that one word. If the profile is public, then you are authorized. We already went through this.

CFAA is not a privacy law.

myfavoritedog · on June 15, 2021

> For its part, hiQ uses the data for products that analyze employee skills or alert employers when they could be looking for a new job.

Interesting that they've made a business out of this. I've always maintained to my colleagues that when I see a bunch of LinkedIn activity from them, I can tell that they're putting themselves on the job market.

myfavoritedog · on June 15, 2021

Apart from the CFAA issue, it seems strange that you can't regulate access patterns to your own web site in order to prevent mass scraping of user data.