Data Brokers (ca.gov)
195 points by troydavis on Jan 18, 2020 | 75 comments



> a “data broker” is in the business of aggregating and selling data about consumers with whom the business does not have a direct relationship

I had to make a few clicks to find the definition; more details here (scroll down to section 1c):

http://leginfo.legislature.ca.gov/faces/billTextClient.xhtml...


A lot of companies still need to register.

* LinkedIn - Sell your data via Sales Navigator

* ZoomInfo - Scrape your email account and sell the data to other businesses - https://www.zoominfo.com/business/about-zoominfo/privacy-pol...

* DiscoverOrg

* Clearbit - Browser extension scrapes your emails for data and they sell it to others via their "free browser-based add-ons, extensions, or plug-ins" - https://clearbit.com/privacy

* Hunter.io - Scrape people data from your browsing activities to resell.

* Demandbase


I don't know about the others, but I'm not sure LinkedIn qualifies. They have a direct relationship because people have profiles that they voluntarily created.

It might be different if they were selling information from shadow profiles that they created from other people sharing their contact lists, but I don't think they are selling that.


I’m not sure that any browser extension (Clearbit in your example) would have to register. It’s pretty clear that you have a “direct relationship” with them when you install their extension.


How about Microsoft? With all the telemetry in Windows 10 and what they know about the end users, the only question is whether or not they sell the data to anyone, I guess?


No, direct relationship


ZoomInfo and DiscoverOrg are the same company now.


Facebook is like Sauron and LinkedIn is like Saruman.


The full submission for [spydialer] says:

-----------------------------

> How a consumer may opt out of sale or submit requests under the CCPA:

> Go to spydialer.com, then click the link "Do Not Sell My Info".

-----------------------------

Cool. Now compare that to another one listed [infocore]:

-----------------------------

> How a consumer may opt out of sale or submit requests under the CCPA:

> contact the company

> How a protected individual can demand deletion of information posted online under Gov. Code sections 6208.1(b) or 6254.21(c)(1):

> contact the company

> Additional information about data collecting practices:

> No information provided.

-----------------------------

Ugh. Their contact page is intended for submitting requests to do business with them: https://infocore.com/about/contact-us/


>> Go to spydialer.com, then click the link "Do Not Sell My Info".

> Cool. Now compare that to another one listed [infocore]:

To opt out of spydialer.com, you go to https://www.spydialer.com/Consumers/, then you have to provide all of the information you think it has on you to verify your identity and selectively remove it from their services.

I'm not really comfortable providing my full name, address (or addresses if you've lived in multiple places), and phone numbers to a data broker just to opt out. That feels quite counter to the idea of them ideally not having my information in the first place.

I feel like I need a global opt-out where a system or agent on my behalf can then handle this for me at each place where data can be collected.


At the very least, when a company acquires my data through any mechanism, it should be forced to email me a copy of all the data it has on me, along with a one-click link to get it to delete the data and permanently opt out.

Any company found in possession of personal data without the appropriate (delivered) email notifications being sent should be fined at minimum $1000 per individual, and face criminal charges if intentional.

This won't put anything like services you voluntarily sign up for out of business (as they send sign up confirmations anyway, or should).


This would severely backfire if the data broker thought someone else's email account was associated with your identity. Say:

Danny W[ilkins]

- Job Title

- Salary

- Job application records

- Social media post history

- Email 1: dannyw@gmail.com (correct)

- Email 2: dannywilkins@EmployerLLC.com (correct)

- Email 3: dannywinters@hotmail.com (not correct)

I'd assume they'd send the email containing your salary and post history to the two valid emails as well as the one invalid email.

That seems like a potential nightmare for you, for example records of job searches being sent to your employer email.
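To make that failure mode concrete, here is a minimal Python sketch. The `Profile` layout, the `notify_all` helper, and the placeholder attribute values are all hypothetical, built only from the example above; it assumes the broker mails a full copy of the profile to every address it has on file, with no way of telling which ones are wrong.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical record a broker might hold about one person.
    name: str
    attributes: dict
    emails: list = field(default_factory=list)


def notify_all(profile):
    """Return one (recipient, payload) pair per address on file.

    The broker cannot distinguish correct from incorrect addresses,
    so every address receives the complete profile.
    """
    payload = dict(profile.attributes)
    return [(addr, payload) for addr in profile.emails]


danny = Profile(
    name="Danny Wilkins",
    attributes={"salary": "<redacted>", "job_applications": "<redacted>"},
    emails=[
        "dannyw@gmail.com",              # correct
        "dannywilkins@EmployerLLC.com",  # correct, but it is the employer's mailbox
        "dannywinters@hotmail.com",      # not correct: a different person
    ],
)

outbox = notify_all(danny)
recipients = [addr for addr, _ in outbox]
```

Under that assumption, both the stranger's address and the employer address end up in `recipients`, each paired with the full payload.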


Also, there's the situation of people who receive a lot of emails from services where the user mistyped their email, but the service doesn't validate the registered email.

In my case, I always feel I'm fighting a losing battle against the horde of emails addressed to users of unknown services...

There are a growing number of services that validate the email and include an "I didn't register at this service" option.

This has me worried, because I can imagine so many scenarios where the metadata aggregators scrape and mis-classify by email:

A teenage boy using a Snapchat-like service had created an account with my email and I had to manually delete it; some guy used my email to register his account at a McDonald's franchise's ERP; I once received a booking confirmation from some large airline (and Google nicely reminded me that I had an upcoming flight); also, I was once wrongfully tagged by email in an unlisted web album of a party that had taken place in a French village by an older lady.

I always try to reach out to notify the person... But sometimes it's hard to call a Peruvian bank to notify them that one of their account holders used my email to register his account, and they tell you they'll get around to notifying the user, but don't.

Now, I could imagine several scenarios where things might get bumpy for me in the near future...

Like in the event where someone tries to forecast criminal behavior; or what if a government thinks I should be paying taxes on some income that their metadata suggests I have (because, why else would you have emails in your inbox from that bank?).


Does that have to be the email account you registered with that business? It definitely is concerning if someone could effortlessly assemble a database of all of this information just by getting control of your email address. Chat logs, Alexa recordings, geo locations going back years, every Uber ride you ever called and the addresses you came and went to.

Even the deletion requests could cause an identity theft nightmare. One hack and all of your Facebook, Apple, Google, Microsoft, Steam/Blizzard/any game company, identity and digital assets are gone forever with no chance of return.


There’s this odd grey area. It seems Zillow is claiming it’s publicly available data, and I believe they are right, as the data comes from government lease records. So you don’t own your name and address. If I own a business that wants to use machine learning to guess whether you are Republican or Democrat based on your house location, do you own that guess when I sell it? Is that your data?


Until that company then turns around and sells your data. Looking at you, Unroll.me.


Err-- I have a successful business that falls under the "data broker" category and requires I register, but I won't.

I won't because they don't tell you what the "fee" is until the Attorney General "reviews" the submission and then assigns a fee to you. Sounds like extortion to me. I am not going to give them all of the leverage and then accept whatever fee they want to levy on me. If they want me to register, have a fixed fee and be up front about it.

I am willing to accept the penalty fee for not registering. At least I know what it is and it's well within the financial capability of my business, so who really cares.

Selling people's data is fine as long as the state gets a cut I guess?


Any business that sells people's data without their knowledge is unethical. You should consider an alternative direction for your life.


Oh really? Yet you probably use Google, Facebook and a ton of other services that do far worse.


The online real estate industry is a significant data broker. Zillow, Realtor, Redfin, Airbnb, etc. have lead generation platforms that, I guess, fit into the definition of a data broker.


Not a single one of the big ones has registered.

Acxiom, Experian, etc.


"(d) “Data broker” means a business that knowingly collects and sells to third parties the personal information of a consumer with whom the business does not have a direct relationship. “Data broker” does not include any of the following:

(1) A consumer reporting agency to the extent that it is covered by the federal Fair Credit Reporting Act (15 U.S.C. Sec. 1681 et seq.)."

http://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?...


Wouldn't Google Search, Gmail, and Twitter all need to register themselves as data brokers?


See the response by hbcondo714

> a “data broker” is in the business of aggregating and selling data about consumers with whom the business does not have a direct relationship


Google is aggregating search queries. And they are selling my search query to the highest bidder with whom I'm not having a direct relationship. That sounds like a search broker to me.


If you use Google to search (or use Chrome or Gmail or...) then you have a direct relationship with Google, meaning Google is not a data broker.


"Existing law, the California Consumer Privacy Act of 2018, beginning January 1, 2020, among other things, grants a consumer a right to request a business to disclose the categories and specific pieces of personal information that it collects about the consumer, the categories of sources from which that information is collected, the business purposes for collecting or selling the information, and the categories of third parties with which the information is shared."

The new law requires companies to tell us who they are sharing our personal information with. Applying common sense, it sounds like anyone who sells my personal information to a third party with whom I don't have a direct relationship gets covered.

The ad I see at the top of my search results seems to be based on personal information that was sold to a third party.


The ad you see in the search results was put there by Google after an advertiser said “advertise to someone who searches for X, and also likes Hacker News”. Google wouldn’t share that data with the advertiser. Once you click the ad... that’s another story.


Does clicking on a link to a Google.com URL create a consensual relationship with whoever Google.com redirects to? That seems absurd.


Why else would you click on the link, except to go to the website of the business at the other end?

Certainly visiting the website of a business is a consented action when you searched for keywords relevant to that business and then clicked a link.


The law has a lot of requirements for every company that collects data about you. It has additional requirements for companies that are data brokers. The quote you provided is about the former. The list linked in this post is about the latter.


Correct. While this law certainly enables you to send “nightmare letters” and generally make yourself a PIA for Google and all other companies, it does not require them to register as a data broker because of the consumer’s direct relationship with them.


Skimming the bill is frustrating (readability doesn't seem to be a strong suit with govt sites) and a question comes to mind: does this basically only apply to California businesses, or any entity who has consumer data on California residents, or...?

I'm all for it, just wish that tidbit was more clear I guess.

Or maybe I'm just not reading it correctly, who knows.


California has no jurisdiction outside of California, period. If you are not based in California and do not deal with the data of Californians, then you are not subject to it. If either of these things is true though, you probably are subject to it.

The problem of course is that you may not know that you are dealing with a Californian’s data. For example, you may have scraped an email address and have no idea who it belongs to. I suspect that for this reason, I got a bunch of emails from affiliate programs recently saying that they had to close because of this law. Example:

“Unfortunately, the Yelp For Business Owners program is being paused immediately due to 2020 California Consumer Privacy Act (CCPA) concerns. As of the sending of this notification we have expired all affiliates, which will be effective 1/13/20.

As you may be aware, the newly enacted California Consumer Privacy Act has placed new requirements on Yelp. While we evaluate the implications of these requirements, we are taking the step of temporarily pausing our affiliate program. Thank you for all of your hard work in 2019 and we’re looking forward to working more closely with all of you in 2020 once we work past this issue.”


Asking as a non-American.

This is basically to prevent atrocities like Equifax, right?


The law[1] specifically excludes "A consumer reporting agency to the extent that it is covered by the federal Fair Credit Reporting Act (15 U.S.C. Sec. 1681 et seq.)."

[1] http://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?...


So what pains does this legislation solve?

To me it seemed outrageous that a company I had no dealings with, accumulated data on me without my consent, and above all - their data was breached.

I thought that's what this legislation was trying to address.


Blacking out credit history would obliterate lending, which would in turn collapse asset prices and lead to a complete restructuring of entire industries and society itself. If that were on the table, you would’ve heard.

This is primarily about the companies that buy your phone number from your gym membership or supermarket loyalty card and sell it to telemarketers. They won’t be missed.


> Blacking out credit history would obliterate lending

Credit history companies like Equifax are very much an American thing; lending still works everywhere else.

They are such a non-topic in France that I only learned about them last year, and the French Wikipedia page on the subject is very short, and only documents the US, Canada, and China: https://fr.wikipedia.org/wiki/%C3%89valuation_des_risques-cl...

And IANAL, but it seems to me that credit history databases are made illegal by Article 5 of the 1978 "Computing and freedoms act" https://www.legifrance.gouv.fr/affichTexte.do?cidTexte=JORFT...


A quick Google search suggests that the Bank of France itself tracks defaults.

As I understand it, the American weirdness is that this is a private sector function rather than a service of the state. Not that it exists.


Yes, and it only tracks defaults; you're notified 30 days before being added to the file, and you are removed from the file as soon as you have repaid what you owed.

Source: https://www.service-public.fr/particuliers/vosdroits/F17608

That's nowhere near the amount of data Equifax collects.


I don't think society would need to be restructured.

Many countries do fine without private third party entities managing people's credit history and score. Instead, you have to provide data on your credit worthiness each time you request credit from a particular bank. Such as bills, income proof, assets etc.


In which countries can you default on a loan and then go get another one like it never happened? Sure it might be the state hosting the mechanism instead of a private company, but that mechanism is pretty important.


Yeah, if you default on it and there's a court decision etc. then it gets recorded in a government system and you are on a blacklist.

It's quite different and a lot less fine-grained than feeding all your credit-related and non-credit-related history into some ML model to estimate your credit worthiness.


Sure, and nationalizing or limiting the scope of credit reporting are both fine ideas.

It is still a third party collecting a fact about someone and repeating it to others, to the data subject's detriment and against his will.


It provides a regulatory moat for Experian, Equifax and Acxiom. And it makes it sound like the legislature has done something useful.


It is. But as a credit bureau, Equifax was already subject to some data protection regulations from the federal government - that's a large part of why they could even get in trouble in the first place. Due to details of constitutional structure in the US, California's general data protection law can't legally be applied to companies that are already subject to a federal data protection law.


What about businesses that aggregate API data? If you use 2-3 data brokers, are you a data broker? It seems to me only the source should register, but I think there's a lack of definition there. So should every user who uses an API with consumer demographics and sells data register as a data broker?


Data brokers are middlemen. Not the source (no direct relationship to the data subject), and not the consumer (derive revenue from selling data).

If the process of selling data involves a daisy-chain of three middlemen companies, they're all data brokers.


If you collect data from third-party APIs and then re-sell them, you are a data broker. If you only use the data, then you aren't.
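As a rough illustration of that two-part test, here is a short Python sketch. The `Business` fields, the `is_data_broker` function, and the example chain are hypothetical; it encodes only the rule as stated in this thread (sells personal data and has no direct relationship with the consumer) and ignores the statutory exemptions.

```python
from dataclasses import dataclass


@dataclass
class Business:
    name: str
    has_direct_relationship: bool  # the consumer dealt with this business directly
    sells_to_third_parties: bool   # personal data is resold, not merely used


def is_data_broker(b: Business) -> bool:
    # Both conditions must hold: it sells the data AND lacks a direct relationship.
    return b.sells_to_third_parties and not b.has_direct_relationship


# A daisy-chain: the source, three middlemen, and a final consumer of the data.
chain = [
    Business("source app", has_direct_relationship=True, sells_to_third_parties=True),
    Business("middleman A", has_direct_relationship=False, sells_to_third_parties=True),
    Business("middleman B", has_direct_relationship=False, sells_to_third_parties=True),
    Business("middleman C", has_direct_relationship=False, sells_to_third_parties=True),
    Business("end user of data", has_direct_relationship=False, sells_to_third_parties=False),
]

brokers = [b.name for b in chain if is_data_broker(b)]
```

Under these assumptions only the three middlemen qualify: the source has a direct relationship, and the last company uses the data without reselling it.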


> California law requires a data broker to register with the Attorney General [...] on or before January 31 following each year in which a business meets the definition of a data broker.

Confused, so this means you won't know who is a data broker this year until next year?
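Reading the quoted sentence literally, the date arithmetic would look like the sketch below. The function name is made up, and this is only one interpretation of the quoted text, not a statement of how the Attorney General applies it.

```python
from datetime import date


def registration_deadline(year_met_definition: int) -> date:
    """Deadline implied by 'on or before January 31 following each year
    in which a business meets the definition of a data broker'."""
    return date(year_met_definition + 1, 1, 31)


# A business that met the definition during 2019 would register by this date.
deadline_for_2019 = registration_deadline(2019)
```

On that reading, a business that first meets the definition during 2020 would not have to appear on the list until January 31, 2021.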


This list seems like a good place for hackers to start. These brokers likely don't have the infosec teams that Google or Facebook have.


That's why they should be tightly regulated and directly responsible for incidents involving data leaks à la Equifax.


Is there a condensed version of the bill somewhere for lazy people like me?

I am looking to know -

- What the legal obligations of a data broker are.

- What are the legal obligations of sites/services that use them on their pages and enable them to get user data


According to 1798.99.80(d), a data broker is defined as "a business that knowingly collects and sells to third parties the personal information of a consumer with whom the business does not have a direct relationship."

The bill explicitly says the following entities are not considered data brokers: consumer reporting agencies, financial institutions, and insurance companies. (I'll note that this is a summary; the bill states more specifically those entities covered. However, the categories are generally correct.)

So, to the questions.

> What are the legal obligations of a data broker?

According to 1798.99.82(b), the obligations are two:

1) The data broker must register with the Attorney General and pay the annual registration fee.

2) The data broker must provide the state with the name of the data broker; its primary physical, email, and internet website addresses; and any additional information or explanation the data broker chooses to provide concerning its data collection practices.

> What are the legal obligations of sites/services that use them on their pages and enable them to get user data?

The bill does not directly state any obligations. In my read, the key part of the definition of a data broker lies in the lack of a direct relationship to a consumer. A business can still sell customer information to a data broker, but I believe this would then fall under the purview of CCPA (which appears to be corroborated by the final line of this bill.) This bill seems targeted toward those who solely acquire information through other, indirect collection means.

My read of this bill (I'm not a lawyer) tells me the state understands the present value of data brokers and doesn't want to eliminate the industry with crushing regulations. However, we know there's plenty of corruption, greed, and lack of ethics among data brokers. Requiring entities to publicly declare their brokering of data seems like a reasonable way for government to reduce these issues.

Consider a restaurant which operates without a license. A license is good because the city knows of the existence of the restaurant. If the city was not aware of the existence of the restaurant, they could not, say, reliably send in health inspectors. Restaurant cleanliness is clearly a good thing since it reduces the potential for food-borne illnesses. I can envision similar analogous benefits from licensing data brokers.

----

Sources:

http://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?...


It’s so depressing...I spend so much time opting out of unwanted services and data sharing but I feel like I’m losing this battle.


I see six entries. Where's Google, Facebook, etc?


Google and Facebook typically work by allowing advertisers to pick customer traits, and then Google/Facebook shows ads based on their knowledge of who fits those traits. It looks like the definition of sale[1] means companies must actually transfer the consumer's info for them to count as a data broker. This means Google and Facebook likely don't have to register.

Their lobbyists did their jobs, I suppose.

[1] https://leginfo.legislature.ca.gov/faces/codes_displaySectio....


Google and Facebook don't sell user data. It's much more profitable to keep it for themselves than to sell it to others.


CCPA's definition of selling data is both expansive, and a little vague.

Google Ads (formerly AdWords) now has an API so that customers can disable possibly-sales-like features on a per-use basis if someone clicks "Do not sell my information." So, while Google doesn't sell users' information, they're only willing to go to bat and say "this is definitely not a Sale Of Information under CCPA" for a subset of the services they offer.

https://privacy.google.com/businesses/rdp/

Facebook I dunno. My sense is they have a larger appetite for testing the boundaries of the law.


Yikes, this reminds me of the nebulousness of GDPR. I guess the intent much like GDPR is to write a law too vague to follow and then fine the crap out of companies.


I would say they're technically not data brokers because they collect data directly from an agreed upon relationship with a consumer, and they do not sell the data itself but rather productize it for advertisers.


> they collect data directly from an agreed upon relationship with a consumer

Facebook builds shadow profiles. Those involve no direct relationship with the surveilled.


But there's no evidence that the information they gather on non-users is shared with third parties, even in indirect ways like the ad-targeting information. (I don't necessarily reject the term "shadow profile", but it's important to remember that it's an activist term; there's no reason to expect that they have similar formats or uses to the non-shadow profiles.)


Presumably they get the data for the shadow profiles from tracking scripts and stuff that run in your browser, so technically that’s a relationship you accepted when you ran their script and allowed it to phone home.


Even if that was true,

> they do not sell the data itself

Selling seems like an important part of brokering


> I see six entries. Where's Google, Facebook, etc?

The deadline for registration looks like it's January 31st. Hopefully, after that date, California will start going after those who do not comply.


It will be interesting to see who registers and who doesn't.


One straightforward solution is to grant each individual copyright protection over all data that identifies them.

Each person owns their own data, and can sue to enforce those rights.


More useless regulation instead of trying to improve the state. No wonder people are leaving here in droves.


So there's only 6?


Hopefully this is a non-controversial (register only) beginning to regulation.


They should add a column for cost per record /s


Where's Safeway?


I approve of this outcome



