Facebook bans researchers who were investigating Facebook ads (dailydot.com)
435 points by jimgordon on Aug 4, 2021 | 77 comments



(I don't use the product. Or maybe I should say, I'm not a Facebook product)

>Facebook moved to penalize the researchers in part to remain in compliance with a 2019 data privacy agreement with the Federal Trade Commission, in which the company was punished for failing to police how data was collected by outside developers, Clark said. Facebook was fined a record $5 billion as part of a settlement with regulators.

That's a reasonable explanation for their action, right? Even if it's not 100% true, if you're a manager at Facebook, and the situation isn't 100% clear cut, you're still going to act, so you can show it as an example the next time the FTC tries to fine them.

And they did offer an alternative:

>Clark said Facebook offers targeting data sets for political ads, and has suggested the NYU group use that information.


I'm not familiar with this research field, but I would be concerned about what price tag came with that offer, and whether accepting that dataset is indeed in line with the research purpose, e.g. does it contain the relevant info, is it comprehensive, and how could researchers prove that it's not manipulated, that it reflects the model used in production, and that it's not biased towards FB's interest.

I believe it is reasonable for FB to react to scraping under the current circumstances. But it seems equally reasonable for the researchers to go with the data collection approach they chose.


The FB dataset drops an unmeasured/unreported number of ads that the researchers believe could be significant. It also lacks any demographic information, which is part of the study (participants consent via a browser extension).


Seems like regardless of what course of action you take someone could write an unfavorable article about it.


Cambridge Analytica got their data by obtaining consent from users to install their app on the FB platform, claiming that it was for academic use. The difference here is that this group is using a browser extension, but in both cases the methodology is to collect data from Facebook’s API on behalf of the user.

Facebook also goes after other browser extensions that scrape data, not just researchers: https://www.zdnet.com/article/facebook-sues-two-chrome-exten...


> their app on the FB platform ... this group is using a browser extension

and that is the gist: is your browser a part of the FB platform or not? (Compare the situation of the AT&T network and the telephone, or even just the phone book.) FB behaves as if it is, and as if they therefore have control over it. Them getting their way means we ultimately lose a very important area of general computing. That has already been happening, as unapproved ways of calling web APIs have been met with responses from C&D all the way to criminal prosecution.


> unapproved ways of calling web APIs have been met with responses from C&D all the way to criminal prosecution

That's insane. Why even have a web API if I can't make HTTP requests to it? Why do I need "approval" to do what their own javascript does?


This is a reasonable question, but if the API is authenticated, you may have made some agreements to get authorization to use it. Criminal prosecution could be overkill, but that’s not what is happening here.

It’s kind of like asking “what’s the point of letting me into your warehouse if you didn’t want me to take the compressor that was in there?” but you signed a contract not to take the compressor before they gave you the key to the warehouse.


It's more like: why can't I send an agent to get what I need from the warehouse instead of going myself?

I should be able to create my own user agent for a website if I want. I should be able to replace their web application and non-free javascript with my own software.


You are totally free to do that.

FB is totally free to ban you.

I mean they're a gussied up advertiser, of course if you mess with their "engagement" or bottom line they'll ban you. It's like the "rubber hose crypto breaker" XKCD.


If they ban me, they'd probably be doing me a favor. Actually sending people legal bullshit over this? This is some sort of legally-sanctioned bullying.


The debate on who has control over data typically creates two parties: the individual user who it is related to, and the corporation providing the platform or product.

We ought to add another party: the public. Perhaps data should be able to be used for the public good, and we should be able to participate in deciding what data is collected and how data is used.

In this case, having data about what ads are seen, by whom, and why they see those ads, seems like it could lead to us better understanding how FB's and other companies' algorithms are segmenting the population and how certain ideas proliferate within those segments. This seems beneficial to the public, since as we've seen over the last 5 years or so, these platforms and the way they choose what information we see can have drastic effects on the economy, politics, etc. If I had a choice, I would choose to continue collecting data about the behavior of advertisers and the platforms that serve them for that kind of analysis. But we don't have a choice, because Facebook "owns" that data.

A number of people in this thread have referenced Cambridge Analytica. When Facebook does choose to share data with other parties, we have no say over what data is shared with them or what they may do with it. We don't even get a choice in how Facebook internally uses our data. Instead of democratizing the decision of what data is collected and how it is used, the FTC applied the rules of private property and fined FB for lack of privacy.

The public got nothing out of that situation. Facebook now is more defensive of their ownership over our data, which also precludes us using it for the public good.


Excellent comment.

Regulation of broadcast was justified on the basis that the airwaves were a scarce and shared public resource. In looking at a coherent, pragmatic, and equitable basis for regulating online content and surveillance, the notion of a common public good and interest might be a good anchor.

In discussion, the notion that public awareness, attention, mindshare, and understanding are themselves a common good ... gets to some interesting (and yes, scary) places.

Interests in privacy, concerns over widespread or highly targeted manipulation, and similar concerns could possibly form the basis of coherent limits on tracking, surveillance, and "information sharing" on individuals and groups. Open, transparent, and ethically guided research ("who decides" being university and professional ethics review boards, as is presently largely the case in human-subjects research) could be excepted, but would require those components.


Let's not forget that Aleksandr Kogan, the guy who harvested the Cambridge Analytica data, was a research associate at Cambridge. Can Facebook trust all the researchers at NYU? Can't one of them just leak and sell the scraped data? There are no guarantees that the scraped data will be used for just academic purposes. Facebook probably doesn't want another data-leak fiasco.


Facebook has no business dictating what plugins users would like to have on their browsers. Sure, they can ban scraping because reasons, but neither this nor the Cambridge Analytica case is a data leak unless we assume that a user's personal data, the content they generated, and their social relationship status are all Facebook's property.

Academic use vs commercial use is a separate topic too imho.


Well yes, but they can dictate whether you can use their service with these add-ons (or at all, for that matter).

And irrespective of this opinion, CA backfired spectacularly on them, so it's not totally unreasonable for them to enforce that right.


I agree with you that they have the right to deny service, and scraping at scale is not the same as regular queries. What doesn't feel right to me, though, is the suggestion that FB's reaction was meant to protect data (which is not theirs).


Users should decide who gets to use their data, not Facebook.

And there are no guarantees with any data. Facebook itself can't be trusted to not have leaks. Two years ago, data from 500m profiles was leaked, including Zuckerberg's own Facebook id, mobile phone number, and other information.


There's a fallacy of composition and awareness here.

Individual users:

1. May not be aware of how data are being used. (In fact this is a virtual certainty.)

2. Don't appreciate the immense power of data in aggregate. (Something that is close to Facebook's key commercial advantage.)

3. May be exposing data on other users, who are not participating and/or don't consent to participate in such data hoovering.

I'd argue that Facebook can also make exceptions, and that good-faith, well-reviewed research projects, particularly those aimed at independently assessing manipulation and propaganda efforts on the platform, are a case I'd strongly recommend. But to say that Facebook has no right or obligation to decide is false on its face.


Keep in mind that when evaluating a research proposal, Facebook will have zero interest in evaluating the "good faith"-ness, or "well-reviewed"-ness of the proposal. And to be fair they are probably not qualified to do that, and would have no incentive to become qualified.

As a business, making a business decision, they'll want to know "can this come back to bite us" (and they will miss many of the ways that might happen), and how much this will benefit them, either in money or in facilitating new ways of making money with the new information.


See lilactown's excellent comment here: https://news.ycombinator.com/item?id=28064953

My reply to that addresses some of your concerns as well.

TL;DR: the call is not entirely Facebook's to make. Perhaps not at all.


>"There are no guarantees that the scraped data will be used for just academic purposes"

The whole point of the project is to make the scraped data available to anyone and everyone who is interested. They publish this data via a public database. This is articulated very clearly at the top of the Ad Observatory project page.

"Ad Observer is a tool you add to your Web browser. It copies the ads you see on Facebook and YouTube, so anyone can see them in our public database."[1]

The Ad Observatory project collects the following:

"What we collect

The advertiser's name and disclosure string.

The ad's text, image, and link.

The information Facebook provides about how the ad was targeted.

When the ad was shown to you.

Your browser language."

Additionally the code for the browser plugin is up on github[2]. How much more transparent could they be?

[1] https://adobserver.org/

[2] https://github.com/CybersecurityForDemocracy/social-media-co...


I like how people change Alex to Aleksandr when they’re trying to make a point about the big scary Russians, despite Alex being a fully American citizen.

Of course it’s much easier to blame muh Russia than it is to blame Facebook, who created the platform and by definition set its boundaries. They literally gave all the information to Aleksandr. All he did was read their documentation and query their API endpoints as designed and officially documented.

He literally followed Facebook’s instructions to get the data they offered to him. And yet here you are using weirdly ethnic overtones to denigrate him as some evil hacker that victimized Facebook by pilfering some nebulous “private” information that Facebook worked so hard to protect.


You're reaching a bit. His Wikipedia page lists his name as "Aleksandr Kogan". The OP didn't mention Russia at all, nor did he mention anything about him being evil. It's true that he followed Facebook's instructions to get the data; that was the whole point. The app users were installing never said that the data would be used in the way it ended up being used.


It might be a reach to ascribe malice to OP in this case. But the pattern persists, and the fact he’s known more as Aleksandr than Alex is more the result of agenda pushing in the “reliable sources” than of a common policy to use full names. I’m sure you will find many instances of the same sources using nicknames for people they like.

It’s the same reason Fox News says “Alexandria Ocasio-Cortez” instead of the more common “AOC.”


His page on the Cambridge website, as well as his Twitter account, also use "Aleksandr". So far, there are multiple examples shared on this thread of him being referred to as Aleksandr and none of "Alex". Do you have _any_ source indicating that he prefers to go by the latter, let alone enough to conclude that there's a racist conspiracy afoot?

> It’s the same reason Fox News says “Alexandria Ocasio-Cortez” instead of the more common “AOC.”

Wouldn't you expect a news organization to use the full name for a politician instead of a colloquial term, regardless of how they feel about them? They don't say RBG either for Ruth Bader Ginsburg, despite the fact that Internet conversations use it heavily.

I know this is a radical viewpoint these days, but it turns out that not literally everything is about race.


> To aid in their research, the group created a browser plug-in called Ad Observer, which collects data on the political ads users see and why they were targeted for the ad.

Understandable that any entity that bypasses their API/consent flows and collects a user's data would be a big no, regardless of what their intended use is.


Browser plug-ins I install on my browser are not FB's or any other company's business. I consented when I installed it and gave it the permissions.


You're also granting the extension access to your friends' data, given that it can see everything that you can. Your friends consented to show that data to you, but not to the extension developer. Your friends' consent is not transitive.


Still not FB's business. If I copied my friend's private photos, they would not care either. They do not represent my friend's interests, legally or in any other way.


No, Fb is doing the correct thing in this case. If I upload data to Fb, I expect Fb not to allow others to scrape it. I gave pictures/info to Fb, not to you because you had a browser extension installed.


I'm not entirely sure I agree that Facebook is in the right here - analyzing the ads that are shown to you is different from downloading photos you uploaded.

However, I agree with your principal argument that there's a difference between Facebook serving pictures to friends and massive, automated serving of pictures to bots and scrapers. I understand that there's no good way to differentiate, and that the bits that are sent over the network are the same regardless of who is consuming them, and that my friends have the technical capability to upload the images elsewhere.

But just as you get different outcomes from an individual policeman watching traffic, pulling over reckless vehicles or tailing a suspect vehicle with a known license plate, compared to a network of automated license-plate readers and speed cameras tracking the city-wide movement of lawful and criminal people alike, you get different outcomes when you differentiate between bots and live users.


You realize that is completely unworkable, and anyone can take a picture of the screen of the photos that you upload and share them. Don't post anything on Facebook that you don't want the whole world to know.


There’s a large, large difference between your friend taking a screenshot and your friend authorizing a third party to access any content they themselves can see. Scale and automation matter a lot.


You should be more worried about facebook doing that.


I agree with the sentiment, but the line between read-and-record permission vs read-and-burn permission is really vague here. Scraping is exercising the right to read, which is the right to download, in an automated manner on behalf of a user. Scraping bans (not limited to FB) are more of a commercial practice than a matter of caring about user consent/privacy imo.

Personally I don't think access should be granted based on assumptions about the query's intention (daily browsing vs scraping data for analysis authorized by someone), but rather along the lines of r/w/x and user groups, i.e. if you can view, then you can view and record. Otherwise either no access is granted, or it's burn after reading.


How is this any different than your friend saving the photo you posted and then showing it to her co-workers (or whoever else)? Are you arguing that Facebook should disallow copying/downloading of any content on its network?



lol you can expect whatever from FB but they don't care. And if you gave other people access to the data, FB can't and won't do anything to prevent those people from accessing it, and if access is possible, scraping is too.


Is this an argument about what an extension can do or what this extension does? It appears not to read any data on your friends at all.


No. That's incorrect. The data they were gathering was advertiser data, not your friends' data.


Read again what the extension does. You're so far off base.


> However, as Protocol noted in March, the information collected from accounts that did not “consent to the collection” that Clark appears to be referring to was actually advertisers’ accounts, not private users.


I don't think we should expect Facebook to go digging into every scraping extension to see if the scraping is done to obtain user data vs. advertiser data (stuff Facebook is also trying to protect, mind you). Especially with the amount of obfuscation and extensions they likely need to deal with.


Facebook already must have dug into the extension because they complained about what it does. This is a strange defence.


So Facebook's users rather than their raw materials.


> Understandable that any entity that bypasses their API/consent flows and collects a user's data would be a big no, regardless of what their intended use is.

Generally I agree and understand FB's position. But isn't all of the data from Ad Observer publicly available?


    why they were targeted for the ad.
How is that determined exactly? Does the client side download details like that? For what purpose?


According to the next paragraph, "Facebook provided about how the ad was targeted, and when the ad was shown to a user, among other things.".

Haven't used FB in ages but I assume it's like "why am I seeing this" hint in youtube recommendations (which usually says "because you watched video x").


There is a little link on each ad that, when clicked, tells you why you were targeted.


The principle of we will track and follow you and all your friends and contacts, but you cannot peek into our matters.


Who watches the watchers?


Congress, supposedly. However, it strikes me as if Congress is blind on the matter, metaphorically speaking.


"Facebook defended the action Wednesday, saying: "We repeatedly explained our privacy concerns to NYU, but their researchers ultimately chose not to address them and instead resumed scraping people's data and ads from our platform," a spokesman said."


Mmmmh, I can imagine that scraping all that info could later be used in a destructive way.

>Facebook defended the action Wednesday, saying: "We repeatedly explained our privacy concerns to NYU, but their researchers ultimately chose not to address them and instead resumed scraping people's data and ads from our platform," a spokesman said.


> However, as Protocol noted in March, the information collected from accounts that did not “consent to the collection” that Clark appears to be referring to was actually advertisers’ accounts, not private users.


Facebook is a platform capable of controlling large numbers of people through subtle manipulation with techniques and at scale not previously possible. There have been instances where it seems that capability is being tested (whether by Facebook itself or groups leveraging its platform). It's imperative people research this legally and ethically, which is what was being done by the NYU group.


> legally and ethically

The ethical complaint is that they accessed data without consent:

> "Facebook defended the action Wednesday, saying: "We repeatedly explained our privacy concerns to NYU, but their researchers ultimately chose not to address them and instead resumed scraping people's data and ads from our platform," a spokesman said."


My read is they are talking about an advertiser's right to privacy of the ad. This is simply not a thing. The advertisement is unsolicited communication, and belongs to the user receiving it. Otherwise it sounded like they were collecting no user data other than, indirectly, the metadata used by the advertiser to target a user who had already agreed to it.


Ah, this appears to be correct. Thanks for pointing this out, I read the article a little hastily.


I see all these NGOs, nonprofits, the media, and academics as filling a sort of role that the govt is unable/unwilling to fill, by trying to hold such companies accountable. What the government does is quite limited despite how much it spends, and with the exception of egregious violations like blatant fraud or discrimination, it tends to take a hands-off approach.


A topless club can kick people out for touching the dancers. You break the rules of a private club and they can kick you out. Under the guise of research people weakened Linux. Neither party in this situation has clean hands.


The researchers make the data public. This isn't like Cambridge Analytica.

There shouldn't be a worry about what the researchers are going to do with the data because they make it public.


I guess the main question here is how to make it technically impossible for companies like Facebook to detect browser extensions like the one used here, so that it becomes impossible for them to restrict users based on that.


This somehow reminds me of what happened on Freenode a few weeks ago. They started to ban everybody who posed any amount of threat to the platform (real or illusory) which ended up killing the platform.


This already got the attention of a U.S. senator.[1] Another argument for regulating Facebook's ad business.

[1] https://www.reuters.com/technology/us-lawmaker-says-facebook...


The comparisons to Cambridge Analytica that I see here aren't correct. They got fined for being willful about granting developers a lot of access to data, and I understand they're scared, but this isn't being done in good faith; there is a difference in approach in how this should be done correctly vs incorrectly.

Thanks for the downvotes, appreciate it.


> there is a difference in approach in how this should be done correctly vs incorrectly

I imagine you're getting downvotes because you need to expand on this.

What specifically is NYU doing correctly that Cambridge Analytica was doing incorrectly?


Because no personal information gets stored with NYU. It's simply not correct to equate the two situations.


The real issue is Facebook preventing research into vaccine disinformation:

> “By suspending our accounts, Facebook has effectively ended all this work. Facebook has also effectively cut off access to more than two dozen other researchers and journalists who get access to Facebook data through our project, including our work measuring vaccine misinformation with the Virality Project and many other partners who rely on our data. The work our team does to make data about disinformation on Facebook transparent is vital to a healthy internet and a healthy democracy.”


[flagged]


Usually all they have is the empty, evasive “well it’s their platform and they can do what they want” tactic.


They could stop sending out (ad) data any time so no one could collect the data. It's their platform; they can do whatever they want.


Half the time this is a sarcastic way of saying that all legislative and legal options are on the table because popular opinion and market forces are unsuitable for forcing companies like Facebook to change.


They should create their own social platform and monitor that. /s


Everyone here told me Facebook is a private company, they can do what they want.

So, what's the issue?


The US government should just follow in China's footsteps and shut down Big Tech or nationalize them. Nobody with half a soul would complain.


Ha, Facebook can't handle people monitoring them and collecting data, eh?

Facebook just submitted this ad to me. I don't know why, they "trust" me. Dumb idiots.



