We Built a Facebook Inspector (themarkup.org)
282 points by atg_abhishek on Jan 6, 2021 | 70 comments



The sample selection / non-response bias highlighted in this write-up is a _Big Idea_ problem I've been thinking about recently:

  Limitations...*Trust in surveys and political leanings:*
  About 95 percent of people contacted for the panel chose not to participate because of lack of trust in having a third-party application installed on their computer or other concerns for privacy.
Think about that - a reputable, privacy-first organization asked people to opt in to fully consented, voluntary, compensated research and ~95% declined! I can't even imagine what hidden skews are present in the 5% that agreed. This issue is systemic in consumer research and impacts both public (e.g. election polling, U.S. census) and private (pharmaceutical trials, media/advertising research, voluntary AI/ML training data) data collection.

Governments and businesses make biased, potentially discriminatory decisions if a non-random segment of the population chooses never to be counted. The ad industry attempts to circumvent this through non-voluntary passive tracking, which trades non-response bias for bulldozing user privacy. The headwinds are only growing, too, as rising consumer awareness of privacy lapses and the politicization of polling continue to shrink the pool of people who participate in opt-in research.

Finding a solution to this that doesn't resort to privacy-eroding tactics is a moonshot-level problem in terms of the size of the prize if solved.


An organization people had never heard of asked them to install a browser plugin, and they declined like they should.

Even if it's an organization you recognize, verifying it's not someone using their name for some sort of scam isn't always straightforward.


Right. An alternate take on this passage is, "95% of contacts exercised basic security measures rather than blindly install unvetted code on their machine."

It doesn't take that much of a shift in perspective to turn this into a very optimistic statistic indicating that normal users aren't always quite as security-unconscious as we normally think -- that for whatever reason (security, apathy, paranoia, whatever) sometimes they do actually make the right choices when interacting with sensitive information like their browsing habits.

You could not pay me to install an unvetted Electron application where I can't even see the source code, that is designed to MITM my browsing activity. Even if I trust the author's intention, who wrote the app? Who tested it? How do I know that the automatic redactors are going to actually work? It's not like it's hard to have security leaks in Electron.


But that's exactly the point of the OP: 95% "declined like they should" -- but what does that say about the 5% that didn't? What general conclusions can you draw from data elicited through people that are clearly unlike the mainstream?


The key thing here is trust. Trust is a resource like a rainforest: you can exploit it sustainably, or you can get a far greater profit by destroying it. Trusting people are a resource that can be exploited for fraud, which the internet is great at producing. It's not really surprising that random organizations find low trust.

It is, however, very unfortunate, as historically being a "high trust society" has been a great advantage of the West. And it's going to take a lot of repairing.


Is it a problem that should be solved? Shouldn't people have the right to live in peace and not be forced to participate in some survey?

I also don't really see the problem with 95% of people declining; they made the smart choice. If I was a regular Facebook user I would also decline because the account would contain tons of very sensitive data such as DMs and running an untrusted, unknown application on my main computer is also a major dealbreaker.


Try applying the problem to other issues to see the impact:

* An advertiser wants to place ads on sites / TV networks whose audience is more likely to buy their product upon seeing their ads. If they don't want to violate privacy, they run a survey. What if the response rate among a historically disenfranchised group (e.g. African Americans) is terrible? The modern "data driven" marketer would see little reason to advertise on Black media properties. This isn't a fictitious example - it's a current problem in the media planning / agency industry.

* A local government has to decide between investing in more ESL resources in public education vs. other competing budget needs. They look at census / community survey data (which some Hispanic and immigrant populations are afraid to respond to due to politicization) and decide to prioritize other asks because demand was undercounted. The data could also be skewed in other ways that warp their decision, like allocating budget to school zones that only represent specific immigrant communities that haven't historically been disenfranchised.

The big picture issue here is governments/businesses making decisions with biased information, leading to incorrect conclusions, and the only known recourse currently is to scrap privacy.


Look at it from the point of view of regular folks:

* An advertiser - a malicious being intent on tricking me out of my money - wants to make a survey to determine how to make it easier to trick people into parting with their money. Why would I help someone make my life, and life of other people like me, worse?

The answer to that is to beat advertising down until it isn't so blatantly customer-hostile. Then people may be more willing to help.

* I'm in a politically precarious situation and the government is asking questions - ostensibly for purposes that could benefit me, but if my honest answers were seen by a different government agency, it would cause me a world of hurt. I hide away. Or lie.

The answer to that is ideally to fix the politically precarious situation of a subset of your population - but at the very least, to foster the trust in information separation between government agencies, so that I can e.g. afford to be honest with the census bureau without worrying about the IRS or the police. That level of trust is not the default.


You've really summed up the state of the world right now: we're in a crisis of trust. We don't trust each other, we don't trust institutions and the result is anxiety, fear and anger.


I broadly agree, but I would frame it a bit differently: we have a severe lack of trustworthiness in our modern world; or, at least, the trustworthy voices are lost in the noise.

This is a big big part of why I primarily use FOSS as much as possible. Generally speaking, FOSS developers and distributors seem to act with the user's interests in mind more often than proprietary software vendors. (Certainly the distributions do, probably out of necessity - there's no shortage of competitive distro options, so a distro being shady is practically a death sentence. Individual developers still deserve more scrutiny.)

The advertisers certainly do not have my best interests in mind.


Agreed. The issue isn't that people don't blindly trust advertisers and VC-backed companies enough. The issue is that those entities are not trustworthy.

People who are choosing not to share data with those companies in their current form are making a smart choice.


I think you may be underestimating the problem with your framing. The real trouble is outside of software.

If you think about it, how come otherwise reasonable people become anti-vaxxers, or flat-earthers, or believers of any kind of (perhaps less obvious) nonsense? The arguments I've seen tend to boil down to lack of trust. They don't trust healthcare institutions ("it's all bought out by big pharma!"), scientists ("all bought out by big $something!"), government agencies ("they're incompetent"/"literally nazis"), etc.

To some degree, these institutions have all violated our trust in one way or another, and media (both mainstream and social) is doing a stellar job of amplifying the damage. To me, the problem with people who mistrust institutions to the extreme isn't the facts - they often have good, if cherry-picked, ones. It's the relative weight given to those facts (just because there was a screwup with the swine flu vaccine doesn't mean flu vaccines in general are evil, dangerous pharma moneymakers). Fixing that requires teaching people some rational thinking, and I'm not sure how to do that; it's much more difficult than just throwing citations at them.


This is an idea that Cory Doctorow has also promoted at various points: that the increase in conspiracy theories is due to the increase in conspiracies, and people just don't know how to tell real conspiracies from fake ones.

I agree that his/your position is worth considering, and I don't think it's that far off of the mark, but I also think it's kind of oversimplifying a tiny bit.

I think some people honestly get swept up in conspiracy theories out of pure mistake, but I've also seen people get pulled into conspiracy theories not out of some kind of rational mistake, but because those theories validate something that they want to be true, or because they offer a community that isn't otherwise available, or just because it feels good to think that every problem in the world is some specific person's fault. Jumping from general distrust of the world to full-on conspiracy is... well, it's a jump, not a simple step. I don't think everyone in QAnon is there just because they're not rational enough, I think there are multiple issues at play.

I suspect there is no single unified cause for conspiracy theories that we can point to, even though I do agree with people like Doctorow that actual rampant corruption in our institutions both isn't helping with the problem and is understated as a potential contributing factor.


Fair enough. I think that the community aspect is a competing theory here - or even a complementary one. I've personally (face-to-face) dealt with conspiracy believers that tend to be isolated in their beliefs, but I totally buy that for many, it's the shared belief that matters, almost regardless of what the belief is even about. This also has support of some sociological research I remember reading.

About Doctorow's idea, I don't know. Do we have an increased number of conspiracies? Or perhaps just a perception of it? Or maybe we're constantly exposed to micro-conspiracies - namely all the businesses, big and small, scheming how to one-up each other and screw over their customers - that make people prone to seeing conspiracies everywhere?


This line of thinking confirms my biases.


> and the only known recourse currently is to scrap privacy.

I agree that low response rates are a problem, but people should still have the choice whether or not to give this information. To me, when I see that voluntary participation in these studies is so low, that's not a problem with privacy, that's a problem with the institutions doing the collection.

A good example of that is political surveys, which are really hard because people don't answer their phones. But why don't people answer their phones? Because they're swamped with scams, political ads, and other spam. Half of the time that someone says they're conducting a political survey on a phone call, what they're really doing is campaigning for a candidate.

The problem isn't that people are allowed to decline phone calls, the problem is that most of the phone calls people get are unwanted crap -- so it really doesn't make sense for them to answer the phone, they're making the correct choice by letting unrecognized numbers go to voicemail.

As a further analogy, if 50% of mail in the US postal service was infested with live spiders, you might see delivery rates for paper bills and official notices plummet. That would be a problem. But the solution wouldn't be to force people to open their mail anyway, it would be to stop putting spiders in people's Amazon boxes. And as it is with spiders, so too it is with advertisers.

You want to improve voluntary participation rates? Focus on removing bad actors and making people feel safe about their data. Governments, telemarketers, political groups, advertisers, and just companies in general all have serious issues with self-policing how they use and collect data. That's not anyone else's fault or problem to solve.


> A good example of that is political surveys, which are really hard because people don't answer their phones. But why don't people answer their phones? Because they're swamped with scams, political ads, and other spam.

But why should people answer political surveys? It's a waste of time similar to the other nuisance calls you mentioned.

Even if we assume all the other nuisance calls are eliminated, there's still no reason anyone should answer a political survey. It's a waste of their time and there is no way to ensure how this data will be used.


It would make elections a bit less stressful. There are a lot of vested interests in both political campaigns and in the public at large that want accurate polling before elections. That's extremely difficult to do right now.

I know some people debate whether having that information is healthy, which I won't comment on, but I do understand why someone might want it.

Now, at an individual level, what do I personally get out of answering any specific survey -- that's a much tougher question for me to answer.


Just do what Nielsen does: pay people for their data.

I don't think it's that difficult; if pay-for-survey skews results toward overvaluing the opinions of the poor and/or desperate for money, well, then it would be the first time in history.


In my opinion, this is a symptom of weak/ineffective regulation in the personal information space. The consequences for data breaches to the guilty parties have been minimal at best. Meanwhile responsibility for fraud has been pushed onto individuals via concepts like "identity theft". Even if the company in question was indeed reputable and well-known, most people don't have the technical expertise to evaluate any claims about security or privacy. Who would take that risk knowing that at the end of the day most of the consequences will fall on them personally?


Meanwhile fb has all of this info from the 100%...


> a reputable, privacy-first organization...

Are you a shill for them? "Reputable" as mainstream media? "Reputable" as in the fake news is reputable?

Whenever someone claims "I am reputable" you should run away as fast as you can.

This "I am reputable" is purely subjective based on your own biases and incentives.

No one can be reputable in the news space and people must understand this as soon as possible: everyone lies, even natural science which is supposed to be the gold standard gets so many studies wrong.

The only way news organizations stay alive is either by: a) clickbait articles, which eventually devolve into lying or, at best, exaggeration; or b) being financed by private people/corporations who have their own agendas.

There is no such thing as a "reputable" news source.

Perhaps one in ten thousand journalists is still legit, meaning an actual investigative journalist. But 99.999% of "journalists" are actually just script readers and clickbait writers.

Do you not see where "I am reputable" leads? Soon the "I am reputable" organization will get political power, then they will make laws based on "my reputable reporting", and this will lead to censorship.

"I am reputable" always leads to censorship down the line since it implies that "my opponent is not reputable and is lying and MUST BE SHUT DOWN IN THE NAME OF DEMOCRACY".


Per my answer above: the situation where nobody can be trusted is horribly unstable, because evidence-based or impartial policymaking or even justice becomes impossible. This tends to result in replacing trust relationships with force relationships, and the society devolves into warlordism or dictatorship in order to restore order and control. You can't expect people to trust an election where all candidates are disreputable, so they vote in a dictator.

This is why the unreliability of news organizations is such a serious problem.


Look at your response. Why would OP be a shill when you’ve responded this way? It’s not hard to see people will have strong opinions on things even if they believe their opinion isn’t wild, like yourself.


An example of what they used the data for is this investigation into feed changes in Georgia ahead of the runoff: https://themarkup.org/citizen-browser/2021/01/05/in-georgia-...

Raw data available here: https://github.com/the-markup/citizen-browser-georgia


Those ad spending values are mind blowing... No wonder Facebook doesn't want to change anything.


It seems media companies have a different interest in political races: if they call it a tight race, the candidates will pay them to run ads. And indirectly, if they report that it's tight, people will keep tuning in, and they can sell these eyeballs for a better price to their advertisers.

For example, the 2008 Dem primaries with Obama vs. Hillary. Obama was sure to win months before the Democratic convention, but I can recall CNN still calling it a race...


> if they call it a tight race, the candidates will pay them to run ads.

All the spending happens well before news channels start reporting ballot counts. Multi-million dollar political campaigns do not rely on the news media to tell them how they should spend their ad budget over a months-long election cycle.

By the time news orgs start reporting actual ballot counts, it's too late to spend any more money. A good chunk of votes have already been cast via mail. Poll stations have either closed or are at best a few hours away from closing.


What, and the news doesn't report poll numbers of "likely voters" before the election? Who'd be frontrunner, who's surging, etc, etc?


Why would that affect the candidate's spend? Is there any evidence of that happening?


"Facebook made money from showing users ads containing misinformation."


Note that facebook has all this data freely available. They probably run very similar analyses. But they don't act on them, or publish their results. Lack of access to this data is a big problem for social media researchers that needs to be solved.


While I see where you're coming from, I don't really see how this could be addressed without very fat NDAs and a serious risk of leaking personal data. For comparison, you wouldn't expect, say, Apple to give researchers access to their proprietary intellectual property. While I very much agree with you that it could be very beneficial, I struggle to construct an argument for why Facebook should do this.


One difference is that Apple generated their own intellectual property. Facebook lays claim on other people's data.


People working at facebook would love to publish this data, and let other researchers take a look...

But the simple fact is that any high profile analysis of this data will simply further fuel debate about facebook overreach and harm facebook's business.


> People working at facebook would love to publish this data, and let other researchers take a look...

Why would they love to do so considering this will end up being detrimental to Facebook's profitability and thus their compensation and/or promotion opportunities?


Because they would love to do it, if it didn't affect their careers detrimentally.

Any study on Facebook data would generate several papers in good publications, which is great for the more scholarly inclined people working at Facebook.


Arguably hiding this data until public rage inevitably boils over is worse for Facebook long term


Have they published any results of their study yet?


This is a good project, especially as the subjects are paid.

I'm interested to see what the outcome will be. I'm not sure that advertising is the worst part of FB; I strongly suspect it's other users.

I am very interested in the "recommended" findings. I think that for all but a few users, they reflect their own world view. However, that's a hunch.


The national averages for race they posted add up to 116.33%. I wonder where they messed it up?

Edit: They also only have 77.77% for national average age, though that might be explained by those under 18.


The problem might be "Hispanic or Latino" row, which is 16.4%. On (US or state) government forms that's usually a different question from race. See the second image here: https://www.pewresearch.org/fact-tank/2015/06/18/census-cons...


Yep, this is exactly it. "Hispanic or Latino" is a question of ethnicity according to the Census Bureau:

> "People may choose to report more than one race to indicate their racial mixture, such as “American Indian” and “White.” People who identify their origin as Hispanic, Latino, or Spanish may be of any race."

Basically it boils down to "Hispanic" or "Latino" being designations for country of origin, where Hispanic refers to anyone from a Spanish-speaking country, and Latino refers to anyone from Mexico or a Central- or South-American country. They're often used interchangeably in the US, though.


It's an endless rationalization to avoid talking about people who are of American (New World) descent as such. The thing most of the people targeted by the most aggressive anti-immigration rhetoric in the US share is that they are of American descent, and that's not a thing that you say.


Wow. Is that from the current admin? We are one human race. The origin of the word racism is from people who think there is more than one race which is scientifically incorrect.

The question about this should only ask about ethnicity.

Kindly explain your downvotes.


> Wow. Is that from the current admin?... Kindly explain your downvotes.

I did not downvote, but,

1) no, these racial and ethnic classifications have been used for decades

2) Trump is not relevant to Facebook Inspector

3) The philosophy of race and etymology of racism are at best a distraction from the topic at hand, and likely flamebait


I'm certainly not here to waste time advancing flamebait. I was simply shocked to hear that someone considers separating race and ethnicity as standard practice. I don't have anything more to add than that.


"there is more than one race which is scientifically incorrect."

...not quite. There's definitely more than one race ;)


You misquote. I said people who think there is more than one race is the origin of the word racism. There is one human race and many ethnicities.


I don't have to worry about sickle cell disease.


the link is from 2015.


In that link they don't use the word race. The above commenter wrote,

> On (US or state) government forms that's [Hispanic or Latino] usually a different question from race.

I've never seen this (before this census).


Did you fill out the census? See question 8. Lots and lots of US government forms that collect demographic data have a separate question asking if you're Latino/Hispanic.

https://www2.census.gov/programs-surveys/decennial/2020/tech...


The census you link is from the current admin and I was shocked to see it written that way. I don't think it's right or normal at all and I don't agree that this was standard practice prior to this admin.


> I don't think it's right or normal at all and I don't agree that this was standard practice prior to this admin.

It's normal (the basic structure has been used since 1980), though there was a proposal, ultimately not adopted, to consolidate the race and Hispanic origin questions for 2020.


Wikipedia writes,

> The 2010 US Census included changes designed to more clearly distinguish Hispanic ethnicity as not being a race. That included adding the sentence: "For this census, Hispanic origins are not races."

Government forms should simply ask about ethnicity, not race, and certainly shouldn't be asking both. I don't see why anyone who isn't racist would object to that. You can definitively call someone racist for listing ethnicities as different races.

[1] https://en.m.wikipedia.org/wiki/Race_and_ethnicity_in_the_Un...


> In that link they don't use the word race.

The link specifically describes a (potential) move away from using the word, showing examples where they did and describing some of the history of when it was used. The comment you replied to even told you which specific image in the article to look at for an example...


Above poster suggested it's normal in US forms to separate a question about race and Hispanic ethnicity. I'm saying that isn't normal. The way the census broke this out appears to be subtly advancing the idea that there is more than one human race.


In my experience it's very normal. Every employment application I've ever completed, spanning a couple of decades, has had two optional questions that were phrased in exactly this way.


Is this the same study involving the open source browser plugin that Facebook were up in arms about?


No, but it is mentioned in the article.


Slightly off-topic, but if you're looking to clear your Facebook history, there's an extension for that: https://chrome.google.com/webstore/detail/social-book-post-m...


How did you find my pin-code? (9003)?!


Presumably meant for another front page story:

https://news.ycombinator.com/item?id=25656827 (Simulating Terminator PIN code hacking scene)


This is the facebook inspector, please disclose your nudes....


We get mad at FB for spying and then we spy on FB? Fighting fire with fire?


Opting to relinquish one's data to a principled, accountable, and transparent organization driven by clear objectives and beholden to a strict privacy policy is very different from what one does when they sign up for Facebook.


Truly baffled as to how you think there's an equivalence here.


A company is not a person.


"sousveillance": the watching of the powerful by the less powerful.



