Think You’re Discreet Online? Think Again (nytimes.com)
218 points by tysone on April 22, 2019 | 133 comments



There's another downstream issue of this that many people seem to be unaware of.

I've gone down the road of trying to create a super private browsing experience (www.privacytools.io is a good start).

The issue is the web experience degrades rapidly and quite frequently is unusable. You have to block CDNs for a start if you really wanna go full paranoia mode. And there are too many other issues to list here.

But the biggest issue is that Google's CAPTCHAs become unsolvable. I mean literally they won't let you pass. You will get served a large sequence of captchas. I've experienced up to 6 for a single verification. And not only do you get more, but they deliberately add an excruciatingly slow delay to each image. It can take 10-15 seconds to solve a single image set, and then you still need to do 5 more.

Also, an interesting side note: it seems they are injecting adversarial noise into the hard captcha, as I noticed faint distortions in most of the images.

But even with your best effort to solve them you will not get through. I've repeated this experiment many times. It seems that preventing browser fingerprinting is the thing that really makes them put up a red flag, but it's hard to know for sure. And I'd like to emphasize that none of the privacy modifications I was using make any significant change to normal browser functionality. You still have JS, you still have cookies (through Firefox containers).

Anyway, the scary thing here is that it's very easy to extrapolate this further and get to an internet where you either opt in to be tracked, or you are effectively throttled/gated on huge parts of the web.

Google CAPTCHAs have gone from asking the question "Are you human?" to "Which human are you?".

And the beautiful, incredible irony is how this contrasts with Google's very public stance on net neutrality.

"Internet companies, innovative startups, and millions of internet users depend on these common-sense protections that prevent blocking or throttling of internet traffic, segmenting the internet into paid fast lanes and slow lanes, and other discriminatory practices. Thanks in part to net neutrality, the open internet has grown to become an unrivaled source of choice, competition, innovation, free expression, and opportunity. And it should stay that way. "

Do as I say. Not as I do.


Google's reCAPTCHA has done a lot of harm to the Internet and is constantly used when it's completely unnecessary (viewing a page? Submitting feedback when you're already logged into an account that you already did a captcha to create or log in with?).

>Google CAPTCHAs have gone from asking the question "Are you human?" to "Which human are you?".

This is a good way to put it. I wonder if the Googlers who work on reCAPTCHA are somehow able to convince themselves they're Making The World A Better Place, like most Googlers somehow manage to.


Hand on my heart: I've been in a very secure (national security) facility and witnessed someone struggling with Google's captcha thing for hours. They were trying to make a change in a system used for soldiers' flights. 200+ accounts needed a single number swapped on a form. 200+ Google captchas to solve. Nothing is as frustratingly pathetic as someone in uniform, a trained soldier, screaming about whether "traffic light" includes both sides of the hanging light.

Passwords, access cards, biometrics, hardened doors, guards carrying machine guns, rooms without numbers, and on the inside of all that: Google captcha.


That's puzzling.

Were they accessing a secure government site that was serving CAPTCHAs?

Or were they accessing a public site from a government facility with browsers that were so locked down that they triggered CAPTCHAs?

The first would be kinda stupid. The second would be funny. Or tragic, I suppose.


Speaking as someone with some military experience, almost certainly the second. Military computers are locked down excessively, often in ways that are actually counter to real security. Policy is reactionary, and is all-too-often simultaneously too narrow and too broad (in orthogonal ways) which ends up impacting usability while not really adding any security.


Wasn't a military site. It was a civilian service's site, not a public site like Travelocity, but one built for use by government agencies.


What is a good alternative? I'm going to be looking into our login system soon and have been meaning to read about the options.


If there is no way for users to spam other users, then there is no need for a captcha at all. Just add rate limiting, plus alerts to see if one user is using too many resources. If your users can disturb each other, then the best solution is to make your system invite-only and remove invite perms from anyone who is inviting bots. Alternatively, require users to add each other as friends before they can communicate, bootstrapping off an external service: for example, someone who wants to add a friend sends an email with a friend code, so you are reusing the existing anti-spam system of whatever external service the user picks.

Maybe none of these are convenient for you, but for some websites they work quite well. Invite-only communities often have far fewer trolls and virtually no spam.
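For the rate-limiting piece, a minimal sketch of a per-user token bucket (names, limits, and the user-ID scheme are all invented for illustration):

    import time
    from collections import defaultdict

    class RateLimiter:
        # Per-user token bucket: up to `burst` requests at once,
        # refilling at `rate` requests per second.
        def __init__(self, rate=1.0, burst=10):
            self.rate, self.burst = rate, burst
            self.tokens = defaultdict(lambda: burst)
            self.last = defaultdict(time.monotonic)

        def allow(self, user_id):
            now = time.monotonic()
            elapsed = now - self.last[user_id]
            self.last[user_id] = now
            # Refill tokens for idle time, capped at the burst size.
            self.tokens[user_id] = min(self.burst,
                                       self.tokens[user_id] + elapsed * self.rate)
            if self.tokens[user_id] >= 1:
                self.tokens[user_id] -= 1
                return True
            return False  # over the limit: reject, and maybe alert an operator

    limiter = RateLimiter(rate=0.5, burst=5)  # 5-request burst, then 1 per 2s
    if not limiter.allow("user-123"):
        print("429 Too Many Requests")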


Even if this is true, the problem is that Google reCAPTCHA is way easier to slap on anything, and for the few hurdles it imposes it is very effective at stopping bots.

I'm not in a position to decide on a human verification mechanism, but I used to do WordPress and Drupal CMS work as a freelancer. The number of non-technical people setting up their own sites on these platforms vastly outnumbers the people who are even aware of some of the downsides to Google reCAPTCHA. Until the mechanisms you described are as easy to implement as installing a plug-in, Google reCAPTCHA is here to stay.


GitHub has a good one.

You can test it by trying to create a new account.

It's dead simple and looks to be pretty secure.

You have to orient an object so that it's not upside down.

https://octocaptcha.com


EA use a variant of this for their FIFA app and it was legitimately the most frustrating CAPTCHA I've ever tried to solve. They overlaid several objects at the same time in an attempt to make it harder and it took me 9 minutes to solve them all. Example: https://imgur.com/a/x8x4amL


This is definitely easier. While I don't mind reCAPTCHA on many resources, commercial entities that charge money for you to use their products should try to make the login process as easy as possible, and reCAPTCHA isn't easy: at times you have to iterate several rounds of identifying all the crosswalks or storefronts.


Are there a fixed number of objects?


I'm more worried about the small state space. Even if there are many possible images, randomly guessing lets in 1/8 of the bots. Is there something to protect against this?


I found this, but I don't know if it answers your question:

https://blog.jscrambler.com/preventing-automated-abuse-with-...


In my experience, email verification gets you 99% of the way there.

Strategically-placed honeypot form fields get you to 99.5%.

I run a site that gets ~1.2 million visitors a month. YMMV.
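To make the honeypot concrete, here's a minimal sketch (Flask, with invented field names): the extra "website" field is hidden from humans with CSS, so any submission that fills it in is almost certainly a bot.

    from flask import Flask, request, abort

    app = Flask(__name__)

    SIGNUP_FORM = """
    <form method="post">
      <input name="email">
      <!-- invisible to humans; bots auto-filling every field reveal themselves -->
      <input name="website" style="display:none" tabindex="-1" autocomplete="off">
      <button>Sign up</button>
    </form>
    """

    @app.route("/signup", methods=["GET", "POST"])
    def signup():
        if request.method == "GET":
            return SIGNUP_FORM
        if request.form.get("website"):  # honeypot tripped
            abort(400)
        # ...otherwise proceed, e.g. send the verification email...
        return "Check your inbox."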


Not sure you need to make sure the user is human on a log-in form. Just limit the number of attempts per day per account, perhaps with email verification to pass it.


At even small to moderate scales you will end up with legitimate users being locked out of their accounts en masse with this approach.


Well, most log-in forms do not have any captcha, so I wonder how they do it.


Meanwhile, in developing nations, captcha work-from-home schemes are becoming quite common, where for a few pounds/dollars/euros a day a worker at home will identify thousands of captchas. From my understanding it's popular work, as mums and others at home can do it while caring for children.


I could be wrong, but I assume that Google mostly does this because of the huge amount of malicious garbage bot traffic on the web. Once you go fully anonymous and private, there's no way to distinguish you from the hordes of malicious traffic.

But it seems like there should be a technological solution. Google knows what a "legitimate" user looks like. Either because they come from a safe IP, or they're registered Google users, or they have a normal established browser history. But using your legitimate identity compromises your privacy, because then your IP/user-account can be correlated to your personal identity.

So, let's create zero-knowledge proof identity schemes. Using your legitimate, but non-private, identity you "register" an ephemeral, private, anonymous identity. But it's done using a zero-knowledge proof, such that Google has no idea who a particular anonymous identity belongs to, besides that it must belong to one of its several million legitimate identities.
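Something like this already exists in the form of blinded tokens (Privacy Pass, linked elsewhere in the thread). Full zero-knowledge proofs are heavier machinery, but a Chaum-style RSA blind signature gives the flavor. A toy sketch (the key is deliberately tiny and insecure; requires Python 3.8+ for pow(x, -1, n)):

    import math
    import secrets

    # Toy RSA issuer key built from two Mersenne primes; a real scheme
    # would use a proper 2048-bit key from a vetted library.
    p, q = 2**31 - 1, 2**61 - 1
    n, e = p * q, 65537
    d = pow(e, -1, (p - 1) * (q - 1))     # issuer's private exponent

    # Client: mint a random token and blind it before sending it off.
    token = secrets.randbelow(n)
    while True:
        r = secrets.randbelow(n)
        if math.gcd(r, n) == 1:
            break
    blinded = (token * pow(r, e, n)) % n  # issuer never sees `token`

    # Issuer: check the *real* identity (imagine a login check here),
    # then sign the blinded value.
    blind_sig = pow(blinded, d, n)        # = token^d * r  (mod n)

    # Client: unblind. The result is an ordinary signature on `token`.
    sig = (blind_sig * pow(r, -1, n)) % n

    # Anyone can verify the token was issued by a legitimate account,
    # but the issuer cannot link (token, sig) to the signing request.
    assert pow(sig, e, n) == token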


>I could be wrong, but I assume that Google mostly does this because of the huge amount of malicious garbage bot traffic on the web. Once you go fully anonymous and private, there's no way to distinguish you from the hordes of malicious traffic.

What's "malicious" about simply opening a page on a website? Google bots are doing this all the time and they don't have to solve CAPTCHAs.


If your site is at all worth scraping, it will be scraped, republished and monetized by someone other than you.

Google plays the long game though. By setting themselves up for being the gatekeeper of bot traffic, any bot belonging to a competing search engine cannot possibly index the web as well as Google can.


I assume Google Recaptcha has a way to recognize its own Google Bots.


Isn’t that kind of an antitrust thing?

How does a good, non-Google not solve a Google Recaptcha?


> Isn’t that kind of an antitrust thing?

> How does a good, non-Google not solve a Google Recaptcha?

I'm not sure if it's antitrust, because I would imagine that website admins would _like_ Google to have access to their site while excluding others.

What people fail to realize, I think, is that if this practice had been common, something like Google would never have come to be in the first place! Imagine if websites had blocked all but AskJeeves user agents from crawling them.


Just to be clear, I meant to say "a good, non-Google bot".


I would love for something like that to happen, and given all of the activity in the blockchain space, someone will probably come up with a clever solution. I just have a hard time imagining a motivation for Google to switch to using such a thing even if it were invented. The way it works now benefits them.

But even just looking at it from a purely technical perspective... If CAPTCHAs are intended to be a "completely automated public Turing test to tell computers and humans apart", then it's quite clear that they no longer accomplish this. I am a human and I can't pass. The model is wrong; build a better one.


>> If CAPTCHAs are intended to be a "completely automated public Turing test

That's what Google says they are for, but they aren't used that way by everyone. Many websites use them not to detect machines but to slow down the humans. They want to make something burdensome, so they add the captcha. Maybe they want you to subscribe, pay money, to avoid it. Maybe they just want you to spend a few more seconds looking at some banner ad. Or maybe including a captcha ticks a useless security box on some compliance form.


> If CAPTCHAs are intended to be a "completely automated public Turing test to tell computers and humans apart", then it's quite clear that they no longer accomplish this. I am a human and I can't pass. The model is wrong; build a better one.

Google’s CAPTCHA probably knows that you are a human and that some of your answers are correct, and Google already uses the answers you give to improve its Maps offering, like it always has. The fact that it still won’t let you pass is presumably a way of punishing you for trying to stay anonymous.


There are not. They are here to save money.


Although I can provide no link to them, I believe I've read numerous articles and studies showing that, even with generic anonymized data, many have been successful in deanonymizing the user behind the data.

Depending on what the data is, it might take more points and samples, but I think most people would be surprised how few data points you need before you can be identified.

Even from the most mundane things, and especially if they have non-anonymized data to compare against or find you with.


It somewhat exists: https://blog.cloudflare.com/cloudflare-supports-privacy-pass...

Google doesn’t implement it directly, but any CloudFlare powered website that shows you a Google Captcha because of CF is supported.


A resource for anyone interested in zero-knowledge cryptography: https://zkp.science/


>But there seems like there should be a technological solution.

Even if there was (there isn't) I wouldn't waste a single second thinking about or advocating for it. The only sustainable "solution" can come from Congress and it would look almost exactly like GDPR.


> But the biggest issue is that Google's CAPTCHAs become unsolvable. I mean literally they won't let you pass. You will get served a large sequence of captchas. I've experienced up to 6 for a single verification. And not only do you get more, but they deliberately add an excruciatingly slow delay to each image. It can take 10-15 seconds to solve a single image set, and then you still need to do 5 more.

All of which is an obnoxious form of unpaid labor to train their AIs.


Add: It's nicely clever, and it should be appreciated if it advances technology as a whole by enabling self-driving cars etc. sooner, but I can't help but feel offended whenever going through its "hard mode."


I think at the absolute worst I've spent almost 10 minutes painstakingly walking through successive Google CAPTCHAs before just giving up. The modern web is basically openly hostile to anonymity (or pseudo-anonymity). That's one of those genies I don't see any way of putting back in the bottle: the privacy-conscious consumer is just too small to matter at any kind of scale. They end up relegated to a niche set of tools and services (which ironically enough presents a neatly encapsulated attack surface for any malicious actors who might be interested in targeting these users).

Ultimately it may all be moot: we're slowly moving towards the point - still years or decades off, admittedly - when effective de-anonymization of the public internet will become a reality. For better or worse, I guess.


I had to use my own computer programs in order to be able to solve Google's CAPTCHAs!

Machines solve them much better than I do: way faster and way more accurately.

And the most important thing: it does not frustrate or mentally exhaust me, like the thing does to people.


Haha, here's one better -- I am using Google's own voice-to-text API to break their audio captcha! So when people say that captcha is about stopping bots, they are wrong. As some of the top comments point out, from Google's POV captcha is about enforcing personal identification and training AI.


There are also outsourced services that pay people to solve captchas all day, Mechanical Turk style. At this point I think they're a moral necessity, given that reCAPTCHA will often flatly refuse to allow accessibility options for some users.


The Google captchas are starting to be unsolvable for me.


A shopping website is one thing, but the California DMV website https://dmv.ca.gov logs you into Google when you use it, and a Google captcha is required to make an appointment; otherwise you get a cryptic "server unavailable" message.


OK, so Google CAPTCHAs using Tor can be almost impossible.

But using nested VPN chains, in a VM with a basic Firefox install with NoScript and an ad-blocker, I see hardly any CAPTCHAs. Sure, everything that I do is tracked and correlated. But it's just linked to a VPN exit IP, and a made-up persona. I actually go out of my way to link everything in this VM to Mirimir.


Just to clarify, I wasn't talking about using Tor in my post. I found a certain configuration of Firefox config settings and Add-ons which will fail every CAPTCHA on any site. This is without a VPN, from my home IP address, which I found bizarre, as that should be more than enough to identify me uniquely.

And your solution is good. But it's kind of ridiculous that we have to jump through hoops like that to stay private.


> I found a certain configuration of Firefox config settings and Add-ons which will fail every CAPTCHA on any site.

Could you share more specifics, please? Reading your comment, that really didn't sink in for me. And I wonder what you're blocking that Tor browser doesn't. And/or how Tor browser reduces CAPTCHA incidence.

It is ridiculous. But for me it's been a hobby. For over 20 years now ;)


Use Canvasblocker and enable all the settings.

The internet punishes anonymous browsing. Every site out there wants to track you. Across websites! Just imagine ordering a Big Mac at the McDrive. Suddenly a McDonald's employee walks out and puts a GPS tracker under your car. How would you respond?


Maybe try Canvas Defender instead. It serves a random canvas, and maybe that doesn't trigger so much.

Within a few years, I wouldn't be too surprised if they'll be printing GPS trackers on packaging ;)


> And not only do you get more, but they deliberately add an excruciatingly slow delay to each image. It can take 10-15 seconds to solve a single image set, and then you still need to do 5 more.

There's a GreaseMonkey script that eliminates this delay.

https://greasyfork.org/en/scripts/382039-speed-up-google-cap...


There's this paper that highlights the techniques (supposedly) including the fingerprinting: https://www.blackhat.com/docs/asia-16/materials/asia-16-Siva...


How about a browser / service that makes you pseudonymous instead?

This would probably be hard to do with canvas-based browser fingerprinting, but might work with the various other fields. Basically create a large pool of 'profiles' based on common browser settings, then assign one at random to clients.

Basically, make it look like thousands of users on the same machine.
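A minimal sketch of the pool idea (the profiles below are invented; a real pool would have to be sampled from observed, internally consistent configurations, and canvas/WebGL output would still need separate handling):

    import random

    PROFILE_POOL = [
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) "
                       "Gecko/20100101 Firefox/66.0",
         "screen": (1920, 1080), "timezone": "America/New_York"},
        {"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) "
                       "Gecko/20100101 Firefox/66.0",
         "screen": (1440, 900), "timezone": "Europe/Berlin"},
        # ...thousands more, weighted by real-world frequency...
    ]

    def assign_profile():
        # A fresh random draw per session: to the tracker, this machine
        # looks like a different common user each time.
        return random.choice(PROFILE_POOL)

    profile = assign_profile()
    headers = {"User-Agent": profile["user_agent"]}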


>Google CAPTCHAs have gone from asking the question "Are you human?" to "Which human are you?".

Nailed it.


I'd love to see a CAPTCHA for identifying a business/feature in Mapillary imagery which can then be later verified and added to OpenStreetMap!


It is a real PITA when attempting to post to HN via AT&T LTE, same with clicking on any FB link from Google search results.


Huh? HN uses CAPTCHAs?


> But the biggest issue is that Google's CAPTCHAs become unsolvable.

I've noticed the same... vicious.


Ironically, I opened this in private mode and was turned away. The New York Times is trying to have it both ways. I can log in and be tracked by their advertisers or I can be discreet and open articles from another source. I don't mind paying a small subscription but let me choose to deny your advertisers information on how I consume your content. If being tracked is part and parcel of being a subscriber then these pro-privacy articles look almost unethical.


> The New York Times is trying to have it both ways.

No, you need to think of the NYT (and any reputable newspaper) as being two separate entities: the newsroom and the advertising department, with a firewall between them [1]. You really don't want the newsroom killing a story because it makes the practices of the advertising department look bad.

[1] https://en.wikipedia.org/wiki/Chinese_wall#Journalism

Also, demanding that the people who expose bad privacy practices have perfect privacy records themselves is to demand that privacy advocates form a circular firing squad [2], so no one wins but the advertisers and privacy-invaders. Personally, I want these exposés carried by the publications with the greatest numbers of readers, and I'm not going to gripe too much about the practices of those publications as long as the message gets out.

[2] https://en.wikipedia.org/wiki/Circular_firing_squad


You're right; however, people (NYT subscribers and the general public) have the right to press the advertising department regarding the information and viewpoint presented by the newsroom.

If the advertising department is the entity responsible for the private-browsing situation, when a reader says "The New York Times" in this context, it should be read as "The New York Times' advertising department".

Ideally, the public editor (which the NYT no longer has) would do so, prompted by readers.


> being two separate entities

They aren't really though; they both make money on my privacy with relatively little consent. If that's the bad thing, I'm not sure I care that one of the two parties says "sorry, I know this is bad" while still doing it. This could actually make them the worse of the two parties.

> [1]. You really don't want the newsroom killing a story because it makes the practices of the advertising department look bad.

This feels like bait; no one is asking for the newsroom to kill the story, they are asking for the newsroom to kill the practices of the advertising department.

> Also, demands that the people who expose bad privacy practices have perfect privacy records themselves

It's not a demand that they have perfect privacy records. It's kinda just pointing out that these authors have control over who they publish for, and that it's quite hollow to warn people about a poisonous medium in a way that draws more people through that medium.

This isn't to say they are bad people, just that they aren't great for pointing out bad practices they take part in 10 years after society has already baked them in.

I do warn that my opinion is colored by a belief that news orgs are responsible for a large part of the normalization of our lack of privacy and of our current relationship with marketing. Which I see as sort of proto you-gotta-believe-me methods.

[edit for formatting]


>> being two separate entities

> They aren't really though, they both make money on my privacy with relatively little consent.

They are and they aren't: they're two separate parts of one business. The advertising department makes money for the owners, while the newsroom writes the stories (and is hopefully insulated enough from the interests of the owners and ad department that it can be honest).

> this feels like bait, no one is asking for the newsroom to kill the story, they are asking for the newsroom to kill the practices of the advertising department.

The newsroom doesn't really have that authority. The firewall is there to protect the newsroom from the advertising department, because the ad department is naturally more powerful (due to newspapers being businesses and the ad department being the part that actually collects much of the revenue).

> Its kinda just pointing out that these authors have control of who they publish for and that its quite hollow to warn people about a poisonous medium in a way that draws more people through that medium.

The world is more complicated than that. It's more poisonous to distract from the warnings just to point out they weren't published in some low-reach niche publication whose privacy practices satisfy some random internet commenter.

If you actually care about privacy, you should celebrate these articles because they might reach a wide-enough audience to actually cause real action to fix the problems.


>The newsroom doesn't really have that authority.

They could refuse to publish their articles for a corporation with such an "immoral" advertising department, but they don't, because they aren't genuine in their beliefs.


> They could refuse to publish their articles for a corporation with such an "immoral" advertising department, but they don't, because they aren't genuine in their beliefs.

That's an un-nuanced and extremely uncharitable statement. Have you ever heard the saying "choose your battles?" Don't you think people have to prioritize the actions they take to support their multifarious beliefs? You can have genuine beliefs without being destructively unreasonable.

I'm pretty sure the people in the New York Times newsroom value the public good of having a functioning "fourth estate" [1] over having a tracker-free nytimes.com website, and thus are unwilling to wage some destructive pyrrhic war with their employers over something as trifling as the latter. Especially when they can instead write a widely read series of articles that bring light to those practices, and perhaps lead to wider change.

[1] https://en.wikipedia.org/wiki/Fourth_Estate


But the fourth estate existed before the internet superpowered predatory ad practices. I get that predatory advertisement was baked in from the beginning for newspapers, and that a lot of work has been done to mitigate those roots, but I feel you're pushing a false dichotomy as nuance here. It's not an all-or-nothing, tear-down-the-system-so-no-one-can-ever-advertise-or-write-newspapers-again problem. It's a "this system is becoming more toxic, let's hope the people who built that system will help us transition away from that toxicity" problem, combined with a "damn, I wish it wasn't so difficult to get someone to understand something when their salary depends upon them not understanding it" problem.

I would suggest

> the New York Times newsroom value the public good of having a functioning "fourth estate"

AND it being well funded

AND that they are a part of it both individually(authors) and as an organization (NYT brand)

> over having a tracker-free nytimes.com website

or tracker-free alternatives/competition.


> but I feel you're pushing a false dichotomy as nuance here.

No, not really. The GGP was basically pushing "we could destroy the village in order to save it" logic. The NYT is one of the few newspapers that may be able to weather the economic maelstrom that journalism is in the middle of, so it makes no sense for its journalists to go to war with its management over a niche issue, to satisfy a few strident people in the internet peanut gallery. I think it's pretty obvious that the turmoil the GGP's idea would cause would have far more negatives than positives.

Running all these stories definitely has more positives than negatives.

> AND it being well funded

> AND that they are a part of it both individually(authors) and as an organization (NYT brand)

Those are things you need for a functioning fourth estate. Some people used to think that blogs (e.g. sites that lack the things you listed) could replace newspapers, but they were wrong.


>> refuse to publish their articles for a corporation with such an "immoral" advertising department,

>> GGP was basically pushing "we could destroy the village in order to save it" logic.

Why do I feel like any criticism of reporters reads that way to you? I don't think they want to destroy reporters/the 4th estate at all; it seems like what they want is for the content creators to try and work for "moral" people, and to use the power they have as content creators to do so. This is not hugely simple, and I agree with you on that. The false dichotomy you're pushing into it is that any change which is hard is equivalent to destruction, or that any change that is fought for is someone going to "war" with the 4th estate.

The 4th estate existed before they rebuilt their village on sand. We would rather have a solid foundation than have them complaining about the sand they built the village on while also claiming that attempts at change are just not feasible. Complaining about a thing and then saying "but it's okay as long as I get mine" is exactly the thing that makes it seem hollow.

>> Those are things you need for a functioning fourth estate. Some people used to think that blogs (e.g. sites that lack the things you listed) could replace newspapers, but they were wrong.

I mean, the fourth estate is not defined as being any individual reporter or brand. So this is a claim on your part that no reporter should lose or leave or protest their job, and no paper should ever go defunct, if we have a functioning fourth estate... I don't even really know how to address this claim, but it seems like it's probably not what you mean? The point of adding those two things was to draw attention to the incentives of the individuals.

As for blogs, they also generally have trackers; the reason they couldn't replace newspapers was not the lack of trackers on them, and people don't go to the New York Times to see the ads.

It really feels like you're painting a picture of an all-or-nothing situation with no individuals in it. While this person

> I don't mind paying a small subscription but let me choose to deny your advertisers information on how I consume your content.

seems to want to pay reporters instead of being tracked, and this other one

> They could refuse to publish their articles for a corporation with such an "immoral" advertising department, but they don't, because they aren't genuine in their beliefs.

seems to be in a "stop claiming you're so good, please" situation, as opposed to the "destroy them all, muhaha" thing you're claiming.

> niche issue, few strident people, peanut gallery, pyrrhic war, low-reach niche publication, some random internet commenter.

Come on now, really? We are good enough for them to write a bunch of articles about our issue, just not good enough to actually try and do anything for? And we can't point out the hollowness of that position?

I guess I'm sorry I responded. I felt your original comment was informative, but it was two side channels that weren't really a response to the content of the comment you were responding to. That is to say, while both things you stated in your OP were true, neither seems to contradict the idea that the NYT wants it both ways; they actually seem to explain the method by which it achieves having it both ways. I hoped we could get to a better shared understanding of the positions, but now I feel like you're seeing my and the others' arguments as "burn it to the ground" as opposed to "It's nice that they started talking about it being bad a little more frequently; it would be nicer if they chose not to do it, or at least tried to support some alternatives".


> The advertising department makes money for the owners,

I may be missing something here, but they make money for the reporters as well, right? I don't know; if the firewall involves the money as well, my opinion of the authors would change mildly, as the benefits are less obvious/direct, making it an easier mistake to fall into, though I think the criticism stands either way.

> news department .... hopefully insulated enough from...

> to protect the newsroom from the advertising department

Right, and I get that in most cases this works out. In this case the dynamic seems to insulate the marketing department's actions from the meaningful critique given by the newsroom, and the newsroom from the moral/ethical implications of benefiting from those actions. This moral/ethical insulation is the thing I am questioning.

> The newsroom doesn't really have that authority.

That's why the Chinese firewall thing felt like bait: no one wants the newsroom to kill the story, and the newsroom as a corporate entity can't kill ads. The people creating the content, though, the individuals... they could start producing content for people who don't do this sort of stuff.

I mean, I'm not even saying they should boycott the benefits and windfall of the system that everyone around them is deeply embedded in already. I'm saying that as content creators, who are now aware of the situation, they are close to the only people who could populate a different system with content and users.

> It's more poisonous to distract from the warnings

This is not what is going on; the warning stands. People who are mad at the reporters for reporting this stuff like it's a new hot scoop are not mad because they want to have no privacy... they seem to be mad because it's been yelled about for years, by both reporters and security experts, and the trend has only accelerated. At least that's what I'm mad about. It's not a "you're bad" but a "stop claiming you're so good, please".

> celebrate these articles

I do, and have since they started coming out years and years ago (I even try to local-archive them)... but I celebrate the article and remember that the author could do better. This is why I see the comment about wanting it "both ways" as valid. It sounds harsh, but it's not a claim that the reporters are the evil ones and shouldn't talk about privacy; it's only the acknowledgment that they could take more steps not to be enablers, and that they could actually bootstrap an alternative. It's not really even a claim about this specific author or mainstream outlet.

> actually cause real action to fix the problems.

I don't think they will. I think they will just keep doing what they are doing (and have always done), then participate in the change (if it ever happens) and claim it's what they wanted from the start, which brings me back to the idea of wanting it both ways.


That's an idealized but incorrect version of the news industry.

We know that news companies work with PR firms to publish ads as articles. The "submarine" PR that is famous here.

http://paulgraham.com/submarine.html

We also know that news companies sometimes kill or refuse to cover sensitive stories that might reflect poorly on their benefactors or their politics. We know that news companies attack social media as toxic while demanding preferential treatment on the platforms they say are evil. I don't think we should view any industry or government with rose-colored glasses. The forces of money and corruption are felt just as much in the news industry as in every other industry and government.

But I agree with you that as long as the message gets out, it is a net benefit to society. It is better that the NYTimes exposes privacy concerns even though they themselves violate their customers' privacy. We don't live in an ideal world, so we shouldn't hold everyone to an unattainable ideal standard.


So another pathological company with a personality disorder?


Organizations are not people. Characterizing them as such can be very misleading and lead to wildly inaccurate conclusions about why they behave the way they do and what they are likely to do in the future.


Yes! And the same applies to countries as well.


The legal concept of corporate personhood kind of says otherwise.


Corporate personhood looks like a legal fiction of convenience to me, making contracts, accounting, and responsibilities/obligations etc. easier. Corporations can’t be called up for jury duty or conscripted into military service, they can’t vote, and they don’t need to exist for 18 years before being allowed to take out a line of credit.


The legal concept of corporate personhood demonstrates the issue.


The law can be whatever it is, but that has no bearing on the nature of an organization outside of incentivizing certain kinds of behavior.

A business is an organization that needs to make money to survive. Such an organization will find ways to make money, or cease to exist. No amount of law will change the nature of this dynamic.


Works fine for me incognito with ads and tracker blocking extensions enabled.

Keep in mind that the journalist who did the research and wrote the article did not have a say in how the employer who enabled them to write this piece generates revenue.


> Keep in mind that the journalist who did the research and wrote the article did not have a say in how the employer who enabled them to write this piece generates revenue.

In a way, she did. She is not an NYT employee, but rather, a professor at UNC Chapel Hill who could have published this piece elsewhere.

To be clear, I don't fault her for her choice at all. Just pointing out that we all "opt-in" to this system when we participate in it, and that's a part of the problem.


I do agree with you, though I think opting out of the practices of most websites is impractical. The utility of my usage of the Internet far outweighs some advertisers figuring out what I like to buy. We do need more robust advertising privacy laws, though.

By not being a NYT employee she has even less of a say in the matter. I'm sure she knows that nobody's going to read her article if it's published in The Daily Tar Heel.

If someone gave you the opportunity to publish an article in The New York Times would you say no?

I should also point out that the print edition of the NYT is still widely circulated, and it seems this article would have appeared in the Opinion section, tracker free :-)


This is taking it a bit too far, but if someone offered you a fully-paid trip to the Caribbean,* would you say no?

* with money from dubious origin


> Just pointing out that we all "opt-in" to this system when we participate in it, and that's a part of the problem.

This really is a significant point.

All-visitor paywalls have already crumbled, and a few of the worst excesses of tracking are at least getting debated, so this is a place where users can get actual traction. It's the same sort of collective-action problem as voting, true; writers and publishers have far more influence than any given reader. But it's also true that the space is far more open than voting.

There are a lot of news sites (e.g. The Boston Globe) which have shut out incognito access, filled their sites with trackers, and loaded up on dark patterns to push people towards misleadingly-priced subscriptions. And for almost everything they publish, there's minimal cost to just... not reading it. If it's a major story, it will be covered elsewhere. I know I'm not changing the world when I skip their links, but I'm protecting a bit of my own data, and putting a bit of pressure on them to do better.


NYT is still A/B testing this feature. It appears to be enabled on an increasingly higher percentage of page loads over the past few weeks.


Incognito does not prevent Google from tracking you. They claim that they do not, but it is not a standard for private browsing.


This is an opinion piece, not an article. Zeynep Tufekci is a professor at UNC and does not work for the New York Times.


Works fine on Safari private mode with content blockers enabled.


Or Firefox + uMatrix.


You're not using private mode to avoid advertisers. You're using it to avoid the paywall.


What I don't understand is with all of this tracking, ML profiling... Why don't I get ads that are relevant to me at all? I'm very active all across the internet, proactively rate ads as "not relevant to me", would actually be interested in learning about products that I might like. Have disposable income... Yet nothing, just a sea of hot garbage ads that might as well be noise. I can't recall ever seeing something relevant enough to click on.

I think the effectiveness of all of this stuff is way overrated. I have money, I will spend it, but with all the tech in the world the industry still can't put a relevant ad in front of me.


Yeah, the classic example is serving me with ads for the thing I just bought (which won’t need replacing for >10 years). But then advertising is in general very wasteful. Slightly improving efficiency is something people will pay for it seems.

It’s been said in other comments, but a lot of advertising is for products which are useless or unpalatable or unhealthy. Just look at how much alcoholic drinks and soda drinks advertise. Maybe we should tax advertising revenue.


I am mostly in the same situation (except Facebook has _maybe_ found me something I would be interested in buying): for all their information, for all their smart people, etc., they have 1 product...

My guess is that it is because most ads suck, because a) 90% of all products are crap, and b) good products don't need that many ads.

If somebody comes up with a better whatever, I will probably hear about it, but most new products are not revolutionary enough for that, but they are also not revolutionary enough for me to be interested in them as an ad.


The thing that gets me is I have a bunch of hobbies that I enjoy but don't really have time to dive super deep into. They aren't anything weird and there are many companies servicing their markets. I'm sure I have very little awareness of a bunch of things I'd like to buy for these.

If I was being shown ads for products around my hobbies I would:

1. Definitely discover companies and products I wasn't aware of.

2. Buy stuff I probably didn't need but would be tempted into owning.


Hi there, I'm an engineer who works at [INSERT FANG] creating models of user behavior and systems that use those models to drive [INSERT METRIC].

The key here is to further define "a sufficiently high degree of accuracy." Sufficiently high degree of accuracy to... do what? Predictions from these models are typically only sufficient to move [INSERT METRIC] in the aggregate, there is no need to be more accurate than that.

For your case: most of these models are noisy and operate over sparse data. That means any individual prediction can be arbitrarily bad, even to the point of complete randomness. The models only need to be marginally more accurate in aggregate than the a priori average prediction for each item without conditioning on any user features.

Your intuition is correct. "Computer algorithms ... can now infer ... your moods, your political beliefs, your sexual orientation and your health" in the context of this article is hyperbole.

For instance, to predict moods you need to source ground-truth variables. Users don't normally give this information out for free. Even in the case where the user updated their status with "My current mood is x", there is bias you have to account for that is likely prohibitive to the task. People self-censor their emotions on the internet. This means sourcing the data in some other way, maybe through a questionnaire. Either way you won't have nearly as much data, and the resulting model will likely be much noisier. Modeling the other user features mentioned has similar pitfalls.
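To make the aggregate-vs-individual point concrete, here's a toy simulation (all numbers invented). The "model" below is almost pure noise for any single user, yet targeting its top decile still buys a measurable lift:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    clicks = rng.random(n) < 0.05              # ground truth: 5% convert
    score = clicks * 0.02 + rng.random(n)      # signal nearly drowned in noise
    targeted = score >= np.quantile(score, 0.90)   # advertise to top decile

    print(f"base rate:           {clicks.mean():.3f}")            # ~0.050
    print(f"rate among targeted: {clicks[targeted].mean():.3f}")  # ~0.059

That relative lift of nearly 20% is worth real money in aggregate, even though roughly 94% of the individual "this user will click" calls are wrong.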


That's simply because YOU are not the CUSTOMER for ads.

The thing about advertisers of dreck is that they don't mind paying to annoy one million uninterested people to get one new customer.


Maybe you're not any company's target audience?


That's doubtful, as I don't live some kind of hermit-like existence and I regularly buy many products.


Maybe you should start; it would be a good A/B test.


Laymen try to be careful online in completely unproductive ways: they keep logging in and out of Facebook without even deleting cookies, they disable data transmission on their phones when at home, etc.

The only way to keep privacy online is to compartmentalize. Use one proxy and browser profile for discussing politics, another for HN/Reddit, yet another for LinkedIn. And always block everything you don't need. Start with ads since you virtually never need ads. Use various nicknames and avoid Facebook if you can swing it.

Beyond that you only need a secure (up to date) OS, Signal, and maybe occasionally Tor. It may or may not hold up against an NSA-level adversary but you will easily lose most advertisers and corporate surveillance.
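A rough sketch of what per-persona compartmentalization can look like day to day, assuming you've already created one Firefox profile per persona (e.g. via the "firefox -P" profile manager), each with its own proxy, cookies, and extensions:

    import subprocess

    # Hypothetical persona -> Firefox profile mapping.
    PERSONAS = {
        "politics": "persona-politics",
        "forums":   "persona-hn-reddit",
        "work":     "persona-linkedin",
    }

    def open_as(persona, url):
        # -P selects the profile; --new-instance keeps sessions separate.
        subprocess.Popen(
            ["firefox", "-P", PERSONAS[persona], "--new-instance", url])

    open_as("forums", "https://news.ycombinator.com")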


That's a good start. But you're still vulnerable if you do everything in one OS. And no matter how you physically compartmentalize, you're still vulnerable if you're posting all of that as the same persona, even if it's not your real name. Because it can all be correlated easily.

And even using multiple personas, you must take care to avoid linking them. No common social media, forums or mail lists. No cross-linking, or mentioning each other. No telling anyone about links between them. And as much as possible, no shared interests, especially specific ones.

Instead of using one OS with multiple browsers, it's better to compartmentalize in multiple VMs. Each VM should reach the Internet through a different nested chain of VPNs. And for more anonymity, you can use Whonix for Tor. Also, it goes without saying that the machine should be full-disk encrypted, and that you should avoid Windows.

When it really matters, you should use multiple physical machines, on separate LANs. And of course, be careful not to share USB drives among them.


How does this work when you can be individually identified by analysis of the things you write in comments or your mouse movements? And that can be correlated across HN, Reddit, LinkedIn. I suspect that almost all of us are the laymen you first described, to someone.


HN doesn't correlate mouse movement tremors with LinkedIn so it holds up very well against actual online threats today. It may or may not hold up against scifi threats of tomorrow.


> HN doesn't correlate mouse movement

The "fingerprint" of the writing style, the interests and "likes" and "dislikes" patterns is often enough, the more one writes the more "uniqueness" can be determined.


Laws controlling the use of data, like those for credit reporting agencies, would help. So would laws on collection and sale of data. Tracker-blockers help a bit more, but they're an eternal battle.

What we need is new social norms. Most people don't point video cameras at their neighbors' windows, not out of fear of being caught, but because it's just wrong. Similar norms should apply online, and people and companies who do it should be shunned.


I think we're not really there yet. I think it's entirely possible for you to avoid most of this type of tracking if you so choose, but it does mean not using services such as Facebook, Instagram, Whatsapp and not giving out your phone number to people who use these services.

What bothers me much more is that it might be possible to identify somebody through text analysis.


Other articles in this NYT series describe how location data is used for similar types of inference. Many of these systems are nearly impossible to opt out of, since they rely on deep rooted tech, like cell carrier chipsets, credit purchase history (tied to retail addresses), or even psychographic & demographic profiles tied to people in your neighborhood (essentially block level aggregates of commute & purchase data). Your strategy of spending time generating fake data online is not enough to really opt out of these systems, and many would argue that the burden of generating consistent dummy personas is already immense. To really opt out of your data being sold or used to target you, you would need to be ready to not travel with a cell phone, not use any electronic payments like credit/debit cards, or even conceal your place of residence from public sources through complex legal obfuscation. Obviously there are different levels of obscurity that can be had, but much of this technology uses dragnets with few practical ways to entirely hide.


> not giving out your phone number to people who use these services

Come on! Are you even serious? It basically requires you to be a hermit, because "these people" is fucking everyone. I think I'm pretty close to being as discreet as possible while being considered more or less normal, but it seems to be of no use; I'm not even always sure how a fresh new fake account on Facebook (or such) can imply this or that connection, or how YouTube can recommend a video on some quite specific interest I might have shown a while ago on a completely different device.

And I don't know about you, but I'm not feeling ready to refuse to interact with a girl/friend/relative because they are using WhatsApp. I can only hate FB and wish Zuckerberg a painful death, but I don't see how I can realistically avoid being monitored.


Deliberate obfuscation. Change the way you talk over time. Change your users and passwords. Even go so far as to say things that aren't the slightest bit true but can't be differentiated one way or another. Create a matrix of plausible personas you could possess and never let on to which sum of them you really are.


Wouldn’t it be better to shape the world in such a way that those techniques aren’t needed? Don’t work for nasty companies, don’t invest in them, don’t do business with them and lobby the government to outlaw tracking & stalking.


> ... shape the world ...

For the first time, we probably have the technology that could help realize something like this, if we knew what the shape should look like.

Unfortunately we will never have that because that would mean the shaping had already been done.

Then again, if you think about it, maybe the shaping has been done and the shapers already have what they want...

> ... tracking and stalking

I think governments have a vested interest in being able to track


I don't do that and the companies still exist. You can't stop that so this is the only choice :/


I don't think that's going to work in the long run. You can't stop technological progress.


You can’t stop progress, but you can stop it being used for certain purposes, and laws mostly work as long as they’re enforced correctly.


This is not at all viable in real life, unless you have overwhelming motivation to do so, it's your one priority in life, and you are willing to lose out on a lot of things to do so.


People are bad at stuff like this.


> I think it's entirely possible for you to avoid most of this type of tracking if you so choose, but it does mean not using services such as Facebook,...

Why do you think this is a choice? This is definitely not a choice any more. It has been 10+ years since Facebook removed your choice to not use it from society.

Step outside and walk around. You will be photographed. Those photographs are going to Instagram and you are in the background. Your face is scanned by Facebook and they know where you go and what you do in the day. Facebook will know your location because other apps that you use will send Facebook your location without you even knowing it. Other people's actions online, who are tracked more vigorously, like emailing you or even meeting you in person for a coffee, will give further info to Facebook. Your cell signal location and your credit card history are probably up for sale and Facebook probably buys that too.

You want to opt out by staying home? Just order stuff online? I'm sure Facebook will still find a way to figure you out. There is no such thing as not-using-Facebook, or opting out, any more. Saying that there is a way to choose not to be tracked is dangerous and wrong.

> ...and not giving out your phone number to people who use these services.

The tracking they can do is so far past just linking phone numbers though. Trying to keep a phone number secret simply isn't going to work and if it did, it would be ineffective in protecting you from being tracked.


I've had an idea for a long time: what would happen if you had some silent tabs running alongside you, randomly opening and closing browser pages, clicking, etc. - acting like a human but with the purpose of creating noise? It would also involve submitting articles and social media posts using your name, to make it harder to be found on search engines with any real clarity. If anyone knows of a project like this, please let me know. I'm aware it would break all kinds of ToS, but all is not fair in this privacy game.
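TrackMeNot (random search queries) and AdNauseam (random ad clicks) work in this spirit, though I don't know of one that does general browsing. A crude stdlib-only sketch of the idea; note that bare HTTP fetches won't reproduce a real browser's fingerprint, so this muddies server logs rather than acting as a full decoy:

    import random
    import re
    import time
    from urllib.request import Request, urlopen

    SEEDS = ["https://en.wikipedia.org/wiki/Special:Random",
             "https://news.ycombinator.com"]

    def fetch(url):
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        return urlopen(req, timeout=10).read().decode("utf-8", "ignore")

    def wander(steps=20):
        # Follow random links, falling back to a seed on dead ends/errors.
        url = random.choice(SEEDS)
        for _ in range(steps):
            try:
                links = re.findall(r'href="(https?://[^"]+)"', fetch(url))
                url = random.choice(links) if links else random.choice(SEEDS)
            except Exception:
                url = random.choice(SEEDS)
            time.sleep(random.uniform(5, 60))  # human-ish dwell times

    wander()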



Thank you! There is only one feature that keeps me on Chrome (I've moved most of my browsing to Firefox and Opera), and that's the built-in translation, as I'm on a lot of non-English websites during the day. This will make my daily Chrome use a lot more enjoyable.


The built-in translation works on Chromium, and might work on some of the privacy-oriented Chromium-based browsers as well. I normally use Firefox, and use Chromium when I want this feature.


I tested a few Chromium browsers today with privacy features and the translation feature is disabled. I will try raw Chromium - I'm curious how buggy it is to run it straight off dev.

Epic: translate is disabled
Iridium: 403 on using inline translate
Chromium: 403 on using inline translate

API key errors abound. After wasting way too much time I went back to Opera and finally found inline translation with this extension: https://chrome.google.com/webstore/detail/google-translate/a...

Now everything is well. Thanks for your tip however!


Every article/report about online privacy reminds me of how many people I know who ignore these facts, don't know them, or brush them off because they ‘don't find it harmful'/‘have nothing to hide'. Can anyone give me some suggestions, or point me to a resource, on how to present these facts to the non-tech people I care most about in a way that will actually make them care?


From the article:

>Such tools are already being marketed for use in hiring employees, for detecting shoppers’ moods and predicting criminal behavior.

Surely, it's just a matter of time before we start gaming systems like this?

You already have online reputation management for products and services. So how long is it before someone offers shaped online profiles for individuals as a service?


It's almost now.


"But today’s technology works at a far higher level. Consider an example involving Facebook. In 2017, the newspaper The Australian published an article, based on a leaked document from Facebook, revealing that the company had told advertisers that it could predict when younger users, including teenagers, were feeling “insecure,” “worthless” or otherwise in need of a “confidence boost.” Facebook was apparently able to draw these inferences by monitoring photos, posts and other social media data."

Ironically, that's exactly what the NYT has started doing recently![0] I think they've realized just how lucrative selling these inferential insights can be; regardless of how accurate these metrics are, advertisers eat them up. How long until NYT writers start receiving pressure from upper management to produce content that maximizes emotional output from their readers? It's possible that proper restraint and separation of departments can be maintained, but I think history has shown that when profit motives are involved, greed often usurps ethics.

[0] https://investors.nytco.com/press/press-releases/press-relea...


> [Facebook] had told advertisers that it could predict when younger users, including teenagers, were feeling “insecure,” “worthless” or otherwise in need of a “confidence boost.”

> Ironically, that's exactly what the NYT has started doing recently!

No, that's not what the NYT is doing at all, and it's disinformation to say that it is. Facebook was inferring the emotional state of its users; the NYT was inferring the emotional content of its articles [1]. Those are very, very different things.

[1] From your link: "The result was an artificial intelligence model that predicts emotional response to any content The Times publishes... Perspective targeting allows advertisers to target their media against content predicted to evoke reader sentiments like self-confidence or adventurousness." (emphasis mine)


> Based on the machine learning powering Project Feels, nytDEMO has launched perspective targeting as a new ad product. Perspective targeting allows advertisers to target their media against content predicted to evoke reader sentiments like self-confidence or adventurousness.

> nytDEMO will also soon launch Readerscope, an AI-driven data insights tool that summarizes what The Times audience is reading using anonymized data to visualize who is interested in which topics and where they are.

> Readerscope can be used as a content strategy tool to develop creative ideas for branded content or campaigns by searching a brand's target audience segment (e.g., millennial women) to understand what they're reading, either as topics or as representative articles exemplifying those topics. It can also help brands find the right audience or geography for a certain message by searching a topic (e.g., human rights, philanthropy or travel) and seeing which audience segments over-index for interest about that subject. Topics are algorithmically learned from The New York Times article archive using state of the art natural language processing, and all of the reader segments are targetable with media on NYTimes.com.


> content predicted to evoke reader sentiments like self-confidence or adventurousness

I'm not convinced that is different from Facebook's model - it's just less powerful.

Advertisers have always tried to place their ads alongside content that favors their brand identity, sure, but that's about the style of the content. That looks like Burton sponsoring top-tier snowboarders or Chanel advertising in Vogue. Watching Olympic snowboarders might make some people feel adventurous, but Burton's placement is also aspirational and cultural, a way of simply forming an association between the brand and high performance.

The NYT model didn't just put content alongside stories about adventure, it put content alongside stories expected to evoke adventurous sentiments. If a story about a daring Arctic expedition makes you feel relieved to be comfy at home, it could still associate a brand with adventure, but it's outside the sentiment target. That is inferring the emotional state of users, rather than of content. The main difference is that the NYT was making session-level judgements, rather than long-term ones. I find that much less objectionable (even if it's only out of a lack of data), but it's still in the category of mind-state targeting rather than content alignment.


Advertising based on the type of content is nearly as old as advertising. Some TV commercials air during dramas, others during comedies, others during sporting events. If you want to get more specific, you can elect to have your ad air during a certain show.

The only difference between this and what the NYT is doing is the former requires slightly more research on the part of the advertiser, to learn what the show or article is about.


Again, though, my point is that the NYT didn't just sell advertising based on content type. That wouldn't be new. "Project Feels" was an ML initiative to study how readers felt after reacting to stories, and create an ad-targeting tool based on that. It's very specifically about offering advertisers the chance to choose stories based on predicted reader demographics, behaviors, and emotional response, instead of simply targeting stories by category or topic.

To decide whether to air your commercial alongside a drama or a comedy, all you need to do is watch the show (and perhaps collect viewer demographics). To decide which section of the print NYT to advertise in, all you need to do is read it. But Project Feels was only possible by studying the behaviors and emotional responses of readers. It didn't try to alter those emotions in specific ways, so it's not equivalent to Facebook's project, but it's also not the same as content-based targeting.


The NYT is replacing a human reader determining a story's emotional value with an ML program determining a story's emotional value.

You're implying that the NYT's ML software is capable of finding some kind of secret, subliminal emotional traits that wouldn't be detectable to a human reader. I don't find this believable at all.


I'm not. If the NYT had allowed advertisers to handpick stories to appear alongside, I wouldn't consider this substantially different.

What I'm interested in is the switch from choosing to appear alongside a category or keywords (content targeting) to appearing alongside a type of story, with its more specific impact. The impact of ML isn't better-than-human parsing, it's almost certainly worse-than-human. It's just a question of adding story-level targeting which wasn't previously available, along with access to user studies that go beyond demographics and engagement to self-reported emotions.


Surprised the author didn't reference the time when Facebook set out to intentionally manipulate hundreds of thousands of people by playing with their emotions [1]. Facebook made like 200,000 people start to feel sad on purpose and bragged about their abilities in a "scientific paper". That's in quotes because their lawyers say that you give your "informed consent" to be psychologically experimented on when you sign up for Facebook.

Facebook does in fact decide to basically torture people into a depression with its power. Facebook is an evil entity and their intentional emotional torture exerted onto its users far surpasses the kind of thing you're quoting NYT for. Not excusing NYT here but Facebook is next-level-evil when it comes to psychological experimentation on unwilling and non-consenting users.

[1] Lots of articles on this, here's a random one from Google searching: https://www.theatlantic.com/technology/archive/2014/06/every...

(edited for accuracy, millions->thousands)


Well said. Remember how whimsical and far-fetched Monsters Inc. was, with the corporations feeding off of children's fear..? Shiver



Does it matter if I use the complete block list for Facebook et al. on GitHub?



