Google Tag Manager, the new anti-adblock weapon (2020) (woolyss.com)
1384 points by thyrox on Feb 21, 2022 | 869 comments



I'm the author. Good to see this on HN, raising awareness of the topic.

I don't know who made the translation and when it was made, but the original article in French (https://pixeldetracking.com/fr/google-tag-manager-server-sid...) contains more information on recent GTM "improvements": mainly on how you can easily change JS library names, and detailed instructions on how to host your container in other clouds or self-host.


> I don't know who made the translation and when it was made

This page was saved with SingleFile (I'm the author of SingleFile). Therefore, I can tell you that this page was produced on Tue Dec 08 2020.


Thank you for making SingleFile, it's been an absolute lifesaver in a project I'm working on. I was having a lot of trouble trying to manually save pages with puppeteer but the singlefile CLI worked perfectly, even with added extensions. (To get extensions to work I had to add --browser-headless=false --browser-args ["--enable-features=UseOzonePlatform", "--ozone-platform=headless", "--disable-extensions-except=/path/to/extension", "--load-extension=/path/to/extension"] )
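
For anyone trying the same thing, the full invocation looked roughly like this (URL and paths are placeholders, and the exact argument syntax may differ between single-file-cli versions, so check its docs):

    # placeholders only; verify against the single-file-cli documentation for your version
    single-file "https://example.com/some-page" \
      --browser-headless=false \
      --browser-args '["--enable-features=UseOzonePlatform", "--ozone-platform=headless", "--disable-extensions-except=/path/to/extension", "--load-extension=/path/to/extension"]'

Note that the JSON array passed to --browser-args needs to be quoted so the shell hands it over as a single argument.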


Thanks for the feedback! It's very timely; I have an open issue that discusses the problem of sideloaded extensions (and profile data).


Uhm, can you pack all those options in a simple "--E" or somesuch...


Gawd I love HN you beautiful bastards.


SingleFile and SingleFileZ are great!

It's a shame that Manifest V3 will hamstring the size of pages that can be saved (to 43MB, if I remember the SingleFile Lite GitHub page correctly).


Thank you! Actually I did some tests recently with SingleFile Lite and was able to save a page weighing 120MB+, so the 43MB limit seems obsolete. There are still some annoying issues though.


Thanks for the info! Maybe it's Jerry: https://info.woolyss.com/


Hello pixeldetracking, yes, it is me ;) I translated your excellent page and made an HTML archive with the excellent SingleFile extension. Thank you very much for everything. I like to keep a copy of interesting content. https://chromium.woolyss.com/#guides

Regards


Hello pixeldetracking, Excellent article! Bravo. About the translation, yes, it is me! ;)


This kind of data collection abuse is why I think we need more addons like AdNauseam [1]. Unlike uBlock Origin, it's not available from the Chrome web store anymore, which is a good sign that Google hates these types of addons more than they hate simple blockers.

Blocking A/AAAA domains with custom URLs to prevent tracking is almost impossible, so instead let's flood the trackers with useless, incorrect data that's not worth collecting.

[1]: https://addons.mozilla.org/en-US/firefox/addon/adnauseam/


Completely agree. Stuff like uBlock Origin is just online self-defense against hostile megacorporations. Maybe it's time we started going on the offensive by poisoning their data sets with total junk data with negative value. They insist on collecting data despite our wishes? Okay, take it all.


I like the cut of your jib, and I would like to subscribe to your newsletter.


I worked for an agency a couple of years ago when, out of the blue, the tracked data contained tons of random values instead of the expected UTM parameters. It took us a while to figure out what was happening. It was some kind of obfuscating plugin that was messing up well-known tracking parameters.

What I want to say is: stuff like that could actually cause a lot of fun on the other side.


Does anyone know which addon that might've been? Seems like a good addition to adnauseam.


Yup. I've used NoScript for years, and one of the most frequently appearing sites that remain blocked is googletagmanager.

I totally second the sentiment that this is merely minimal defense against hostile 'service providers'.

This avalanche of tracking libraries is now almost as toxic as email spam in its worst-controlled days. Much of the internet is literally unusable, as pages take dozens of seconds to minutes to load - on a CAD-level laptop that can rotate 30MB models with zero lag.

In fact, does anyone have a blacklist of trackers that we can just blackhole at the hosts file or router level? Maybe it's time to set up a Pi-hole?


In my experience the trackers that show up most often in NoScript are googletagmanager and facebook, so with just two domains you can get a lot. But e.g. Bloomberg uses a full first-party proxy for the Facebook pixel with a pseudorandom base URL; it's difficult to block even by URL. I suspect they duplicate the page request to Facebook too, but this is unobservable on the client side. Hopefully this solution doesn't scale well.
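
For a hosts-file blackhole, a minimal (far from exhaustive) starting point would be something like:

    # minimal example entries; real blocklists cover thousands of hosts
    0.0.0.0 googletagmanager.com
    0.0.0.0 www.googletagmanager.com
    0.0.0.0 www.google-analytics.com
    0.0.0.0 connect.facebook.net

But as noted, entries like these do nothing once the tracking moves to a first-party subdomain of the site itself.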


This is my go-to: https://github.com/StevenBlack/hosts

It helps a lot.


Since this extension actively clicks on ads which may trigger payments, how do ad-fraud services classify endpoints running this extension? Could they consider this malware and add the client IP to blacklists?


> Could they consider this malware and add the client IP to blacklists?

Do malware developers consider the countermeasure software created to resist them to be malware as well?


If we were to split what malware does into Infection (getting into the system), Avoidance (hiding from the system or AV, or attacking the AV) and Work (sniffing, sending spam, etc.), then Avoidance would be by far the biggest and most complicated (and most interesting) category.


They absolutely do.


Good. If it is a shopping or some other service that charges money, then they lose business.

If it is some service that you have no choice but to use, but relies on network effects (like Facebook Events), then you can just send a screenshot to the interested party and they might consider not using a service that is broken for other people.


Sure, and perhaps also the accounts of users running this while logged-in. Have contingency plans if you run this and your, say, GMail account is blocked.


It is precisely why I degoogled my life.

I did not want to live under the constant threat of big G locking me out of my own life anymore.


Anyone still using gmail today for anything other than throwaway purposes is behaving foolishly.


You sound like you are living in a bubble. This is like asserting anyone who owns a car is being foolish.


I lost my Gmail account a decade ago. Since then, year after year, I've been watching people suffer the same fate with Gmail, YouTube, Google Play, etcetera. There's always someone who won't believe that Google can screw you over all of a sudden. There's always someone who will be surprised, always someone who thought it couldn't happen to them...

I don't know what else I can say. It's a shame I haven't been maintaining a list of all incidents I've come across.


What's the jellybean alternative these days?


With a bit of luck, it gets server owners banned from AdMob/MoPub/etc for fraudulent clicks.


I wish, but I haven't stopped receiving ads yet.


Can uBlock do payload inspection? It would be easy to block an upstream JSON POST that matches a certain structure.


Interesting idea, installed the addon.

I'm using MS Edge, BTW. Microsoft doesn't care about Google's advertising revenue, so the addon is available in their marketplace.


Microsoft doesn't care because they collect everything through their desktop environment. That's why you need an email to set windows up now.


> they collect everything through their desktop environment

There are many relevant questions during the install. If one actually uses the OS installation wizard GUI instead of skipping it with "next" buttons, Microsoft won't be collecting much.

Another thing: they don't have to, because their business model is honest. They're building software, and users are paying for it. Microsoft ain't an advertising company; they have little motivation to track people.

> you need an email to set windows up now

I did a clean installation of Windows 10 last week (recycled an old laptop after migrating to a new one), and the email was optional.


lol Windows 10 is essentially deprecated, dumbass


Luckily, ms provides throwaway mailboxes at outlook.com.


It's not going to be much of a throwaway once it's associated with every activity you do on your computer and the internet. In fact, it might be one of the most valuable email addresses (to Microsoft) that you ever make.


I am very interested in this, thanks for sharing.

Adding another party into my web browsing is always a tough pill for me to swallow. I am also a noob at reading trust signaling. What are some of the reasons that I should trust this dev and their processes?


You should not trust them. You can download the add-on and inspect it yourself, if you know some JS. Right-clicking yields this URL:

https://addons.cdn.mozilla.net/user-media/addons/585454/adna...

But it seems to include a lot of code, including some uBlock Origin code.
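
For what it's worth, an .xpi is just a ZIP archive, so a local look at the code can be as simple as (filename is a placeholder for wherever you saved it):

    # assumes the .xpi from the URL above was saved locally as adnauseam.xpi
    unzip adnauseam.xpi -d adnauseam-src
    less adnauseam-src/manifest.json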

Either way, this kind of sabotage might get you banned on Google. Be mindful of the risks, and have contingency plans.


You should put the same amount of trust in this dev as you should in any other. I myself trust Mozilla's store reviews enough to run the addon, but if you're more conservative with trust, you can inspect the source code and build the addon itself.

The addon comes down to a uBlock Origin fork with different behaviour. I believe most of the addon code is actually the base uBlock code base.

I haven't seen any obvious data exfiltration in my DNS logs, but then again I'm just another random on the internet. If you don't feel comfortable installing something with a privacy impact as broad as an ad blocker, you should definitely trust your instincts.


Will pihole automatically protect against A/AAAA domains if your blocked domain host file lists are updated regularly?


My experience is that Pi-hole has been getting less effective over time as more and more ads are being run through the same domains as legitimate content. When I first installed it, it killed ads on my Roku; that doesn't happen anymore.


What apps on your roku? I had to whitelist a Hulu domain cause it froze when trying to load ads during commercials for example, but when I look at the logs it’s blocking a ton of telemetry and phoning home 24/7 by Roku and Alexa devices.

Are you regularly updating your ad blocking filters? When ads start showing up on my phone I know it’s time to go hit the update button.


I replaced my Roku a while ago, and yes I keep my Pihole up to date.


I feel like the reason you initially used a strong word like "abuse" is to distract from the same behavior the blockers you mention engage in. Spamming Google event services and "flooding" them with garbage is surely in the abuse category too, at least if you're not an avid anti-ad proponent.


They simply have to stop shoving ads down my throat, if they do not want me abusing those same ads.


I used to use AdNauseam a while back, until ads suddenly started showing. So I switched to uBlock Origin, and the ads stopped showing again.

After I read your comment, I disabled UO and installed AN again, in case some update had fixed the issues. It didn't, so I'm now back to using UO again.


That's cool, but it's only going to save the 1% that knows how to bend the internet to their will. What we need is legislation, like this: https://www.theregister.com/2022/01/31/website_fine_google_f...

That would actually make a difference, not only for the HN crowd.


God damn... this is it, this is the end-game. There's no way to fight this unless you customize and maintain blocking scripts for each individual website.

Yes, websites could always have done this, but the cost of the (CDN-bypassing) REST requests and the manual maintenance of telemetry endpoints and storage were an impediment - one that Google now removes with a drop-in solution :(

I think Google is happy to eat some of the cost of the "proxy" server given the abundance of data they'll be gobbling up (not just each request's query string and the user's IP address but, since it sits on a first-party subdomain, all the first-party cookies as well). I don't have the time or energy to block JavaScript and/or manually inspect each domain's requests to figure out if they use server-side tracking or not.
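
To make the subdomain point concrete, the first-party setup typically looks something like this (hypothetical names; the exact record depends on where the container is hosted, e.g. App Engine custom domains are mapped with a CNAME):

    ; hypothetical zone entry for a server-side tagging container
    metrics.example.com.   CNAME   ghs.googlehosted.com.
    ; the page then beacons to https://metrics.example.com/..., so the request is
    ; same-site, survives third-party blocklists, and carries cookies scoped to .example.com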

I honestly don't know if there's any solution to this at all. Maybe using an archive.is-like service that renders the static page (as an image at the extreme), or a Tor-like service that randomizes one's IP address and browser fingerprint.


"I don't have the time or energy to block JavaScript and/or manually inspect each domain's requests to figure out if they use server-side tracking or not."

By default, I don't run JavaScript. I don't see blocking JS as a problem - in fact, it's a blessing as the web is blinding fast without it - and also most of the ads just simply disappear if JS is not running.

On occasions when I need JS (only about 3-5% of sites) it's just a matter of toggling it on and refreshing the page. I've been working this way for at least 15 years - that's when I first realized JS was ruining my web experience.

I'm now so spoilt by the advantages of the non-JS world that I don't think I could ever return. I'm always acutely reminded of the fact whenever I use someone else's machine.


> By default, I don't run JavaScript. I don't see blocking JS as a problem - in fact, it's a blessing as the web is blinding fast without it - and also most of the ads just simply disappear if JS is not running.

Years ago I was on the "people who block JavaScript are crazy" bandwagon, until just loading a single news article online meant waiting for a dozen ads and autoplaying videos to load. I spent more time waiting for things to finish loading than I spent browsing the actual sites, which killed my battery life. I'd get a couple of hours of battery life with JS on, and with it off, I could work all day on a single charge. It was nice.

Ever since then, I've been using NoScript without a problem. I've spent all of maybe 5 minutes, cumulative over the course of several years, clicking a single button to add domains to the whitelist. If whitelisting isn't something you want to do, you can use NoScript's blacklist mode, too.

> I'm now so spoilt by the advantages of the non-JS world that I don't think I could ever return. I'm always acutely reminded of the fact whenever I use someone else's machine.

I relate with this 100%.


> until just loading a single news article online meant waiting for a dozen ads and autoplaying videos to load.

That sounds like you not only didn’t block JS, you also didn’t block ads. Which is a very different argument. I only block 3rd-party JS by default (and that already requires a lot of whitelisting for almost every site that has any interaction) and I don’t have those issues because I also block ads.


There was a period around 2014 - 2016 where even if you used uBlock, ads would still get through. Even now, when I use computers that just have uBlock Origin installed, some ads, and especially autoplaying videos on news sites, still get through.


Tried NoScript for years and it was a pain. Too many of the sites I use need so many domains full of JS. So I think this will vary widely depending on the person and their preferred/needed sites.


It has to be said: there are people who can get by without JavaScript and those who can't. You can almost predict those who can and those who can't by their personality.

If you are heavy user of Google's services, Twitter and Facebook as well as many big news outlets and heavy-duty commercial sites then you're the 'JavaScript' type and stopping scripts is definitely not for you!

If you are like me and don't have any Facebook, Twitter or Google accounts and deliberately avoid large commercial sites like, say, Microsoft then you can happily switch off JavaScript and experience the 'better' web.

You know the type of person you are, so with this fact in mind there's no point me proselytizing the case for disabling JavaScript.


I can relate 100%. In the past I was constantly using Twitter, Gmail, et al. I was using different hacks to bend them, to the extent possible, to my will. Times changed, my personality changed, and the desire and need to use those services disappeared, so I naturally stopped using them. When people were talking about this or that service being down, I didn't notice it at all. I was also lucky enough not to rely on them in my $dayjob. I run my own mail server, host my website and run my scripts. Old-fashioned guy, let's say. It works well for me. Moreover, JS bloat is a red flag to stay away from certain services. Has served me well.


This seems like a broad generalization. JS continues to permeate every industry brought to the web. It's increasingly not optional as employers and governments mandate more and more web services. Doubtful that can be predicted by personality.


"...as employers and governments mandate more and more web services."

It's not compulsory, especially with government. I never deal with government on the web at a personal level. If they expect me to fill in forms I simply say that I do not have internet access and would they please send me a paper copy - which they're obliged to do by law - and the same goes for the census.

If the government expects me to do business with it on the internet then it will have to legislate to make it compulsory AND then provide me with the necessary dedicated hardware for said purpose.

Why would I act this way? Well, for quite some years I was the IT manager for a government department and I know how they work (or I should say don't work).

BTW, as IT manager I never used email within the department (emails sent to my office were handled perfunctorily by secretarial staff). If the CEO wanted to send me an important memorandum then he had to have it typed up on paper and personally sign it (and I would reciprocate in kind). When in government you quickly realize that atoms on paper, and especially a written signature, have real guaranteed worth - unlike ephemeral emails that can vanish without trace.

I'm forever amazed at the trust the average person has in these vulnerability-ridden flaky systems.


> If you are heavy user of Google's services, Twitter and Facebook as well as many big news outlets and heavy-duty commercial sites then you're the 'JavaScript' type and stopping scripts is definitely not for you!

I, unfortunately, use some of these services and similar ones, too, and it takes a few seconds to enable JS on them, and then the sites will work indefinitely afterwards.


I use NoScript with Firefox on Android (together with uBlock Origin). After I unblocked the websites I regularly use (and not the ad delivery domains), it doesn't get in the way that much.


Unblocking the sites you use removes the advantage of not being tracked by Google through tag manager though.


I've had Google Tag Manager blocked for years and sites have worked fine without it.


That's probably true. Part of the reason why I still also use an ad-blocker.


> Too many of the sites I use need so many domains full of JS

I hear you, but I wonder if you are being honest with yourself when you use the word need.

At this point, I view Google and Facebook as the equivalent of loan sharks. A loan shark does provide a service, but most people shouldn't use one.


Are you a web developer by any chance?


> Years ago I was on the "people who block JavaScript are crazy" bandwagon, until just loading a single news article online meant waiting for a dozen ads and autoplaying videos to load.

Seems like a clear case of "crossing the river to collect water" (as the Swedish saying goes)? This is what I use uBlock Origin (with the right blocklists) for, and it happens automagically. I did use uMatrix for quite a while, but eventually ended up ditching it because uBlock Origin worked so well.


uBlock Origin solves the problem you had too, without breaking multiple sites.


There's another, indirect benefit to blocking JavaScript.

Over time I have noticed a strong correlation between sites which don't work right without JS and low-quality content which I regret having spent time reading.

Most of the time I encounter one of these sites I now just close the tab and move on with a clear conscience.


"Over time I have noticed a strong correlation between sites which don't work right without JS and low-quality content...."

Absolutely true, I can't agree with you more. I've reached the stage where if I land on a site and its main content is blocked if JavaScript is disabled then my conditioned reflex kicks in and I'm off the site within milliseconds.

Rarely is this a problem with sites that I frequent (and I too don't have time to waste reading low quality content).


Any tips for high quality content sites? It truly is hard to find these days


Yeah, read HN!

There are stacks and stacks of them here on HN that are of excellent quality - I use HN as my 'quality' filter (and I reckon I'm not alone).

Moreover, if one doesn't run JS like me, then it's dead easy to avoid problematic sites as HN lists them (Twitter, etc.), and it doesn't take long to get to know the main offenders and thus avoid them.

:-)

BTW, I agree with you that it is hard to find good sites these days, but eventually most really good sites appear here on HN. Do what I do: when you come across them, bookmark them.


A pedantic note that follows from this particular thread: HackerNews’s search capabilities are powered by Algolia and require JavaScript to work (turn off all JS and the HN branded Algolia page will not load). The reason I bring this up is that even good websites sometimes lean on free or free-ish services to provide extra functionality (such as calendars, discussion boards, issue tracking, or search) without realizing that such functionality may be a back door to letting JS in and any tracking/privacy-erosion that could follow from it.


Right, HN does use JavaScript for certain functions, search etc. Now, if you read the second paragraph of my first post I've got such cases covered.

OK, here's the scenario: I log on to HN with JavaScript disabled and do all the things I do - read articles, submit posts - all without JS. At some point I want to search HN, so I hit the 'toggle JS' button on my browser; it then goes from red to green to tell me JS is now active. I then refresh the page and start searching HN. When I've finished I hit the JS toggle and the button goes back to red - JS is now kaput.

I really can't think of anything simpler - JS is off until I really need it and when I do it's immediately available without digging deep down into menus etc.

I'd add HN uses JS as it was originally intended and does so responsibly. I have nothing against JS per se, the problem comes from websites that abuse webpages and thus the user by sending megabytes of JS gumph and so on.

Running without JS and only turning it on when really necessary I reckon is a reasonable compromise.


It's true, there are some decent sites out there which use JS legitimately to add features. And there are some sites which require JS without really needing to, but still have good content and do not have unnecessary annoyances and performance problems.

Lucky for me, I can toggle on JavaScript for them individually and continue with my general policy.


The thing with the WWW is links - that's the web. So https://news.ycombinator.com is a good starting point. From there, yes, you could end up on twitter.com for example, but it would be worthwhile.


“…you could end up on twitter.com for example but it would be worthwhile.”

Unpopular opinion: I never click on twitter links anymore. It’s almost never worth it.

IMHO, 140/280/N character limits are a way to cheapen discourse. I think there is something to be said for the “density” of text: text that offers very little to think about (less dense) is vacuous but encouraged by a character limit; yet, text that is compressed into a character limit either packs too much info into a short space that requires more discourse to properly get a thought across or elides too much from the text, making it less accurate/meaningful/important. Or worse: people chain posts into long 1/907, 2/907, 3/907… trains that should be blog posts rather than requiring some other application to string the thread together.

Of course the other reason (more central to this discussion) never to click on a twitter link is that JS and an account login is required now to read the posts past a certain point. If that makes me an old man yelling at a cloud, so be it, but aren’t there better ways to handle online public discourse without sacrificing people’s privacy and security?


"Unpopular opinion: I never click on twitter links anymore. It’s almost never worth it."

It's not unpopular with me, I agree with you completely. I was never a Twitter fan but when they forced the use of JS that was the end of it (you'll note I used Twitter as an example in one of my earlier posts).

You're right about sacrificing people’s privacy and security, as I said in another post 'I'm forever amazed at the trust the average person has in these vulnerability-ridden flaky systems'.


Similar here. When I am searching for something and a website won't show it unless I enable JS, it usually turns out that the website's content is worth nothing and that I activated JS for naught, leaving me regretting the time spent on that website.


I used to run NoScript then at some point (maybe switched browsers?) I stopped using it. You've persuaded me to re-enable it.

Also - Firefox on mobile supports NoScript!


No, only FF on Android supports extensions.


Because Apple essentially does not allow Firefox...


Concerning noscript, is this [1] still a thing?

[1] NoScript is harmful and promotes malware - https://news.ycombinator.com/item?id=12624000


Can't find any ads on NoScript.net with uBlock running, and uniblue.com seems to have expired. However, it is hilarious that the complaint comes from Adblock Plus, whose entire business model is built around bypassing EasyList. For a generous fee they make sure that your ads are "acceptable".


What makes you think this comes from ABP? The article linked to is from 2016; it links to a history between NoScript and ABP. The article by ABP is from 2009 (!!). Back in 2009, ABP was the de facto standard. There was no uBlock. There was NoScript, but no uMatrix yet.

The developer issued an apology and reverted the change, and apart from a Ghostery one (who are also shady) no further controversies are documented at [1]. Perhaps the Wikipedia article is incomplete, given the one linked is from 2016?

[1] https://en.wikipedia.org/wiki/NoScript


Firefox has never been slow for me over the last 15 years because NoScript makes it light years better than Chrome. Conversely, I routinely have the Android assistant lock up on me from JS bloat despite the supposed performance enhancement of AMP pages.


I don't know which web you're viewing that only needs JS for 3-5% of websites


HN totally usable for basic functionality w/o JS.

profootballtalk.com works great if you don't want to vote or comment

macrumors.com great functionality

nitter.net happily takes the place of twitter.com

drudgereport.com works great and I rarely turn on JS when I go to the sites he links to, usually the text on target sites is there if not as pretty as it could be

individual subreddits (e.g. old.reddit.com/r/Portland/ ) are quite good w/o JS. But the "old." is probably important.

I admit that there are lots of sites that don't work, e.g. /r/IdiotsInCars/ doesn't work because reddit uses JS for video. For so many sites the text is there but images and videos aren't. Also need to turn off "page style" for some recalcitrant sites.

In conclusion, contrary to your JS experience, I'd say that I spend over 90% of my time browsing w/o JS and am happy with my experience. Things are lightning fast and I see few or no ads. I don't need an ad blocker since 99% of ads just don't happen w/o JS.


> In conclusion, contrary to your JS experience, I'd say that I spend over 90% of my time browsing w/o JS and am happy with my experience. Things are lightning fast and I see few or no ads. I don't need an ad blocker since 99% of ads just don't happen w/o JS.

Well, you still have lots of tracking stuff loaded probably, unless you got something extra for blocking trackers. A tracking pixel does not need JS. A font loading from CSS does not need JS. Personally I dislike those too, so I would still recommend using a blocker for those.


> Well, you still have lots of tracking stuff loaded probably, unless you got something extra for blocking trackers.

Yes I'm sure I have that stuff loaded. But I don't care because it's quite ephemeral:

I exit Firefox multiple times a day, there's really no performance cost to doing that after every group of websites. E.g. if, while reading HN, I look up something on Wikipedia, or I search with Bing or Google, everything goes away together.

In my settings: delete cookies and site data when Firefox is closed

In my settings: clear history when Firefox closes, everything goes except browsing and download history

No suggestions except for bookmarks.

So when I restart Firefox to then browse reddit it starts with a clean slate.

Comcast insisted I purchase a DOCSIS 3 modem quite a while ago. Once downloads are at 100 Mbps+, does it really matter if I repeatedly re-download a few items to cache?

The only noticeable downside is when I switch to Safari to view something that needs JS, I then see ads for clothing that my wife and daughters might be interested in. I presume this is due to fallback to tracking via IP address. Of course I always clear history and empty caches in Safari.

Obviously this doesn't work for someone who wants to or needs to keep 100 browser windows open at once, for months at a time. But that's not me. I don't think that way, never have.

Edit: just had to add that sites like Wikipedia are better w/o JS (unless you edit?). I don't see those annoying week-long pleas for money. Do they still do those?


> Obviously this doesn't work for someone who wants to or needs to keep 100 browser windows open at once, for months at a time. But that's not me. I don't think that way, never have.

Caught me. Tab hoarder here : )

> I don't see those annoying week-long pleas for money. Do they still do those?

They still do those. At least I have seen them less than a year ago.


Read my reply to paulryanrogers about whether one's a JavaScript or a non-JavaScript type person.

The 3-5% of sites I'm referring to are ones where I have to enable JS to view them. In by far the vast majority of the sites that I frequent I do not have to enable JS to view them.

Also note my reply to forgotmypw17, one doesn't need JS if one avoids low quality dross.


I will give it another shot. Unfortunately though, this does not solve the server-side GTM issue, right?

If the 3-5% of websites you use start tracking via server-side GTM on the site's own domain, you will not be able to simply use NoScript to disable tracking?


You're probably right, but then there are many factors involved - take Europe's GDPR, I'd reckon it'd be deemed unlawful under those regs but of course that doesn't help those of us outside Europe.

It remains to be seen how Google's Tag Manager actually works and I'd be surprised if data from your machine is ignored altogether. If your machine says nothing about you then Google won't know who you are - unless you have a fixed IP address and most ordinary users don't. Sure there's browser fingerprinting (but I never bother about this as I use multiple browsers on multiple machines which screws things up a bit).

When I used to worry about this more than I do now, I used to send my modem/router an automatic reboot signal during periods of inactivity, this ensured a regular change of IP address.

OK, so what info can be gotten from your machine if JavaScript is disabled? Some, but it's nothing like what happens when JS is active - in fact the difference is quite staggering (ages ago I actually listed the differences on HN).

Presumably you could search for the post but there's an easier way. Use the EFF's test your browser site https://coveryourtracks.eff.org/ and do the test with and without JS. Note specifically the parameters with the 'no JavaScript' message.

Also note the stuff a website can determine about you even when JS is disabled - with this info you can start tackling the problem such as randomizing your browser's user agent, etc.

My aim was never to kill every bit of tracking; rather it was to render tracking ineffective, and I've been very successful at doing that. The fact is I don't get ads, let alone targeted ones, just by turning off JS and having an ad blocker as backup. The only other precaution I take is to always nuke third-party cookies and to kill all standard cookies when the browser closes.

I'm not too worried about Google's Tag Manager, for even if Google tracks me it still has to deliver the ads and it cannot do so with JS disabled and an ad-blocker in place.

__

Edit: if you want to watch YouTube then Google insists you enable JavaScript. This is a bit of a pain but it's easily solved with, say, the Android app NewPipe (available via F-Droid). NewPipe also has the added advantage of bypassing the ads and having the facility to download clips as well, if that's your wont.

Of course, there are similar apps for desktops too.


I have advanced protection on my Google account that unfortunately doesn't let me install apps outside Play Store...

I think I can still load NewPipe through usb debugging but not able to have auto updates


If you've got advanced protection running then you're a dyed-in-the-wool Google user (hard-core type), so I wouldn't even try.

I'm the exact opposite. I root my Android machines and remove every trace of Google's crappy gumph, Gmail etc. (I don't even have a current Google account.)

I occasionally use the Google playstore but I log on anonymously with the Aurora Store app (not available on the playstore).

I say occasionally because that's true; instead I use F-Droid or Aurora Droid to get my guaranteed spyware-free apps. It's a different world - I'm the antithesis of the happy Google user.

Don't try to load NewPipe, in your case it's just not worth the effort (and Google will notice the fact).


This. I use the NoScript addon by default, and it's amazing how many different domains sites try to bring in. Then I hit Twitter, Imgur, Quora, etc. and I am left with nothing but a blank page with plain text telling me that I need JavaScript to view the site. It makes me wonder what kind of tracking they are pushing.


All of them. If you allow everything and have Ghostery running in "don't block anything but tell me what's there" mode, it's horrifying just how many things get loaded.

You can play with page load sizes in the debugger console with stuff blocked and without too - about half the downloaded material on any major news website is stuff that Ghostery will block. It's quite terrifying.


> and also most of the ads just simply disappear if JS is not running.

Since we are talking about the future, I'd like to point out that they can always serve ads from the origin domain without JavaScript.

I mean the anti-adblock battle will evolve until each page we visit is a single image file that we have to OCR to remove ads. Then we will need AI, and they will have captchas that ask which breakfast cereal is the best.

You can stay ahead of the curve, but it's always moving forward.


"...they can always serve ads from the origin domain without JavaScript."

But most of them don't. Yes, they can change their model and in time they likely will.

As it stands now, one doesn't have to watch ads on the internet if one doesn't want to - all it takes is a little perseverance and they're gone. If one can't rise to the occasion then one has a high tolerance for ads.

Even YouTube can be viewed without ads with packages such as NewPipe and similar.

You're right about AI, OCR etc. and I think in time it will come to that.

It seems to me people like us will always be ahead because we've the motivation to rid ourselves of ads. It reminds me of the senseless copyright debate - if I can see the image then I can copy it. No amount of hardware protection can stop me substituting a camera for my eyes. What's more, as the fidelity goes up HD, 4k etc. the better the optical transfer will be (less comparative fidelity loss).

That said, the oldest technology - standard TV - is still the hardest to remove ads from. Yes, one can record a program and race through the ads later (which most of us are very adept at doing) but it's still inconvenient.

What I want is a PVR/STB that figures out the ads and bypasses them. Say I want to watch TV from 7 to 11pm (4 hours) and there's a total of one hour of ads and other breaks in that time that I don't want to watch then I want my AI-aware PVR/STB to suggest that I start watching at 8pm instead of 7 as this will allow it to progressively remove ads on-the-fly across the evening.

The person who makes one of these devices will make a fortune. If the industry tries to ban it (as it will) then we resort to a software version and download it into the hardware. Sooner or later it's bound to happen and I'll be an early adopter.


> What I want is a PVR/STB that figures out the ads and bypasses them. Say I want to watch TV from 7 to 11pm (4 hours) and there's a total of one hour of ads and other breaks in that time that I don't want to watch then I want my AI-aware PVR/STB to suggest that I start watching at 8pm instead of 7 as this will allow it to progressively remove ads on-the-fly across the evening.

I wonder if something like SponsorBlock for YouTube (which is a must-have) could be done for TV? It's a crowdsourced effort and works flawlessly for popular channels.


Good question, I don't know. It's certainly worth thinking about.


How does blocking javascript in this case prevent tracking? It's done via the same cookies the website uses, as I understand it. Do you disable cookies too?


I used to have JavaScript turned off for a long time, but I've given up. You can't even search Hacker News without JavaScript (for some reason).


Pretending as if you can search hacker news with JS turned on...


There is some truth to this though. It is sometimes hard to find that HN topic you remember just a few words of through the Algolia search thing.


Exactly! If something doesn't work without JS, I don't use it. There are many alternatives.


Apple’s Private Relay blocks this type of cross site tracking.

Given this tracking is all server side, third party cookies across sites aren’t possible using this mechanism, and private relay cycles through your IP addresses frequently and uses common IPs across multiple users.

Regarding your other point, unless Google execs want to be thrown in jail / sued, they can’t use things like first party cookies for their benefit since that is against their terms of service.


How is private relay different from a vpn? A lot of fingerprinting scripts also can track you despite vpn.


Private Relay uses ingress and egress relays. The ingress proxy does know your IP but not which sites you are visiting and what you are doing. The egress proxy is only connected to the ingress, sees what you visit but does not know who you are. Both proxies are run by different parties.

With a VPN you would have to trust one provider, who sees all of your traffic.


Then is Private Relay equivalent to a two layer tor setup?


From my understanding yes, but with the caveat of being organised by a single entity (apple)


I wonder why Safari is required? I’d be interested in paying for this if it worked with Firefox.


Yeah that would be a useful service that Mozilla could offer and I'd actually pay for.

I don't like their VPN as it's too basic in terms of privacy protection and it's much more versatile to just sign up with Mullvad myself because then I can use it on other stuff than just the browser.


I think in the short-term the strategy is this from the article:

> Or ... block all the IP addresses of Google App Engine, at the risk of blocking many applications having nothing to do with tracking.

Anyone hosting legitimate apps in the Google ecosystem is indirectly complicit in this, and at least for my personal network, I have no concern with blocking Google App Engine holistically.

Additionally, I think it's important to hurt Google as much as possible for escalating in this way. Widespread blocking of GAE may seem extreme but it's also arguably warranted.



> I have no concern with blocking Google App Engine holistically

Unfortunately, it seems that more and more government web sites rely on Google services to function. And there's no replacement for those.


Use two browsers: one where you don't block tracking and can access government sites and make purchases on shopping sites, and one where tracking is blocked and JavaScript is turned off.


How can it be legal for a government to make increasingly core services depend on these amoral, for profit monsters?


The military-industrial complex would like to have a word.


I’m not sure if this is a serious question, but what would this imaginary law say?

The government can only do business with companies who aren’t in it for the money?


How about that government services must be built by the government?


Yes, I feel the same, at least for a lot of things. Certainly, all externally facing websites should be designed and maintained by gov't staff.

From time to time, HN features high quality UK gov't websites. In the last five years, the UK gov't has made dramatic strides on "digital gov't" initiatives that benefit regular citizens. As I understand it, most of those sites are built and maintained by gov't employees. This runs counter to the normal, all-prevailing attitude in the UK that "any gov't is too much gov't" (or "any gov't that does not directly benefit me...").


The trouble is, they're mostly Microsoft and either Azure or AWS behind the scenes. The UK government as a whole seems to love Microsoft. I just worry it will be out of the frying pan and into the fire...


Brit here. On your last point, there is no such widespread attitude in the UK towards government. We are historically conservative, but not libertarian. Don't forget two of the most famous and loved British institutions are the BBC and the NHS. I'm not saying such attitudes don't exist, because they do, but it's not "all-prevailing" by any stretch.


I think it's a typo/autocorrect and they meant US at the last instance instead of UK.


The Conservatives want to privatise the BBC and the NHS though - abolishing the BBC licensing fee is a recent move, and steps to privatise the NHS have been repeatedly popular among politicians over the last decade.


I would like that law. However, they would have to pay wages and offer working conditions that actually attract good developers, and they would have to stop outsourcing everything. Outsourcing everything is unfortunately also a problem with otherwise qualified engineers. The big-picture, long-term consequences are unpleasant.


You have to draw a line somewhere with that logic, otherwise you'd have governments running their own fabs.

I'm fully in favour of governments doing everything from hosting up (hosting, design, dev), with as much as possible open source.

For instance, the French government fares well on this front, with most government services being developed in-house, and many parts are open source; in emergencies specific services were delegated to third parties (e.g. vaccine bookings), so it isn't taken to a religious NIH level. However, hosting is delegated to commercial entities.


Realistically, Congress could in fact mandate that government website implementations must be transferable between software vendors. That’s both technically feasible and in line with past government requirements for hardware procurement.


The US government isn't shy about adding rules for its contractors. It should be trivial for them to demand (or provide) dedicated IPs for their sites. Then they won't get caught up in the IP address blocking of GCP.


The big tech companies have all built out lobbying capabilities; such a law would end up helping big tech and harming small companies because the big companies would be involved in authoring the law and would be contributing to the sponsors and committee chairs and members to get their favorable language included. And it would all be legal and business as usual.


They don't have to be laws. It's something that Biden can just add into every RFP the US government puts out.

But no, typically things like that don't hurt small companies.


> but what would this imaginary law say?

IANAL, but how about something like, "Government services offered via WWW must not contact commercial servers and must be fully usable with non-JS browsers."


Aren't browsers shifting to a per-domain cookie jar?

While you can never prevent one specific site from tracking you, this still doesn't (directly) allow your activity on Site A to be linked to activity on Site B, does it?

Of course, fingerprinting combined with IP addresses will ultimately allow something that comes very close to it, so the current state (a few hundred trackers per website, all ending up harmlessly incrementing the adblocker's counter) is better for privacy for power-users, but I'm not sure if this is the big "game over".


Google is pushing to have the browser itself track your interests and share them with whoever asks. The first attempt FloC backfired rather quickly as it was an all around privacy nightmare. The second attempt Topics promises to fix a lot of the problems FloC had but that is not a high bar and Google left itself a lot of room for future changes.


This is what I’m interested in. Article itself did not mention cross site tracking.

Every website having their own tracking subdomain makes third party cookies not work cross site even without browser changes.


They can still cross-track based on IP or any other fingerprint worthy information. I expect this is exactly what they're doing. Doing this all on a central service makes this process much easier unfortunately...


Yes, they would need to get another identifier, and that's what is done with players like Facebook.

Sorry, another of my articles in French: https://pixeldetracking.com/fr/les-signaux-resilients-de-fac..., but Facebook is making it easy to integrate their "Conversion API (CAPI)" with GTM Server-Side Tagging.


But that should only help e.g. a web store to track you from the ad you clicked, which seems reasonable.

It should not allow e.g. Facebook to link your activity on a news site to your Facebook cookie, because while you're on cnn.com, your browser is using the cnn.com-specific cookie jar for everything, including the like button?


The cross-site tracking is done by a third party. From reading the docs, the way it works is: the publisher sets a unique ID, browsers send that unique ID to the publisher's domain, and the publisher forwards it (via the Tag Manager App Engine container) to the third party.
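
Purely as an illustration of that flow (a made-up sketch, not GTM's actual implementation; the endpoint path, parameter names and vendor URL are invented):

    # Hypothetical sketch of a first-party "collect" endpoint that forwards
    # events to a third-party vendor; all names and URLs here are invented.
    # pip install flask requests
    from flask import Flask, request
    import requests

    app = Flask(__name__)

    THIRD_PARTY_URL = "https://events.vendor.invalid/ingest"  # placeholder vendor endpoint

    @app.route("/collect", methods=["GET", "POST"])
    def collect():
        # The browser only ever talks to the publisher's own (sub)domain, so the
        # request looks first-party and is invisible to third-party blocklists.
        event = {
            "client_id": request.cookies.get("uid"),          # unique id set by the publisher
            "page": request.args.get("dl"),                   # page URL reported by the tag
            "ip": request.remote_addr,                        # only visible server-side
            "user_agent": request.headers.get("User-Agent"),
        }
        # Server-to-server forwarding: nothing for a client-side blocker to see.
        requests.post(THIRD_PARTY_URL, json=event, timeout=2)
        return ("", 204)

    if __name__ == "__main__":
        app.run(port=8080)

Blocking that client-side would mean blocking /collect (or whatever path the site picks) on every individual website, which is exactly the per-site maintenance burden the article describes.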


> Maybe using an archive.is-like service that renders the static page (as an image at the extreme)

A lot of companies are starting to use "browser isolation" which is essentially what you're saying. A proxy runs between the client and the server, but it does more than just direct TLS streams - it actually builds the DOM and executes the JS. The resulting web page is sent to the actual client browser, which might send back things like mouse and touch events to the proxy, which will then update the page.

I think most companies are using this as a malware protection thing, but it does hide the actual client IP address and fingerprint, and I imagine it would make tracking very difficult.

https://en.wikipedia.org/wiki/Browser_isolation


Browser isolation isn't quite that. It's just running a browser that is heavily sandboxed from internal files and networks, or running on another machine so any exploits don't hit your machine.

It's very much like running a browser through Citrix (in particular the remote flavour which is the most common as far as I've seen). But of course any data in the browser itself is still within reach for the malicious code... Which only solves half the problem. Unless you rigidly separate internal browsing from external sites.

But it doesn't run all the JavaScript and then send you a screenshot or anything. The resulting page is still interactive.

Remote browser isolation has the ability to change the landscape of personal computing enormously by the way. Right now we equip all our laptops with at least 16GB (32 for customer care) because some web apps like Salesforce Lightning are such memory hogs.

Considering the importance of the browser in modern computing, this model would basically make the PC more like a terminal and require far fewer resources.

Of course this has already been going on with web based apps and streaming of things like games but this could be the final nail in the coffin of the PC as we know it. Not sure I'm happy with that...


Opera Mobile has been doing this for years and years


The Opera product you are thinking of is Opera Mini. Opera Mobile is a browser running mostly on your device (except for "Turbo", which optimized media through a proxy setup but did not, AFAIK, execute any of the JavaScript).

Opera Mini can be looked at as a browser running in the cloud, sending OBML (Opera Binary Markup Language, if I remember correctly) and causing the (very thin) client to draw things on the mobile screen, like text, images, etc., without having to transfer, parse, execute, flow and paint everything on the device.


Yeah, they released countless rebrands and versions and whatnot.

The equivalent on desktop would be Browsh (e.g. with terminal + Mosh), but it runs Firefox under the hood. Opera Mini is just akin to a remote browser with the result being sent to the client (as a compressed picture like in RDP/VNC, or a proprietary markup language like OBML).


> Maybe using an archive.is-like service that renders the static page (as an image at the extreme), or a Tor-like service and randomizes one's IP address and browser fingerprint.

I'm building a peer-to-peer network of web browsers [1] that doesn't trust anything by default and only allows rendering types of content incrementally, while disabling JS completely. Most of the time, you can find out what the content is with heuristics. The crappy occasional web apps that don't work without JS can be rendered temporarily in an isolated sandbox in /tmp anyway.

I think that the only way to get ahead of the adblocking game is to instead of maintaining blocklists, we need to move to a system that has allowlists for content. The user has to be able to decide whether they're expecting a website serving a video, or whether the expectation is to get text content, image content, audio content etc. News websites are the prime example of how "wrong" ads can get. Autoplayed videos, dozens of popups, flashing advertisements and I haven't even had time to read a single paragraph of the article.

And to get ahead of the "if fanboy gets hit by the bus" problem... we need to crowdsource this kind of meta information in a decentralized and distributed manner.

[1] https://github.com/tholian-network/stealth


Called it [1]. It's a cat-and-mouse game and, unfortunately, advertising is just _that_ lucrative. Privacy-minded browsing will help those that care (for now...), but that's an unsustainable option with the current monetization channels available.

If a content publisher cannot monetize you, they will think nothing of blocking you. There will be some public backlash against companies that do so and there will be some sites who will lose money because of it, but the rest of the publishers will simply follow the money while the industry shifts towards more intrusive tactics.

There needs to be a monetization channel that is 1) good for both users AND publishers and 2) pays just as much as current methods. Unfortunately none of the current systems support that.

[1] https://news.ycombinator.com/item?id=9975955


>There needs to be a monetization channel that is 1) good for both users AND publishers and 2) pays just as much as current methods.

I agree, but what party would you like that money to originate from?

Ads work well right now for consumer-to-consumer (e.g. I create a blog and you view it) because there's a rich third party that money can flow from (a company running ads --> money to me) without having to charge you, the end-user, who is more than likely significantly less well-off than a corporation.

To buck that pattern, you need the money to come from somewhere else. Subscriptions and direct payments are an obvious choice (see: the boom of SaaS over the past few years) but people are already complaining that they have so many subscriptions they lose track of them all, and spend too much money on what used to be a "free" internet.

So, I don't think there's a solution where the money comes from the end-user. However, any time you add in a third party for the money to flow from, they're going to want something in return. And unless you want that cash flowing from the site owner to that third party (...why would you?), they're gonna need to offer something else.

I don't see any solution other than "a third party pays for something users and/or the site can create for free". Is the answer to just find something free other than analytics/usage, or are there other approaches to monetize a site while still making it "free" to access?


Unfortunately I don't see a good solution either. Large direct to consumer business models like SaaS or subscriptions are really only sustainable at scale, and even then it's dicey. In a SaaS model, the big fish win and we lose the democratic nature of the current internet.

Society has driven the perceived price of content so low that the content itself is worth less than the aggregate audience. Really, in what other space does the average consumer set their price expectations at free AND balk at paying $5/mo for unlimited access to a product?

The only thing that seems to come close to moving the needle towards privacy is somehow pushing advertisers into in-market advertising (think early internet-style site banner ads) and out of programmatic/user tracked ads. There is some evidence that these programmatic ads don't really perform as well as they claim but from what I can gather, the data is still unclear.


Simpler protocols (Gemini, Gopher...), outright refusing to use what the modern web has become. I only use HN and a few select sites. You don't need an ad-blocker if there are no ads in the first place.


Using Gemini as an allowlist doesn't seem any better than allowlisting known-good domains for HTTPS sites


HN is a link aggregator for HTTP(s) links. How do you read them?


Not sure about the parent poster, but I am here mostly for the comments, and rarely visit the linked content.


Doesn't exactly this behavior create echo chambers and lead to polarization?


I usually do read the linked content but I agree with GP poster that comments are often more informative.

Yes there is sometimes an echo chamber here, but it's only for limited topics. It very much has a Silicon Valley feel to it, but @dang and I have gone around on this and he assures us that the readership and comments have broad geographic representation.[1] It's a worldwide echo chamber. :)

Fortunately the echo chamber doesn't exist for most submissions. Most of the discussion on HN is on non-polarizing topics.

[1] https://news.ycombinator.com/item?id=26869902


The time of the day is reflective of broad geography, generally.

So some UK or EU specific topics will appear, be commented upon but then disappear later in the day.

It would be interesting to see what kind of topics are commented on from different places.


Which behaviour would that be? The "reading only the comments, not the article"? I don't see how reading creates an echo chamber.

What creates an echo chamber is if all the posts are similar or otherwise in agreement with each other. Those threads make for boring reading and I tend to only scan them for less boring content (yes, that means I read the context surrounding greyed-out comments more than the rest). The threads where people discuss various aspects and experiences is what I come here for.

(full disclosure, I mostly read the comments before even opening the article. I only read the article if there's a high-quality comment thread about some details in the article, or if multiple commenters state that it's a great article. And I tend to upvote an article based on the quality of the comments, not just the article itself).


I don't think so. I'd say echo chambers are created by a lack of diversity in the user base. I think HN has a lot of actual diversity, and it's possible to see controversial topics disputed without unceremonious downvoting.


I don’t think the solution here is a technical one. This should just be solved by legislation.

Google Analytics has been recently ruled illegal in multiple European countries. And either this already is illegal under the same laws or it should be made so.


> Google Analytics has been recently ruled illegal in multiple European countries.

Just about everything hosted by a non-EU company just got ruled illegal (in the EU, that is).


It's very doable to disable google analytics for EU visitors.


Not quite - only everything US-based, since US companies fall under the purview of the CLOUD Act, which is incompatible with the GDPR (on purpose... this is an entirely self-inflicted wound by the US).


> Not quite - only everything US-based

No, anything covered by laws similar to the "cloud act" - which is the norm rather than the exception - is illegal. It's quite rare for a country to allow companies inside it to say no to their government.


It's not about companies inside it, but companies outside the country. And is it actually the norm? Since clearly even the US didn't have the Cloud Act until 2018. Was the US such a rare case until that recently?


This is interesting to me. The US is basically doing the same thing as Russia and China yet the media never talks about it...


> The US is basically doing the same thing as Russia and China

I don't really understand what you're referencing.

The whole issue here is that the USA claims global jurisdiction over US companies, forcing them to obey the US legal system even for data located in the EU. On the other hand, EU law makes it illegal for anyone, globally, to turn over EU customers' data without a court order from the EU.


I suspect this might end up being a trickier scenario, because when you get down to the details it's hard, at a technical level, to distinguish between a server log file and a tool like analytics that takes those same bits of data and mostly just organises and displays them in an intuitive way, with charts and a nice UI.


The ruling against Google Analytics in France is quite simple: Google Analytics as used by an unnamed website was not compliant with the GDPR, because it exports user data to a country whose privacy laws are not up to GDPR standards, which is not allowed. This is on the unnamed website, and they are compelled to stop this illegal export of user data by either only exporting anonymized statistics or stopping use of Google Analytics entirely.

Of course this isn’t yet a perfect banning of GA and Google might be able to work around it, but it’s something. And in fact, anonymized statistics would probably be OK (depending on the details of course).


But this actually highlights exactly what I mean. What if I simply stood up a plain old Apache server to host my website but that happened to be hosted in the US. No analytics, just a few HTML files and that’s it.

I’m still in this scenario sending PII of EU citizens in the form of IP addresses to the US which are just written to /var/log/apache

It seems obviously different, and yet, as that ruling seems to imply, it wouldn't be - unless I'm missing something here about first- versus third-party capture?
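
For reference, the stock "combined" LogFormat records the client IP as the very first field (%h). A typical Debian-style Apache config looks roughly like this - exact directives vary per install, so treat it as an illustration:

    # LogFormat/CustomLog as shipped by many distros (illustrative, may differ)
    LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
    CustomLog ${APACHE_LOG_DIR}/access.log combined
    # %h = remote host, i.e. the client IP, written out for every single request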


The default logging configuration on most servers is illegal now under the GDPR, since it saves IP addresses.


This pops up regularly, but AFAIK it's not correct. The law is much more fine-grained than the USA PII concept. IP addresses are only personal data (PD) if you are capable of using them as an identification mechanism. If you aren't, they are not. This also means that something that is not PD for you can become PD when you give it to someone else. Or that 2 items which are not PD themselves become PD when you combine them. Or that being hacked turns non-PD into PD.

Even as PD, using IP addresses to maintain a website is fine, even without consent. Using them to track individuals is not fine. Having a log rotation policy and a sane security policy so you can demonstrate when you throw them away is a good idea.

To be short: Install debian, drop nginx on it, then let it log as it wants. This is legal. But don't you dare mine the logs for abusing PD.


Do you have a source? My observation came from multiple lawyers in the context of "to stay on the safe side".


Incorrect. In the "Breyer" ruling[0] the highest European court concluded that dynamic IP addresses are PII (not just personal data, and not just data), as there is an abstract risk that combining IP addresses with other data can lead to identification of a user. The ruling explicitly said that the mere risk of such an identification is enough, not that such an identification has to actually happen.

Subsequent rulings by many courts have found that all IP addresses are PII, for various reasons, such as "static" IP addresses bear the same risk of indirect identification, and there is no reliable way to distinguish between "dynamic" and "static" addresses anyway.

The recent German ruling that Google Fonts violates the GDPR just by transmitting an IP to google (by making the web browser fetch a resource from a google server) hammered home this point, citing the EU ruling again[0].

This is different from, e.g., a streaming provider keeping a history of songs you played. This data is personal data, but it is not personally identifiable data, as this history alone cannot be used to identify a person. However, if this history has some kind of identifier attached that links back to account information or an IP address, that identifier would be PII, as it could be used to indirectly identify a person.

[0] https://curia.europa.eu/juris/document/document.jsf;?text=&d...

[1] https://rewis.io/urteile/urteil/lhm-20-01-2022-3-o-1749320/

Die dynamische IP-Adresse stellt für einen Webseitenbetreiber ein personenbezogenes Datum dar, denn der Webseitenbetreiber verfügt abstrakt über rechtliche Mittel, die vernünftigerweise eingesetzt werden könnten, um mithilfe Dritter, und zwar der zuständigen Behörde und des Internetzugangsanbieters, die betreffende Person anhand der gespeicherten IP-Adressen bestimmen zu lassen (BGH, Urteil vom 16.05.2017 - VI ZR 135/13)[2].

Translated, best to my abilities:

The dynamic IP address is, to a web site operator, a piece of personally identifiable data, because the web site operator abstractly has legal means, which could reasonably be used, with the help of third parties, namely the responsible authority and the internet service provider, to identify the person in question with the use of the stored IP address (BGH, ruling from the 16th of May 2017, VI ZR 135/13)[2]

[2] The BGH ruling quoted is the "Breyer" ruling again, just at the German national level instead of the EU level. The Bundesgerichtshof (BGH, highest German court of ordinary law) asked the European Court of Justice to settle the question of whether dynamic IP addresses are PII, which the ECJ affirmatively settled in [0].


This is a very interesting legal document, and I'll have to take the time to read it slowly before I can judge it.

It centers around this line:

   ... not PD for you, can become PD when you give it to someone else
and claims that, as this potentiality can always be fulfilled, you should consider it PD. This would invalidate the first part of the post, but is still not enough to make a default deploy of a logging http server illegal, because of the 6.1(f) legitimate interest rule. In fact, things like 21.1(b) might make it obligatory.

Now we are in lawyer 'interesting question' territory, which costs a lot of money, and I still don't think you'll need to worry, because you're not violating the spirit of the law. Personally, I'll go on depending on 2.2(c).


It's not illegal to store such information in default logs per se, even without explicit consent, if it would fall into the "legitimate interest" category[0], e.g. you need it to operate the service and prevent abuse, and there is no less intrusive way to e.g. reasonably monitor for and prevent abuse.

However, you cannot share such logs without consent, you still have an obligation to inform users about your legitimate interest assessment and what data you store, and you still have to abide by users' other rights, such as the right to ask for a copy of the data you store about them.

[0] Art 6.1.f https://gdpr.eu/article-6-how-to-process-personal-data-legal...


Gdpr.eu is not an official EU resource. There is no official guidance saying that IP address in logs falls under "legitimate interest" and every lawyer I asked advised against it "just to be on the safe side".

One actually added: Do you really want to test our government's understanding of "legitimate interest" for your business in court?


>Gdpr.eu is not an official EU resource.

Yes, but I never claimed that they were. The text that I linked is a copy of the official GDPR text (and recitals), not an article they wrote on the topic. I used their website, because I find it more usable as they added cross-references links and recital links. But if you prefer, read the official EU version[0], which is the same in content and in words.

>There is no official guidance saying that IP address in logs falls under "legitimate interest"

I haven't said that. I said storing IPs in logs might be legal, if there is a legitimate interest and/or there is consent.

There are actually two official recitals straight up addressing that topic. Recital 47 states (in part): "[...] The processing of personal data strictly necessary for the purposes of preventing fraud also constitutes a legitimate interest of the data controller concerned. The processing of personal data for direct marketing purposes may be regarded as carried out for a legitimate interest." (This is not meant to be an exhaustive list)

Recital 49 states (in full): "The processing of personal data to the extent strictly necessary and proportionate for the purposes of ensuring network and information security, i.e. the ability of a network or an information system to resist, at a given level of confidence, accidental events or unlawful or malicious actions that compromise the availability, authenticity, integrity and confidentiality of stored or transmitted personal data, and the security of the related services offered by, or accessible via, those networks and systems, by public authorities, by computer emergency response teams (CERTs), computer security incident response teams (CSIRTs), by providers of electronic communications networks and services and by providers of security technologies and services, constitutes a legitimate interest of the data controller concerned. This could, for example, include preventing unauthorised access to electronic communications networks and malicious code distribution and stopping ‘denial of service’ attacks and damage to computer and electronic communication systems."

These recitals were specifically added to address some points that had already been litigated in the past in various European courts.

>and every lawyer I asked advised against it "just to be on the safe side".

Good for your lawyers (that you keep mentioning all across threads). I don't know your lawyers, but they seem overly cautious - even for lawyers - and maybe a little bit under-educated on the subject matter. But they still have a point. You cannot just store access logs containing IP addresses, you have to have a legitimate interest, and be able to articulate this legitimate interest, and see if law makers and courts would consider your "interest" to be "legitimate". Which is easy when it comes to fraud detection and network security/abuse (thanks to the recitals), less easy when it comes to other areas, and pretty easy when it comes to different areas that are clearly against the text or spirit of the GDPR; e.g. nobody will buy an argument of "my legitimate interest is that I want to earn money from tracking and selling user data".

[0] https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...

[1] https://gdpr.eu/Recital-47-Overriding-legitimate-interest or https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...

[2] https://gdpr.eu/Recital-49-Network-and-information-security-... or https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...


When you use laws to ban businesses from other countries, those countries will feel entitled to use laws to ban businesses from your country as well.

It's how protectionism works, and it's generally the consumers who lose.


These laws do not ban businesses, they ban business practices. And consumers often win. E.g. laws to ban the business practice of just dumping toxic waste into rivers because it's cheaper were hugely successful - at least in places where they were enforced. On the other hand, there is a danger of regulatory capture, which has to be considered as well...

The GDPR does not ban Google, and it does not ban analytics. But, according to recent court rulings, it bans the business practice of Google Analytics to collect and transfer data to the US - which isn't considered to be a place with "adequate" privacy laws - and other places without prior user consent. Google could potentially come up with ways to make a Google Analytics that does abide by the law, but so far they choose not to. Maybe the changes that would be required would cut severely into revenues, or even make (free) GA cost-prohibitive, but this is in line with environmental protections killing off certain products/businesses that got too expensive when they had to dispose of their toxic waste properly and in a way that doesn't poison people and the environment.


Comparing tracking with "dumping toxic waste into rivers" is comparing a breeze with a hurricane.

> Google could potentially come up with ways to make a Google Analytics that does abide by the law

I personally know of no way to have legal analytics under GDPR, as advised by multiple lawyers.


>I personally know of no way to have legal analytics under GDPR, as advised by multiple lawyers.

If this is truly the case, then these businesses must consider shifting to ethical business models to stay afloat. If not, then competition will steamroll them.


The article is from 2020, and I don't think I've ever seen a site using this approach yet. It is an egregious attempt to circumvent the Same Origin security policy in browsers that developers and privacy advocates should rightly be angry at, but it doesn't seem to have caught on. That's something to be thankful for.


>I don't think I've ever seen a site using this approach yet.

What have you been looking for? It seems like this would be hard to observe.


You are optimistic: most analytics people I know are working with clients to transition to GTM server-side tagging.


While impractical, I liked the article's suggestion of blocking the proxies. I'm curious what reaction this would have: would ad-blocking users get no content, move to alternatives, and stop being users, or would the sites cave and realize that having users interact is more important than all of the data collected?


It's a fine suggestion. If it breaks the site, then I'd call that a broken website and move on. Maybe next time someone points me there, they'll have fixed their critical issue for users who block tracking proxies.

I'm okay with not being in the target audience of sites that really want to do this. I've got enough other things to do at less hostile places that my FOMO isn't triggered in the least.


How do you identify tracking proxies though? When everything is going through the same domain you don't even know if data is being sent to Google, it's all a server-side black box.


uBlock Origin actually has an experimental option for this: https://github.com/gorhill/uBlock/wiki/Static-filter-syntax#...

The only issue with blocking the proxies is that you can now decide to host the container on your own infra through Docker, and it's documented by Google: https://developers.google.com/tag-platform/tag-manager/serve...

I guess this is very interesting for many people, especially in Europe with the "Google Analytics ban".
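
For the curious, the manual setup boils down to running Google's container image with the config string from the GTM UI behind a first-party domain. The image name, port and variable below are from memory of that guide and may have changed, so verify against the linked docs before relying on them:

    # Sketch only: image name, port and env var are assumptions, check Google's guide
    docker run -d -p 8080:8080 \
      -e CONTAINER_CONFIG='<config string copied from the GTM UI>' \
      gcr.io/cloud-tagging-10302018/gtm-cloud-image:stable
    # then front it with a TLS-terminating reverse proxy on a first-party
    # subdomain, e.g. https://metrics.example.com (hypothetical name)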


By using CNAME uncloaking, which uBlock Origin can do on Firefox. It should see that the real domain is Google Tag Manager's.


I think the article mentions that Google recommends against using CNAMEs for this, and recommends using A records instead.
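
To illustrate the difference with hypothetical names: a CNAME leaves a trail that uncloaking can follow, while an A record pointing straight at the proxy's IP does not:

    ; CNAME cloaking: the chain still resolves back to a tracking vendor's host
    metrics.example.com.   CNAME   abc123.tracking-vendor.example.
    ; A-record setup: looks like any other first-party host, nothing to uncloak
    metrics.example.com.   A       203.0.113.10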


> Google recommends against using Cname for this

So use Cname? :D


Sites want the ads to get through, right? So they’re going to do the thing that makes that happen: A records.


The greatest minds of a few generations really should think about not being evil.


I think the solution will be for ad blockers to invest in neural nets to detect the graph of the code flow for known variants of the script. The software that detects plagiarism will be a good start.


That sounds like it's going to be slower than not using an ad blocker at all.


Not if the signatures are uploaded and shared.


Hashmap lookups are O(1)
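
As a sketch of the idea (not how any existing blocker works): hash the fetched script body and check it against a crowd-sourced set of known-bad hashes. The lookup is cheap; keeping the set current against per-site mutations is the hard part.

    // Sketch: check a script body against a shared set of known-bad SHA-256 hashes.
    // knownBadHashes would come from a crowd-sourced list (hypothetical).
    const knownBadHashes = new Set([
      // "3f2a..." entries omitted
    ]);

    async function isKnownTracker(scriptText) {
      const bytes = new TextEncoder().encode(scriptText);
      const digest = await crypto.subtle.digest("SHA-256", bytes);
      const hex = [...new Uint8Array(digest)]
        .map(b => b.toString(16).padStart(2, "0"))
        .join("");
      return knownBadHashes.has(hex); // the O(1) membership test
    }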


There's no way to fight this unless ... you pass legislation against it or comparable technologies, preferably at a policy level.


You can fight against it by refusing to use these websites?

If you can't do this, perhaps because a big _majority_ of users don't care enough to support this kind of ecosystem shift, what makes you think a majority of voters would support this? (And if not, why would you want to force your view on them?)

It's like legislating that people should only listen to Good Music and eat Healthy Food, as defined by some people who know better than the unwashed masses?


I rather think it's more like legislating that you can't sell people food adulterated with poisons, and you have to label the ingredients accurately. Oh, and it's like saying that you can't sell lead paint, even though it is a very pretty white.


Even without that legislation, most people would already care about avoiding poisoned food.

So a law specifically forbidding poisons is in line with what the majority already cares about.

(Slightly related: see eg some Chinese people making good money from buying baby formula overseas and shipping it back home in their luggage. China has legislation against poison, but people don't trust the enforcement enough.)


> Even without that legislation, most people would already care about avoiding poisoned food.

There is lots of evidence that people would still use harmful substances when they're nice and cheap. Then other people would be exposed to them just because it is impossible to know the chemical composition of everything around you. Lots of people care about avoiding things like toxic chemicals and harmful bacteria; the trouble is that they cannot see them.

> So a law specifically forbidding poisons is in line with what the majority already cares about.

So why not do it, then, if it is the right thing and people want it?

In the real world, people are not perfectly informed, and fraudsters are willing to lie. So law and enforcement are absolutely necessary to end harmful practices. See lead paint, but also leaded petrol, asbestos, antibiotics in farm animals, and insecticides spread willy-nilly across the countryside. These things do not just disappear on their own because some people don't like them.

Even on the topic at hand, to be honest. People know that ads and tracking are bad and annoying, even if they do not see clearly the extent of the damage. Some of us know how to avoid most of them. And yet, they keep making more and more money, and are far from disappearing. It is difficult to take your point seriously.


Part of the job of lawmakers is, intriguingly enough, deciding what’s good for voters. This would be among those things. Would voters vote for this specific law? Probably not. But they probably wouldn’t vote out the representatives who wrote it either. And arguably privacy needs to be protected for the good of society.


I'm not sure about this notion of the 'good of society'.

If you believe that the 'good of society' is not what voters want, why bother with democracy at all?

(Slightly beside the point: I actually do agree that people behave like idiots at the ballot booth and don't know what is good for them in this context.

Luckily, people tend to be much more savvy when voting with their wallets or their feet. And as a society we would be well advised to encourage these latter two.

Eg by taking subsidiarity seriously, and pushing as much decision making as possible to as local a unit as possible. Don't decide stuff at the federal level when the states can handle it. Don't let the states handle what the counties can handle. Don't let counties handle what the municipalities can handle. Don't let municipalities handle what people can do privately on their own.

See https://en.wikipedia.org/wiki/Subsidiarity

By pushing authority down the stack, you make the act of moving between states or even just cities so much more powerful and expressive.)


> Luckily, people tend to be much more savvy when voting with their wallets or their feet. And as a society we would be well advised to encourage these latter two.

The problem with voting with your dollars is that people with more dollars get more votes. The problem with voting with your feet is that only some people can afford to move.

If you want "just let the rich decide", why dress it up in fancy words?


As much as possible, people should decide what to do with their dollars.

> The problem with voting with your dollars is that people with more dollars get more votes.

Eh, the biggest and most successful companies on the planet cater to mass markets. The system seems to work fairly well for average people. (And we all suspect the most important politicians cater to tiny elites.) Also, using your dollars to vote means you lose those dollars. So rich people can vote each dollar only once, just like everyone else.


> As much as possible, people should decide what to do with their dollars.

This sounds very good until it is actually put in practice, when people realise that those who have all the dollars have all the power. Now you have an unaccountable oligarchy.

> Also, using your dollars to vote means you lose those dollars. So rich people can vote each dollar only once, just like everyone else.

That’s hilarious. As if those billionaires were not making the median yearly income in a week.


I’m not saying it’s not what voters want, I’m saying they’re not going to vote for it. There’s a difference.

The average voter has a fairly limited horizon in terms of what they see and understand about what’s good for society. And in a democracy you elect representatives because they’re supposed to have a wider horizon and more in depth knowledge, in part because they’re on average smarter than the average voter and in part because they get to dedicate all their time to that specific job.

This means that lawmakers will sometimes have to do things the voters don't yet understand they want. It's on them to explain it to the voters. And it's on the voters to vote them out if they still don't agree.

As for voting with their wallets, I would have agreed say 20 years ago. But marketing has become so all-encompassing and so much money and effort has been spent making marketing stick, that I don’t think most people can make truly independent decisions anymore about many many things.

And free stuff on the internet is definitely something that most people have trouble dealing rationally with. Just look at all the free trials that hook people into costly year long subscriptions, etc etc. Let alone when it’s free in the sense that the users never pays directly but through things as ads and privacy.

My view of this is very much influenced by my being a European and EU citizen, though. And if anything, the EU is a bit of a technocracy that likes to decide for the “good of society”. And that’s not something everyone will like every time.


Well, I was born in East Germany and grew up there. Later I decided to vote with my feet, and pay my taxes in Singapore instead. Much better value for my tax money here---both lower taxes and better government services.

Btw, I'm not saying people are perfectly rational when voting with their feet or wallet. Just that they are much, much more rational than at the ballot booth.

> Let alone when it’s free in the sense that the users never pays directly but through things as ads and privacy.

Well, can't argue about taste? Perhaps people prefer it that way?

> This means that lawmakers will sometimes have to do th8ngs the voters don’t understand they want. It’s on them to explain it to the voters. And it’s on the voters to vote them out if they still don’t agree.

I am basically agreeing with you: voting is a weak channel to transmit information. Almost no individual vote makes a difference. Neither in aggregate nor to the individual voting.

Voting with your feet or wallet does make an immediate difference to yourself, and has at least a clear marginal impact in aggregate. There are fewer weird threshold effects than in politics. A dollar more spent on iPhones is a dollar more spent on iPhones; but another vote for candidate A only makes a difference if it gives her more votes than candidate B.

(And proportional representation only helps partially: in the end it's important which coalitions can form a majority in parliament, whether one party has one seat more or less doesn't make much of a difference usually.)

I'd like to give sortition a try to fill up parliament.


I co-develop an open source firewall for Android, which most of our users use for ad-blocking purposes.

The community has known about server-side collection for quite sometime now. You could run Google Analytics on any of the serverless environments since a year or two ago (I noted this on news.yc a year back [0][1]). Tag Manager server-side is Google throwing its own solution in to the mix.

DNS-based content blocking was always DOA; there are simply too many chinks in the armour besides CNAME or HTTPS/SVCB or SRV or ALIAS record cloaking [2]. The worst I've seen reported to me by users is a tracker generating domain names on the fly (domain generation algorithms) with A/AAAA records pointing to different IP addresses each time [3].

That said, a firewall can still mitigate this offensive, while network security with just DNS was always going to be what it was: a stop-gap.

This isn't the end-game: I fully expect that IP address blocklists would crop up in no time, and will be painfully maintained by folks pouring their life in to it.

TFA points out that Google is reverse-cloaking, presumably with IP addresses, but the worst case would be if multiple domains shared IP addresses (like in a CDN), reverse-cloaked with Server Name Indication. Even firewalls would have to blanket-block IPs... and what if those IPs are shared with other Google front-ends like the AMP project / YouTube / Mail / Docs?

The firewalls would also have trouble with something like Ao1 [4]: If multiple websites were behind multiple IPs, or in the extreme, a single IP.

The firewall is bust, but that's good, now we simply de-Google / de-Cloudflare ourselves, and be luddites like they want us to be.

[0] https://news.ycombinator.com/item?id=26003654

[1] https://news.ycombinator.com/item?id=25169029

[2] https://news.ycombinator.com/item?id=26298339

[3] Ex: https://www.reddit.com/r/uBlockOrigin/comments/srza8x/changi...

[4] https://nitter.net/rethinkdns/status/1448738898998292495


I really don't know much about this space, but do you think server-side tagging could be more or less susceptible to user-resistance attacks like what AdNauseam[0] does? Can we spam them into futility?

[0] https://adnauseam.io/


AdNauseam's offensive tactics can still confuse these server-side implementations. That said, if Google et al figure out a way to defeat it, pretty sure they'd not be blogging or talking about it, at all, for us to know.


Ah, good point. Thanks for the response


> This isn't the end-game: I fully expect that IP address blocklists would crop up in no time, and will be painfully maintained by folks pouring their life in to it.

The proxy can be hosted on the same server as the site itself. In that case this simply becomes a blocklist of naughty websites. Someone still needs to do the hard work of figuring out which sites are naughty.


IP blocking still seems a thing, even with this new feature - the ads need to be served from _somewhere_. I am using pfblocker-ng on pfsense, which uses giant IP blocklists to filter out all connections to spam and ad-servers. I haven't seen ads in 5 years and there is no need for client-side solutions (e.g. adblocker). The places where ads appear are just whitespace.


The idea is that this will be served from the same IP address that the site that you're trying to visit is.


Yes, I updated the French article but not this translation (no idea who did the translation, btw). Google has a guide for hosting the container on your own infra: https://developers.google.com/tag-platform/tag-manager/serve...


Thanks for the explanation - I understood this partly from the article and it is pretty worrying for the future.


There is hope this can be blocked by adblockers inspecting the payload of requests and blocking based on some generic properties that are always present in Google Tag Manager requests to proxies. Unless this mechanism gets some dedicated Chrome-level support that would disallow inspecting or blocking these requests.


I think modifying some fingerprintable APIs to give faked/altered results could be enough, given the global fingerprint is a product of all partial fingerprints. Some extensions already implement that, e.g. https://github.com/kkapsner/CanvasBlocker/
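
A minimal sketch of the canvas part of that approach (CanvasBlocker itself is far more thorough): wrap toDataURL so every read-back gets a tiny random perturbation, which changes the canvas hash on each call:

    // Sketch: make canvas fingerprints unstable by perturbing read-back
    const realToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function (...args) {
      const ctx = this.getContext("2d");
      if (ctx && this.width && this.height) {
        const img = ctx.getImageData(0, 0, this.width, this.height);
        for (let i = 0; i < img.data.length; i += 4) {
          // flip the lowest bit of the red channel with small probability
          if (Math.random() < 0.01) img.data[i] ^= 1;
        }
        ctx.putImageData(img, 0, 0);
      }
      return realToDataURL.apply(this, args);
    };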


> Maybe using an archive.is-like service

No that has turned to shit (for me anyway). Used to be fine, now presents a captcha when JS off. Okay so I switch from Firefox to Safari (where I leave JS on) and it still presents a captcha. I'd rather use the original site with JS than solve captchas.

That has been my consistent recent experience for a multitude of those.

> or a Tor-like service

I've never used Tor, but aren't there a lot of complaints of repetitive captchas when using it?

> randomizes one's IP address and browser fingerprint

I haven't followed this closely, but didn't Apple make claims that they would soon have an opt-in service that did something like this?


> didn't Apple make claims that they would soon have an opt-in service that did something like this?

iCloud Private Relay[1]. It’s in beta.

[1]: https://support.apple.com/en-us/HT212614


I think there was never any possibility of "out-teching" tracking solutions in the first place. You simply cannot plug every hole imaginable that will be discovered and still serve your service on a network.

The only remedy is strict legislation and judicial recourse against companies that do try to cheese it.

Just like you cannot possibly implement real-world security and surveillance that makes it completely impossible to commit theft, but you can implement strong enough legal deterrence to make it a really unviable risk/reward scenario for individuals and corporations alike.


Just block Google tag manager itself. Gets two birds stoned at the same time.


How would you do that? Isn't it the server that talks to Google Tag Manager, not the browser?


Google tag manager in my experience is a script executed by the browser. Then it installs itself in the page and performs the inner payload of user script insertions. It’s a Trojan horse, really. You can block Google tag manager’s embed scripts. I wasn’t aware of a backend integration but it’s certainly possible.

Regardless, I use a DNS based ad blocker (pihole) and it takes care of all this stuff. I occasionally need to turn it off or whitelist domains (like Google tag manager) for client work, but normally I have it blocked.


> Google tag manager in my experience is a script executed by the browser.

Isn't the whole point of this new change that it runs server-side, using a proxy that you install on the website so it uses the same domain?

> Regardless, I use a DNS based ad blocker

But it's the same domain name isn't it?


A Server-Side GTM container complements a client-side container, it does not fully replace it.

Some processing happens on the server, but event data must still be sent to the server-side container first. For now, the "standard" deployment of a server-side container is that it receives hits directly from the browser, orchestrated by a traditional client-side container. So the client-side script is still there, just less bloated.

The server-side container has built-in facilities for serving up the client-side container script, meaning that domain-name blocking will not prevent this. DNS-based blocking also has some issues: Server-Side Containers run in App Engine, so blocking them basically means blocking anything running on GCP.



Current GTM, configured (via the server UI) to inject tracker X:

gtm javascript loads, pulls down the config, injects tracker X javascript into the browser

new gtm:

gtm javascript loads, pulls down config, streams events to google servers to fan out to tracker X as configured

So blocking gtm.js off tagmanager.google.com / www.googletagmanager.com / the various other domains still blocks all gtm injected tags.

The tl;dr is they've become much closer to Segment -- which does the data fanout internally within Segment. But they should still be straightforward to block.


This is not how GTM server-side works. There is not a single call to Google domains from the client when GTM server-side is set up to its fullest. The config (gtm.js) will be loaded from my subdomain and not googletagmanager.com. Also, gtm.js can be renamed.
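
From the page's point of view, that setup looks like any other first-party script (names below are hypothetical):

    <!-- hypothetical: renamed container loader served from the site's own subdomain -->
    <script async src="https://data.example.com/abc123.js?id=GTM-XXXXXXX"></script>

There is nothing Google-shaped left in the URL for a blocklist to match on.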


Per the docs here [1], that is not true. You continue to load gtag.js off the googletagmanager.com domain; subsequent events can flow to a custom domain.

[1] https://developers.google.com/tag-platform/tag-manager/serve...


Couldn't you still recognize the script by its content?


No because the script contents can change from site to site. Maintaining an index for every site would get you closer, but individual sites can trivially tweak things to break fingerprinting as often as they want. Even on every request.


Exactly, this is already done for tracking scripts, since it's common to use proxies to load tracking scripts.


Not with dynamic obfuscation.


You missed the part where they recommend changing the script's name as well; add in changing a few variable/function names in the script, and even matching the hash of the script itself would be useless. On top of that, they recommend using a subdomain with an A/AAAA record so it's first-party.


Worst-case you parse the script and block it if the AST is too similar.

There are a million ways to detect and block this sort of thing when you control the client. Yes, it's harder than just blackholing a whole domain, but it's hardly impossible.
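
For what that might look like, here's a rough sketch assuming you use an off-the-shelf JS parser such as acorn; it only compares the distribution of AST node types, and a real blocker would need something far more robust:

    // Rough sketch: AST-shape similarity between two scripts (npm: acorn)
    const acorn = require("acorn");

    function nodeTypeHistogram(source) {
      const counts = {};
      (function walk(node) {
        if (!node || typeof node.type !== "string") return;
        counts[node.type] = (counts[node.type] || 0) + 1;
        for (const key of Object.keys(node)) {
          const child = node[key];
          if (Array.isArray(child)) child.forEach(walk);
          else if (child && typeof child === "object") walk(child);
        }
      })(acorn.parse(source, { ecmaVersion: "latest" }));
      return counts;
    }

    function similarity(a, b) {
      // cosine similarity over node-type counts
      const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
      let dot = 0, na = 0, nb = 0;
      for (const k of keys) {
        const x = a[k] || 0, y = b[k] || 0;
        dot += x * y; na += x * x; nb += y * y;
      }
      return dot / Math.sqrt((na * nb) || 1);
    }

Whether any threshold on that score survives deliberate obfuscation is exactly the open question.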


Yes, the French article is updated, but this English translation is quite old. Here it is: https://www.simoahava.com/analytics/custom-gtm-loader-server...


You missed the same domain part. How are you going to block a request when you don't know the url?


You check the loaded script itself to see if it matches an expected pattern.


Does there need to be a loaded script with a certain fingerprint? What if they are just passing data from the browser to some random endpoint? I'm not sure, just thoughts.


There needs to be a script because the tracking still happens client-side and there will be some logic involved. The only way to avoid being blocked by the browser is to track server-side.


The point is that DNS ad blocking is being worked around with this new system, because it looks like part of the site you're on. Also, that google is encouraging modifying the JS to prevent automated tools from blocking the javascript.


use uMatrix or uBlock and block individual domains

https://github.com/gorhill/uMatrix


Proud uMatrix user here. Sadly, just noticed that the repo is now archived and I don't know if it will be maintained. Could not find any fork either.

I'll miss this extension.


You have the features of uMatrix with uBlock Origin's static rules. You just have to write them by hand instead of the convenient table UI.

https://news.ycombinator.com/item?id=26284124

The only thing that uBO doesn't support is controlling cookie access, so I still use uM for that.
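
For example, a default-deny setup in uBO's "My filters" looks roughly like this (domains are hypothetical):

    ! block all third-party scripts and frames everywhere by default
    *$script,3p
    *$frame,3p
    ! then selectively allow known-good third parties per site
    @@||cdn.example.net^$script,domain=example.com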


> You just have to write them by hand instead of the convenient table UI.

That’s a pretty big "just", though. Very few sites work without fiddling with rules, having to do manual text entry every time would push me towards not using it.

The UI of uMatrix is generally far superior to the mobile-friendly, simplified one of uBo.


>That’s a pretty big "just", though.

It is, but for me the pros outweigh the cons. In particular, even with uM I often ended up editing the rules by hand because it was easier to copy-paste and turn on and off rules for experimenting, but uM would forcibly resort the rules on save which made that annoying.

>Very few sites work without fiddling with rules,

The only sites I fiddle with the rules of are the ones I visit regularly, which is not many. Over the 1.5 years that I've been using this method, I've only got 75 "web properties" in my list (github.com, github.io and githubusercontent.com count as one "GitHub" web property; so the number of domains is a bit higher). Going by git history, I do have to fiddle with one or more rules once a month on average.

For other sites, either they work well enough with default settings, or I give up and close them, or if I really need to know what they say I use a different browser. For this other browser I never log in to anything, and have it configured to delete all history by default on exit. (I've been pondering making this an X-forwarded browser running on a different source IP, but haven't bothered.)

>The UI of uMatrix is generally far superior to the mobile-friendly, simplified one of uBo.

To be clear, editing the rules does not use the "mobile-friendly, simplified" uBO UI. It refers to the giant text field you see in the uBO "Dashboard", specifically the "My filters" tab.

But yes, it'd be the best of all worlds if uBO gains the table UI as an alternative to the filters textfield. I imagine the problem is that static filters are technically much more powerful than what the uM-style rules do, so it'd require inventing a third kind of rule, which isn't great.


I have almost 7000 rules for a 260kb file ;)


ηMatrix is a fork maintained for Pale Moon: https://gitlab.com/vannilla/ematrix


I liked this a lot, but I don't see how someone without a computer science degree will use it successfully.

I think this is why Raymond gave up on it. I think for the masses his time is better spent on uBlock Origin.


It requires some effort to get oriented, but the granularity of control is fantastic. There is no competition.

Although the dev gave up on it, he's open to someone picking it up (if there are any brave souls on HN)

https://old.reddit.com/r/uBlockOrigin/comments/i240ds/reques...


Just block the GTM js from loading, it'll stop it easily.


The big change they are suggesting is that the gtm code is no longer accessed via a predictable Google domain, rather it is requested through a subdomain of the parent site.



uBlock already blocks stuff like Plausible analytics based on what's in the code, even if it runs on the parent site. Would this be any different?


Block the code that they suggest changing the name, domain, and function signatures of? How?


If the loops, if statements, and block scopes are similar then the graph can be fuzzily identified. They’ve had anti-plagiarism software for years.


Annoyingly that would still require downloading them, which I'd definitely prefer not to. It's bloat that serves me no purpose.


For popular sites a blocklist could be formed after the first person downloads it.


Can you point me to some anti-plagiarism software? Because this doesn't sound like it will work at a non-trivial level.


Yup, overwrite its API on the page
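
e.g. a user script or content script could swap the usual page-level entry points for no-ops before anything else runs - though, as discussed elsewhere in the thread, this only helps while the site keeps the default names:

    // Sketch: neuter the default GTM entry points before page scripts run.
    // Crude; breaks if the site renames dataLayer, which server-side GTM encourages.
    Object.defineProperty(window, "dataLayer", {
      value: { push: () => 0 },   // swallow dataLayer.push() events
      writable: false,
      configurable: false,
    });
    window.gtag = () => {};       // swallow direct gtag() calls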


"Blocking scripts for each individual website" probably isn't too bad of a burden though. There's enough people who are annoyed by this and few enough sites that you actually visit (how often do you actually visit a brand new website, or one that hasn't been visited by thousands already?) that maintained (donation supported) chrome extensions for this will pop up eventually.


>> God damn... this is it, this is the end-game

I don't understand. I tried to read the article but it doesn't make sense to me. What is the end-game? Can you explain? Not everyone uses google analytics, and even if we do it would only be on the front pages... (hooking into any API has always had the potential to expose session data if you pass it, so what's new here??)


It was clear this was going to happen for more than a decade now. I'm surprised it took them so long to really push for this. I'm just reiterating what I said back then: There's no point in wasting any time and resources into a stupid technical cat and mouse game to fix this. The only sensible way to deal with this stuff is through legislation.


If it takes maintaining blocking scripts for individual websites, I'm pretty sure services will spring up to crowd source it.


Nope. The end-game is adding the data collection into the backend frameworks so the user does not have to execute javascript at all.

But this is pretty close to it. I hope Google and anybody collaborating with them get severely punished.


Couldn't an adblocker block the largest javascript blob loaded by the page? Most likely it's GTM. Also, with a bit of machine learning it could recognise the patterns in the JS blob, no?


You still pay for the App Engine requests. This whole product is just a bash script that configures the proxy for you.


“ I honestly don't know if there's any solution to this at all.”

How about the law? Like GDPR? My data is mine.


I mean, technically there is nothing stopping me from following anybody around, documenting their actions, taking pictures. It's easy... But we have laws that prevent this because we decided together that we do not like this.


Wouldn't a script blocker like NoScript or uMatrix take care of this?


I get privacy concerns and hate for ads, but what about "free" internet? Paywalls are a massive annoyance to me personally, and if ads were legislatively blocked, would I have to pay for each website I visit that previously relied on ads for $? Perhaps we could be making micro-transactions for each website visited via crypto (?)


Solutions for sending micro transactions to websites you visit have existed for over a decade[1], no cryptocurrencies or blockchains required.

[1]: https://en.wikipedia.org/wiki/Flattr


So something like https://yalls.org?


"Endgame" is the way all web analytics was done 20 years ago.


The server-side "analytics" of 20 years ago was for aggregate reports on popular pages, number of users, their browsers and OSs and maybe their geo-location; solely for the use of the site owners to optimize and whatnot.

This abomination Google is proposing is unblockable cross-site tracking of people's activities. That site owners get to see some of that data too is insignificant; the value comes from being able to track people across the web. I'd bet Google would even offer this proxy service "for free", depending on how much data they can hoover from the site.


How does google correlate identifiers between different users?


Browser fingerprinting and IP address plus any unique identifiers if you happened to log in on that website.


How much would you pay per month for custom-per-site tracking blocking as described here?


Nothing. No one should pay for not being tracked.


In principle I agree, and I support having the GDPR in effect globally, so that these server-side data sharing solutions are illegal without opt-in consent.

Unfortunately there’s a reality gap between “GDPR everywhere” and the United States and other countries, and that gap was being filled previously by anti-tracking lists maintained essentially for free out of the goodwill of people’s hearts. Now that Google is - and has been - using server-side proxies, those tracking lists won’t scale without human caretaking. Any human versus the entire web would burn out in a day.

So the choice is either to pay humans to enforce our anti-tracking beliefs against scummy corps, or to donate to politicians that believe in GDPR so they can try to make it illegal, or to refuse to pay anything and accept the status quo of being tracked. We’ve reached the end game of the “pay nothing until it’s fixed, then continue paying nothing” ethos: Google has outplayed us, and website owners can afford to pay to track us. I don’t like this, and neither do you. I think it’s time to pay money to fight back, and you do not think it's appropriate to pay money to fight back.

If you or anyone have a good idea of how zero-cost effort can somehow solve the tracking problem, share it with others in a useful reply somewhere. You don't have to convince me that such ideas exist: you have to convince others who share your "at no cost to me" beliefs to invest their time and energy in your zero-cost idea. And, whatever else I'm uncertain about, I guarantee they're not going to see such a reply down here in this thread that started with a pricing question.


5-10 bucks. Any higher and I'll be looking at other options like not using the web so much.


Up to $6.90 - which would be (roughly) 10 local bucks in my country.


It's based on JS. There's your solution. I've had JS disabled in the browser for nearly two decades, and I can still use most of the web (HN included).

You are blind to the solution because you don't want to take responsibility for your own browsing. You and people like you won't change; you'll whine about how nothing can be done while not being prepared to understand that the problem is yourselves, and that's where the solution lies as well. When Google screws you over, remember that you chose it (maybe by omission rather than commission, but you chose).

