Just block Google tag manager itself. Gets two birds stoned at the same time.



How would you do that? Isn't it the server that talks to Google Tag Manager, not the browser?


Google tag manager in my experience is a script executed by the browser. Then it installs itself in the page and performs the inner payload of user script insertions. It’s a Trojan horse, really. You can block Google tag manager’s embed scripts. I wasn’t aware of a backend integration but it’s certainly possible.
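
For reference, this is the stock embed snippet GTM tells you to paste into the page (GTM-XXXX is a placeholder container ID). Blocking the gtm.js request it makes kills everything it would go on to inject:

    <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
    new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
    j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
    'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
    })(window,document,'script','dataLayer','GTM-XXXX');</script>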

Regardless, I use a DNS based ad blocker (pihole) and it takes care of all this stuff. I occasionally need to turn it off or whitelist domains (like Google tag manager) for client work, but normally I have it blocked.
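
On a Pi-hole v5 install, for example, blocking it is one blacklist entry per domain (syntax may differ on other versions):

    pihole -b googletagmanager.com
    pihole -b www.googletagmanager.com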


> Google tag manager in my experience is a script executed by the browser.

Isn't the whole point of this new change that it runs server-side, using a proxy that you install on the website so it uses the same domain?

> Regardless, I use a DNS based ad blocker

But it's the same domain name isn't it?


A Server-Side GTM container complements a client-side container; it does not fully replace it.

Some processing happens on the server, but event data must still be sent to the server-side container first. For now, the "standard" deployment of a server-side container is that it receives hits directly from the browser, orchestrated by a traditional client-side container. So the client-side script is still there, just less bloated.

The server-side container has built-in facilities for serving up the client-side container script, meaning that domain-name blocking will not prevent this. DNS-based blocking also has some issues: Server-Side Containers run in App Engine, so blocking them basically means blocking anything running on GCP.
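
Concretely, the only client-side difference in that deployment is the origin the j.src line of the standard loader points at (metrics.example.com here is a hypothetical tagging subdomain backed by the server container):

    // stock snippet loads from Google:
    j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;
    // server-side "first-party" variant: same code, custom origin,
    // optionally with the script path renamed as well:
    j.src='https://metrics.example.com/gtm.js?id='+i+dl;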



Current GTM, configured (via the server UI) to inject tracker X:

gtm javascript loads, pulls down the config, injects tracker X javascript into the browser

new gtm:

gtm javascript loads, pulls down config, streams events to google servers to fan out to tracker X as configured

So blocking gtm.js off tagmanager.google.com / www.googletagmanager.com / the various other domains still blocks all gtm injected tags.

The tl;dr is they've become much closer to Segment, which does the data fan-out internally. But they should still be straightforward to block.
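
For concreteness, the "streams events" step is just collect-style hits aimed at the container instead of the individual trackers. The endpoint and parameters below are illustrative of a GA4-style hit, with metrics.example.com as a hypothetical first-party subdomain:

    POST https://metrics.example.com/g/collect?v=2&tid=G-XXXXXXX&cid=555&en=page_view&dl=https%3A%2F%2Fexample.com%2F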


This is not how GTM server side works. When GTM server side is set up to its fullest, there is not a single call to Google domains from the client. The config (gtm.js) will be loaded from my subdomain and not googletagmanager.com. Also, gtm.js can be renamed.


Per the docs here [1], that is not true. You continue to load gtag.js off the googletagmanager.com domain; subsequent events can flow to a custom domain.

[1] https://developers.google.com/tag-platform/tag-manager/serve...


Couldn't you still recognize the script by its content?


No because the script contents can change from site to site. Maintaining an index for every site would get you closer, but individual sites can trivially tweak things to break fingerprinting as often as they want. Even on every request.
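
A minimal sketch of why exact-content matching fails (Node/TypeScript; the two snippets differ only in identifier names, yet hash to unrelated values):

    import { createHash } from "node:crypto";

    const a = "function t(e){navigator.sendBeacon('/c',e)}";
    const b = "function q(x){navigator.sendBeacon('/c',x)}"; // same logic, renamed identifiers

    const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");
    console.log(sha256(a) === sha256(b)); // false: one renamed variable breaks the match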


Exactly, this is already done for tracking scripts, since it's common to load tracking scripts through proxies.


Not with dynamic obfuscation.


You missed the part where they recommend changing the script's name as well. Add in changing a few variable/function names in the script, and even matching the hash of the script itself would be useless. On top of that, they recommend using a subdomain with an A/AAAA record so it's first-party.
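
As a sketch of that last part: since the stock server container runs on App Engine, the "first-party" subdomain is usually just A/AAAA records pointing at Google's documented App Engine custom-domain addresses (metrics.example.com is hypothetical):

    metrics.example.com.  300  IN  A     216.239.32.21
    metrics.example.com.  300  IN  AAAA  2001:4860:4802:32::15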


Worst case, you parse the script and block it if the AST is too similar.

There are a million ways to detect and block this sort of thing when you control the client. Yes, it's harder than just blackholing a whole domain, but it's hardly impossible.
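
One sketch of that idea, assuming the acorn parser package: fingerprint the script by its sequence of AST node types, so identifier renames and string tweaks don't change the score, then block above a similarity threshold.

    import * as acorn from "acorn";

    // Collect AST node types in traversal order; identifier names and
    // string literals don't affect this sequence.
    function typeSequence(src: string): string[] {
      const types: string[] = [];
      const walk = (node: any): void => {
        if (typeof node.type === "string") types.push(node.type);
        for (const key of Object.keys(node)) {
          const child = node[key];
          if (Array.isArray(child)) {
            for (const c of child) if (c && typeof c === "object") walk(c);
          } else if (child && typeof child === "object") {
            walk(child);
          }
        }
      };
      walk(acorn.parse(src, { ecmaVersion: 2022 }));
      return types;
    }

    // Jaccard similarity over 4-gram shingles of the type sequence.
    function similarity(a: string, b: string, n = 4): number {
      const shingles = (seq: string[]) => {
        const out = new Set<string>();
        for (let i = 0; i + n <= seq.length; i++) out.add(seq.slice(i, i + n).join(">"));
        return out;
      };
      const sa = shingles(typeSequence(a)), sb = shingles(typeSequence(b));
      let inter = 0;
      for (const s of sa) if (sb.has(s)) inter++;
      return inter / (sa.size + sb.size - inter || 1);
    }

    // A renamed copy of the same tracker scores near 1.0; unrelated code near 0.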


Yes, the French article is updated, but this English translation is quite old. Here it is: https://www.simoahava.com/analytics/custom-gtm-loader-server...


You missed the same-domain part. How are you going to block a request when you don't know the URL?


You check the loaded script itself to see if it matches an expected pattern.


Does there need to be a loaded script with a certain fingerprint? What if they are just passing data from the browser to some random endpoint? I'm not sure, just thoughts.


There needs to be a script because the tracking still happens client-side and there will be some logic involved. The only way to avoid being blocked by the browser is to track server-side.


The point is that DNS ad blocking is being worked around with this new system, because it looks like part of the site you're on. Also, Google is encouraging modifying the JS to prevent automated tools from blocking it.


use uMatrix or uBlock and block individual domains

https://github.com/gorhill/uMatrix
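
For example, as a uMatrix rule and as the equivalent uBlock Origin static filter:

    * googletagmanager.com * block
    ||googletagmanager.com^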


Proud uMatrix user here. Sadly, just noticed that the repo is now archived and I don't know if it will be maintained. Could not find any fork either.

I'll miss this extension.


You have the features of uMatrix with uBlock Origin's static rules. You just have to write them by hand instead of the convenient table UI.

https://news.ycombinator.com/item?id=26284124

The only thing that uBO doesn't support is controlling cookie access, so I still use uM for that.
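
A minimal sketch of that hand-written style: default-deny third-party scripts and frames, then re-allow per site (cdn.example.com / example.com are placeholders):

    *$script,third-party
    *$subdocument,third-party
    @@||cdn.example.com^$script,domain=example.com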


> You just have to write them by hand instead of the convenient table UI.

That’s a pretty big "just", though. Very few sites work without fiddling with rules; having to do manual text entry every time would push me towards not using it.

The UI of uMatrix is generally far superior to the mobile-friendly, simplified one of uBO.


>That’s a pretty big "just", though.

It is, but for me the pros outweigh the cons. In particular, even with uM I often ended up editing the rules by hand because it was easier to copy-paste and turn rules on and off for experimenting, but uM would forcibly re-sort the rules on save, which made that annoying.

>Very few sites work without fiddling with rules,

The only sites I fiddle with the rules of are the ones I visit regularly, which is not many. Over the 1.5 years that I've been using this method, I've only got 75 "web properties" in my list (github.com, github.io and githubusercontent.com count as one "GitHub" web property; so the number of domains is a bit higher). Going by git history, I do have to fiddle with one or more rules once a month on average.

For other sites, either they work well enough with default settings, or I give up and close them, or if I really need to know what they say I use a different browser. For this other browser I never log in to anything, and have it configured to delete all history by default on exit. (I've been pondering making this an X-forwarded browser running on a different source IP, but haven't bothered.)

>The UI of uMatrix is generally far superior to the mobile-friendly, simplified one of uBO.

To be clear, editing the rules does not use the "mobile-friendly, simplified" uBO UI. It refers to the giant text field you see in the uBO "Dashboard", specifically the "My filters" tab.

But yes, it'd be the best of all worlds if uBO gains the table UI as an alternative to the filters textfield. I imagine the problem is that static filters are technically much more powerful than what the uM-style rules do, so it'd require inventing a third kind of rule, which isn't great.


I have almost 7000 rules for a 260kb file ;)


ηMatrix is a fork maintained for Pale Moon: https://gitlab.com/vannilla/ematrix


I liked this a lot, but I don't see how someone without a computer science degree would use it successfully.

I think this is why Raymond gave up on it; for the masses, his time is better spent on uBlock Origin.


It requires some effort to get oriented, but the granularity of control is fantastic. There is no competition.

Although the dev gave up on it, he's open to someone picking it up (if there are any brave souls on HN)

https://old.reddit.com/r/uBlockOrigin/comments/i240ds/reques...


Just block the GTM JS from loading; that'll stop it easily.


The big change they are suggesting is that the GTM code is no longer accessed via a predictable Google domain; rather, it is requested through a subdomain of the parent site.



uBlock already blocks stuff like Plausible analytics based on what's in the code, even if it runs on the parent site. Would this be any different?


Block the code that they suggest changing the name, domain, and function signatures of? How?


If the loops, if-statements, and block scopes are similar, then the graph can be fuzzily identified. Anti-plagiarism software has done this for years.


Annoyingly, that would still require downloading the scripts, which I'd definitely prefer not to do. It's bloat that serves me no purpose.


For popular sites, a blacklist could be formed after the first person downloads the script.


Can you point me to some anti-plagiarism software? Because this doesn't sound like it will work at a non-trivial level.


Yup, overwrite its API on the page
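
A sketch of that approach, assuming the stub runs before any page script (e.g. a content script at document_start); GTM's public surface on the page is essentially window.dataLayer.push:

    // Pin a frozen stub dataLayer so the GTM loader and its tags get a dead object.
    Object.defineProperty(window, 'dataLayer', {
      value: Object.freeze({ push: () => 0 }),
      writable: false,
      configurable: false,
    });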



