AdFlush (acm.org)
276 points by grac3 4 months ago | 108 comments



What's fascinating here is AdFlush is a classical feature engineering approach: define a bunch of features on the data manually, and then use ML to figure out the most useful / impactful ones. This is not the "throw terabytes of data and see what happens" approach we see with LLMs. It's a bit funny to even point this out because I don't recall the last time a feature-engineered ML project made it to the HN front page.

Features can be brittle, but they are understandable. The paper's appendix [1] lists the 27 features that will likely make a request/resource "ad-related". These include interesting ones like JS AST depth, average JS identifier length, the "bracket to dot notation ratio in JS", and a number of graph measures for the graph of scripts.

And contrary to what comments in this thread are saying, they do compare against a blocklist-based adblocker: uBlock Origin. That's in section 5.5. They say they outperform uBlock Origin. But even they say they don't reduce overall page load time, because their algorithm is expensive.

[1]: https://dl.acm.org/doi/pdf/10.1145/3589334.3645698
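To make two of those lexical features concrete, here is a rough sketch. It is not the paper's implementation: the paper works on a parsed JS AST, while this only scans source text with regexes, so it will miscount identifiers inside strings and comments. The function names and the keyword list are my own assumptions.

```python
import re

# Regex-based approximations only; a real extractor would walk a JS AST.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Hypothetical, incomplete keyword list used to skip non-identifiers.
JS_KEYWORDS = {"var", "let", "const", "function", "return", "if", "else",
               "for", "while", "new", "this", "typeof"}
# A property access written as obj.name (digit after the dot is excluded,
# so numeric literals like 1.5 are not counted).
DOT_ACCESS = re.compile(r"[A-Za-z0-9_$\]\)]\.[A-Za-z_$]")
# A property access written as obj["name"] or obj[expr].
BRACKET_ACCESS = re.compile(r"[A-Za-z0-9_$\]\)]\[")


def average_identifier_length(source: str) -> float:
    """Mean length of identifier-like tokens, ignoring a few keywords."""
    names = [tok for tok in IDENTIFIER.findall(source) if tok not in JS_KEYWORDS]
    return sum(map(len, names)) / len(names) if names else 0.0


def bracket_to_dot_ratio(source: str) -> float:
    """Bracket accesses divided by dot accesses.

    When there are no dot accesses, the raw bracket count is returned
    (an arbitrary choice made for this sketch).
    """
    dots = len(DOT_ACCESS.findall(source))
    brackets = len(BRACKET_ACCESS.findall(source))
    return brackets / dots if dots else float(brackets)
```

The intuition, as far as I understand it, is that obfuscated or minified scripts tend to have short identifiers and more bracket-style property access than hand-written code.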


More specifically, page load time was 2.7 seconds without an adblocker, decreased to 2.1 with uBlock Origin, but rose to 6.6 seconds with AdFlush (roughly 2.4 times the no-adblocker baseline), or to 3.4 seconds with AdFlush retaining prior predictions.

The superior score was an F1 of 0.86 vs 0.84 for AdFlush vs uBlock Origin, and it's not clear to me that this is a statistically significant difference. They do not claim it is.
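For anyone who hasn't worked with F1: it is the harmonic mean of precision and recall. A minimal sketch follows; the counts in the usage note are made up for illustration and are not from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts of 86 true positives, 14 false positives and 14 false negatives, precision and recall are both 0.86, and so is F1. A gap of 0.02 between two systems can come from a fairly small number of requests changing label, which is why significance matters here.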


That seems to argue for a first pass with a blocklist to filter out the well-known ad providers, and then possibly a followup step with the ML to catch things that are trying harder to slip by? But the extensions would have to cooperate to make that possible.
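A sketch of what that two-stage flow could look like, assuming a plain set of blocked hostnames and some classifier that returns an ad probability for a URL. Both `should_block` and the `classifier` callable are hypothetical names; neither uBlock Origin nor AdFlush exposes anything like this as far as I know.

```python
from typing import Callable
from urllib.parse import urlsplit


def should_block(
    url: str,
    blocklist: set[str],
    classifier: Callable[[str], float],
    threshold: float = 0.5,
) -> bool:
    """Cheap hostname lookup first; run the expensive model only on a miss."""
    host = urlsplit(url).hostname or ""
    if host in blocklist:
        return True  # well-known ad host, no model inference needed
    # Fall back to the (hypothetical) ML classifier for everything else.
    return classifier(url) >= threshold
```

The point of the ordering is that the model's cost is only paid for requests the list doesn't already cover.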


Thanks for extracting the details. It doesn't seem like they'll be competitive with blocklist-based approaches like uBlock Origin, because their features are fundamentally expensive to compute - parsing JS and such, not just matching URLs against a list of regexes.


Seems like it could work in the background to build up new rules for uBlockOrigin to deploy


I like the strategy of using flags to say "look into this suspicious part of the code" over a hardcoded block list. And also block shitty JS via "JS AST depth, average JS identifier length" etc even if it's not an ad but just bad code.

For Brave browser users, you can see what hardcoded lists you're using at brave://adblock .

As for the whole cat and mouse game, how to detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.


> how to detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.

This has been my red line on where I will allow ads vs blocking them. If a site is hosting their own ads, that's acceptable to me. If they are using an ad provider, that is not. The newspaper example is my go to. If you wanted your ad in a paper, you called the paper and took out an ad. Today's equivalent would be every time you opened the paper, a slight delay while it randomly chose the highest bids for the ad space while potentially also inserting something that would slowly eat your hands. That's a nope.

You are obviously in the camp that feels entitled to be able to read anything at anytime without allowing for a website to earn money by wanting to block all ads regardless of their origin.


> You are obviously in the camp that feels entitled...

Not at all. I use Brave and "shield down" websites that I like and generally keep their ad situation under control (incl. 3rd party). But your point of hosting vs 3rd party is a good one and especially because often one 3rd party connects to another.

Likewise, I "block" annoying parts of websites like Yahoo Fantasy Football's enormous top nav that's not even an ad.


AdFlush (F1 Score: 0.98) seems to do better than some other adblockers: AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84), but it begs the question: why not compare to the most popular adblockers: uBlock Origin, Adblock Plus etc.

I think the authors want to compare apples with apples, so they only compare their algorithm to other adblockers that use algorithms, as opposed to those which use crowdsourced lists. The paper somewhat acknowledges this:

> However, manual maintenance of these filter lists requires significant human effort

Seems like one of those tasks where crowdsourcing scales so nicely (only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others) that it makes an algorithmic approach unnecessary.


The filter based adblockers are at risk though, with Google's new extension thingy that - at least a few years ago, I haven't heard from it since - limited the amount of rules. If there's a non-rule based system that is 98% effective then that would circumvent the arbitrary rule limits that Google set.


My understanding is that under manifest v3[1] only a list of rules is allowed. An algorithmic ad blocker wouldn't be able to work at all.

[1] https://arstechnica.com/gadgets/2023/11/google-chrome-will-l...


This is true. Extensions currently (manifest v2) are able to evaluate net requests dynamically, and are able to modify requests according to a dynamic ruleset that the extension can retrieve from some filter list published on the internet.

Under manifest v3, extensions are not able to dynamically inspect requests, instead, they may only apply rules to net requests. Even worse, there is a limitation of only 5000 rules per extension!! [1]

Even WORSE worse, under Chrome's manifest v3 rules, the extension cannot load any external code! Meaning that blocklists must be packaged with the extension. [2] Now, one might read that link as not affecting block lists: a list is not a "library" and it's not "code" so long as it's just a list of textual rules. However, Google considers the following to be a violation: "Building an interpreter to run complex commands fetched from a remote source, even if those commands are fetched as data". [3]

Sneaky sneaky. An extension update (and hence new app store submission) is required to update filter lists.

In other words, dynamic net requests are banned, and remotely-updated blocklists are banned as well.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...

[2] https://developer.chrome.com/docs/extensions/develop/migrate...

[3] https://developer.chrome.com/docs/webstore/program-policies/...
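To show what "only apply rules" means in practice: a declarativeNetRequest rule is a JSON object with an `id`, a `priority`, an `action` and a `condition`. Below is a sketch that turns a list of hostnames into block rules of that shape. The helper name, the chosen resource types and the `limit` parameter are my own assumptions; the caller supplies whatever rule cap applies.

```python
import json


def domains_to_dnr_rules(domains: list[str], limit: int) -> list[dict]:
    """Build one static block rule per hostname, refusing to exceed `limit`."""
    if len(domains) > limit:
        raise ValueError(f"{len(domains)} rules exceed the limit of {limit}")
    return [
        {
            "id": rule_id,  # rule ids start at 1
            "priority": 1,
            "action": {"type": "block"},
            "condition": {
                # "||" anchors to the domain, "^" is a separator character
                "urlFilter": f"||{domain}^",
                "resourceTypes": ["script", "image", "sub_frame"],
            },
        }
        for rule_id, domain in enumerate(domains, start=1)
    ]


if __name__ == "__main__":
    print(json.dumps(domains_to_dnr_rules(["ads.example.com"], limit=10), indent=2))
```

The browser evaluates these rules itself; the extension never sees the request, which is why anything that needs to inspect a script body can't be expressed this way.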


Chrome allows at least 30000 static rules + 30000 dynamic rules[1].

[1] https://developer.chrome.com/docs/extensions/reference/api/d...


That's not enough. Just uBlock Origin's default list "uBlock filters – Ads" already accounts for over 38,000 rules. EasyList is over 87,000!


10x more https://blog.chromium.org/2024/05/manifest-v2-phase-out-begi...

"Based on input from the extension community, we also increased the number of rulesets for declarativeNetRequest, allowing extensions to bundle up to 330,000 static rules and dynamically add a further 30,000."


If Manifest v3 is really this bad then it's probably still possible to build adblockers by DLL hooking the browser. It should also not affect browsers with built-in adblocking like Brave and Vivaldi.


> it's probably still possible to build adblockers by DLL hooking the browser.

I like this. or possibly the COM API. but I'm not a Windows expert.


How complex is it to revert the manifest changes and bring v2 support back to Chromium? Or is it intentionally made super complex by Google?


Microsoft decided it was prohibitive for them. So probably overly difficult.


I would say it just works for them. Considering they show ads in the Windows Start menu now.


If Google's goal is to thwart adblockers by creating limitations on what browser extensions can do, then creating a browser extension that blocks ads within the current set of limitations is a temporary solution at best.


Google doesn't control the browser, the user does.


Google controls the APIs that extension writers can use. They are currently using that control to impose limits on what adblocker extensions can do. [1][2]

You could download the Chromium source and patch it to change the extensions APIs (or better, just use Firefox), but the majority of users won't do this, and extension writers aren't going to make a version for a patched Chromium browser unless it has significant market share and support.

[1] https://nordvpn.com/blog/manifest-v3-ad-blockers/

[2] https://www.eff.org/deeplinks/2021/12/chrome-users-beware-ma...


You could always provide an extension that loads itself as a .dll/.so. I don't see much difference in friction between adding an extension through google's website vs. download setup.exe from somewhere. Of course like you say, using less user-hostile software is preferable.


Such extensions would be trivially easy for Google to break with Chrome updates. You also cannot distribute an extension like that through any of the usual extension stores.

Better to just use a browser that actually respects its users.


That might work for highly tech savvy people, but that's a very small minority of users. Google will still make ad blocking near-impossible for 99.99% of its users.


Firefox has 2.9%. Safari has 18.12%. Everything else is Chrome or reskinned Chrome, with Chrome itself being 65.3%.

Unless you’re running that 20%, Google controls it, and they basically write the standards anymore.


Oh, of course if you run Google-written software without modifications, you're not really controlling it. So if you want to control it, either go inside and tinker with the code, or - easier? - switch to a non-Google browser.

I thought this was rather obvious, at least for those worried about their experience. Do you think all those who realize they're suffering from ads don't think about using a non-Chromium browser?


I honestly don’t think they think about a nonchromium browser, and if they do think of it, they reject it for unfounded reasons. If they did use a nonchromium browser, Firefox would have a larger market share.


And if ad blocking doesn't work on Chrome, Firefox usage will go up.


I guess that's why uBO Lite exists :) I started using it a couple of months ago instead of uBlock Origin, and still haven't seen any ads since.

https://github.com/uBlockOrigin/uBOL-home


I think eventually there will be nothing that can stop certain ads on Chrome once specific APIs are removed, even using manifest v3. Maybe someone could chime in on this, as it's really confusing now since Google keeps pushing back the date to remove manifest v2. (This might be outdated info)


We'll create a shim to render the page in the background and use AI to remove ads and then serve the result to the user, at the least. Fuck ads and malvertising


Yes and: There will be a tipping point where it'll be easier to allow the content rather than blocking the garbage. Dynamic screen scraping, more or less.


Yeah, it generally does feel like a "Catch me if you can" situation. I'm sure that there will be different ad-blockers once those APIs are removed, as there seems to be a very strong desire from some people not to see ads.

I hope we'll not end up in a DRM-like system where ads are somehow really baked in and content stops working for lay-people if they try to circumvent ads.


And that will be the day Chrome dies.


The day Google starts blocking ad blocking users is the day the exodus starts from Google services.


I think you're overestimating the number of people who 1) care and 2) use adblocking extensions or any extension for that matter.

Google knows what will likely happen, and pays people lots of money to know.


Without commenting on Google[1], I think this sort of thing is true in the short term but less true in the long term. I expect that, were Chrome to ban ad blockers, technical folks will start to teach non-technical folks in their orbit how to e.g. install Firefox to regain ad-blocking capability. I think it would take some number of years but there would be a pushback in the medium- to long-term.

1. Googler, opinion solely my own.



This is ironically how Chrome got its big push into the mainstream. Would be great if that’s how it got pushed out. But the world of influential techies, especially amongst the younger, seems to have gotten smaller. Perhaps I’m wrong


They'd massively alienate a large and motivated subset of their userbase with the ability to build viable alternatives to Google products or at least build more active means to circumvent their platform restrictions.


I think you are unfortunately correct about this.

I am consistently blown away when I inadvertently experience the Internet without ad-blocking. It’s absolute garbage.

I am sad that people are either OK with this or don’t care. For many they don’t know any better, and asking many of those same groups to install and manage plugins is a fraught request.


32.8% of global users use an ad blocker. (33% of Americans.) [1]

Chrome's market share is about 65% [2]. If their recent manifest changes eventually break ad blocking (which seems to be the goal), it'll lose a bunch of market share (I guess they're optimizing for short-term profit).

[1] https://backlinko.com/ad-blockers-users [2] https://gs.statcounter.com/browser-market-share


Why do you think everybody switched from IE to Chrome? Because their tech friends told them to or did it for them.

The day Chrome can't sufficiently block ads anymore is the day Chrome dies.


Do you remember IE exodus to Firefox pre-2010? Yeah Google better watch its hyperback.


They learned from Microsoft's mistake and most browsers run on Chromium, while they have Firefox by the balls with their default search engine deal. Not to mention Firefox is hellbent on snatching defeat from the jaws of victory.


I don't know what you mean. They are already blocking adblock users on YouTube and there is certainly no exodus happening there. A few people complain about it and get a handful of upvotes on social media from their friends, but it hasn't even come close to rising to "backlash" status.


Are they? I block ads on YouTube and I’m still allowed to watch videos.

I suspect they have silently stopped blocking ad blockers.

I remember there were a lot of reports about this being the case, but there is no way I am not blocking Google.


I suspect that such a move would draw significant scrutiny from regulators, potentially far outweighing any impacts from users switching browsers on their own.


Real easy problem to solve by just switching back to Firefox


The first thing you see when you open Firefox is an ad for Amazon and Expedia.


I don't. Are you talking about 'sponsored shortcuts'? You can turn those off in the settings. It's on the first page you see when you hit the settings button in the top right


Isn't this the case for a bloom filter (vacuum maybe)? You can have very few rules.
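For reference, a minimal Bloom filter sketch: a fixed-size bit array that answers "possibly in the set" or "definitely not", so a large hostname list fits in a small, constant amount of memory at the price of some false positives. The sizes and the SHA-256-with-seed hashing are arbitrary choices for illustration, and whether something like this can be plugged into declarative rules at all is a separate question this sketch doesn't answer.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True can be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

A false positive here means blocking a legitimate host, so a real blocker would probably want an exact check behind the filter.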


> only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others

Is it that easy? Sounds very abusable


Yes, and some list maintainers accept money to add or remove you from the list (officially, or unofficially through a secondary maintainer, depending on the list), but otherwise it's no different than getting a domain marked as malware or phishing (with a few paid editors on Phishtank or VirusTotal).

It's easier to get a domain added than removed. And for the "corruption"/"racketeering" part, it's a "win-win" for the adblockers and the list maintainers.

Adblockers also often pay browsers to be integrated by default (AdGuard, Adblock Plus, etc), and then they negotiate with publishers to whitelist some domains (not necessarily the most obvious, can just be analytics).

"We offer to have your domain unblocked on xx million devices by default; this will give you a revenue uplift of +yy%"


Which lists do this? Do any of them ship with uBlock Origin?


Humans are really the primary attack vectors for any security system.


yes, one of my clients was hit by this and i was tasked with solving the situation.

i had to create a ticket in a repo explaining why blocking a whole domain instead of a single subdomain was actually pretty bad. they approved it and reverted the change.

finding where exactly i had to open the ticket and what to write was a “down the rabbit hole” experience.


Domains are cheap, don't serve content on an ad domain maybe?

Sounds like perhaps your task was to ensure a company's ads got through an adblocker?


my task was to rectify an issue in one of these crowd sourced lists of ad servers.

they were blocking a whole domain instead of blocking the ad-serving subdomain.

the issue was rectified, the main domain was replaced by the ad-serving subdomain.
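A sketch of why the broad entry was a problem, assuming list entries match a hostname and all of its subdomains (as hostname-style filter entries commonly do). The function name and example hostnames are hypothetical.

```python
def is_blocked(hostname: str, blocked_entries: set[str]) -> bool:
    """True if the hostname or any parent domain appears in the list."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.example.com", then "b.example.com", "example.com", "com".
    return any(".".join(labels[i:]) in blocked_entries for i in range(len(labels)))


# Entry for the whole domain: the content site is blocked along with the ads.
too_broad = {"example.com"}
# Entry for the ad-serving subdomain only: content stays reachable.
narrow = {"ads.example.com"}
```

With `too_broad`, both `www.example.com` and `ads.example.com` match; with `narrow`, only the ad host and anything beneath it does.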


Still, as pbhjpbhj suggested, if I were publishing both content and ads, I would consider publishing the ads on a different domain (not just a subdomain) to reduce technical issues. Domains with ugly names are very cheap.


of course, and this is a valid proposal. but that was outside the remit.


You could be right but you are definitely jumping to a conclusion here.

The default lists used by uBlock for example include things like error tracking telemetry, Sentry for example.

I can see why people want to block that stuff (privacy) but it’s not exactly an “ad”


Yes, but the effects of that abuse are observable and easily fixable. If suddenly a whole site goes offline for a bunch of people a change like that is likely to get reversed very quickly.


there is an entire section in the paper sub-titled: Comparison with uBlock Origin.


practical solutions don't get you published


"Practical solutions" also leave you vulnerable to cat and mouse games against sites that block or bypass adblockers (even with ublock origin). The end game is to have heuristic/AI adblocking which would directly hook into browser rendering so that it becomes undetectable. Obviously leading browsers do not support this for extensions, but forking Chromium wouldn't be so hard.


"doing thing X works and everyone uses it, so bad actors invest time against thing X. While thing Y isn't used by anyone so bad actors aren't spending time to work around it, q.e.d. we prove thing Y is better".

i don't really buy your argument


The argument is that Y is more robust.


Without comparison to the accuracy of crowdsourced blocklists it's not that valuable. Maybe there is a group of hopelessly overworked blocklist maintainers/contributors that I'm not aware of. If so, their cries for help don't seem to make the HN front page. From a user perspective, blocking banner ads feels like a basically solved problem. I think the real pain point here is that for large chunks of the web, there is no distinction between ads and content.


There will never be a solution to native ads. It's part of the content you choose to consume, that someone produced.

The only way to avoid native ads is to stop consuming content that relies on ads.


Stuff like SponsorBlock works pretty well? If the native ad is separable from the rest of it you can just skip ahead, and most of those things are still a signposted sponsor break for now. I can imagine extensions to do something similar in articles by removing affiliate links, etc.


I think it depends on what solution space you are willing to explore. There is the possibility for regulatory action that restricts native ads. It seems plausible that a flood of AI content tanks the prices for native ads, so some might pivot to original content + regular ads, which might also become more profitable if regulatory action weakens the oligopolies of that space. Aside from high level market shifts and regulatory action, there is of course also the possibility of technical solutions that can help you to avoid native ads.


That really depends on what you mean by "native ads"; if you mean "blog posts that appear legitimate but push a product" then maybe not (although I wouldn't totally rule it out with LLMs), but if you just mean that the ads are inline I have to disagree since ex. SponsorBlock already exists.


In some jurisdictions advertising has to be named as such, there it will be at least theoretically possible to create filters if the platform is compliant.


or have LLMs recreate the content without the native ad


That is nonsense, if we know about 10 exact brands by name, then we can block their mentioning anywhere


> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98

Neat results, I wonder how it compares to uBO or the different blacklists. I assume it self-updates with newer techniques and can detect certain patterns?


You can find the comparison to uBO under 5.5


The future is here.

If I recall, in Permutation City there's some part where somebody deals with spam with AI. The user tries to use a simulation to listen to potential spam to filter it, while the spam tries to figure out whether a real person is listening to it and only tries to spam when a real person is there.

Or something along those lines, it's been a long time since I read it.


Blocking image ads seems like a relatively well-solved problem. I mean, speaking as someone who can't stand ads, I don't see very many of them anymore when I'm on desktop.

The harder, more pernicious type of ads are the modals that pop up when your cursor moves toward the back button, or when you scroll down a certain distance on the page. "Wait! Before you go, take a moment to give us your email address!"

Those can be blocked, but by the time you've seen them, they've already done all the damage they can do—which is to say, they've annoyed you.

I wish somebody could come up with a way to detect and stop them. I spent an afternoon trying to come up with reusable techniques to detect these popups, but there are just too many possibilities.


Always a joy to see efforts in the ongoing battle against advertisements.

There are few things I feel radical about, and Ads are one of them. I believe they are a drain in several ways:

They waste computational resources and electricity on both ends. They compromise the visual design and layout of webpages. They distract and take mental energy away from the user. They make the internet (and anywhere ads exist) more "ugly" and less aesthetically pleasing - which negatively impacts mental health. They often sell low-quality services/products or outright scams, which harms those least educated and poorest individuals.

Death to advertisement! On billboards! On television! On the internet!

Ads are a parasite on the human mind that need to go away, forever.


Ultimately it's about where we draw the line for hacking other people's brains.

It's a spectrum: Some level is an unavoidable part of communication ("I like dogs" forces you to think of dogs) some more is considered normal and traditional manipulation ("My food smells nice, that makes you hungry, wanna buy?") and then it goes on into grey-areas, scams, and eventually to potential extremes like "this image induces nausea" or "this sound knocks you out".


They are a scourge and a tell-tale sign that we've grown far beyond excess and into absurd territory where more effort is spent on bending our minds to consume a thing than it took to make the thing in the first place.


Careful, apparently not wanting your mind polluted with psychological manipulation makes you a filthy communist..


Death to small media companies! You should have gotten some VC money if you wanted to make products for people, you poor pieces of shit.


I use a combination of UBO, PiHole and AdGuard on my mobile devices. Can't say I've seen an ad in the last year. Is this trying to solve an existing problem or speculating on where things could go in future?


I’m curious why you’re using 3 separate methods. Do you miss things with just one? AFAIK all 3 use similar block lists and are configurable.

I’m building a pi-hole type solution for myself and essentially want all the filtering and blocking to happen at my firewall and not on my client (phone, laptop, tablet).


I think Pi-hole (AdGuard Home) is a useful DNS-level ad blocker that can be used at the network/router level. But it is limited; UBO gives you more flexibility to block cosmetic elements and certain ads that cannot be blocked via DNS. There will be overlap of course, but it is worth it. I agree that AdGuard here seems redundant, and UBO itself recommends against using another ad blocker, to avoid interference and adblock detection by websites.

However you might end up using

1. pi-hole on router

2. Adguard as device level DNS

3. UBO on Firefox (android only)

It is possible, but wasteful and not recommended. Either 1 or 2, plus 3, is enough.


AdGuard is for things I take off the home network, for example when I'm at work. It's true I could use AdGuard for both scenarios but I do like the additional visibility and configurability Pi-Hole provides.


Try AdGuardHome. https://github.com/AdguardTeam/AdGuardHome

I basically have all my devices use it when I am on my network, and when I am off my network, my Wireguard connection (or Tailscale depending...) uses my home DNS server.


uBlock only works in web browsers. It doesn't work in phone apps, smart TVs, anything integrated into the OS, etc.

That's why I use uBlock and PiHole, which I deem enough.


Looks like this is the associated repo on GitHub: https://github.com/SKKU-SecLab/AdFlush


....and of course only a chrome plugin is available.


So AdFlush beats uBlock Origin with a marginal detection rate advantage of 0.86 vs 0.84, at the cost of significant performance overhead: median 2.7s load time (no ad block); 2.2s (uBO); 6.6s (AdFlush clean); 3.4s (AdFlush cached).

I'd like to see a tandem uBO+AdFlush extension that just enables uBO by default, with a "I still see ADs!" button in the extension UI that refreshes with AdFlush enabled and auto-submits any missed ads to a new FlushList filter list.


I didn't realize this was an active area of research, love this.


So, this begs the question when we'll see ML put in place to avoid AdBlocker detection. Or ads as we know them just disappear from the web and are replaced with other kinds of ML-enabled ads. I imagine deep-fake models used for interchangeable product placement in videos or pictures or so.


How does this compare to list based solutions? An overblocking/underblocking comparison would be great


Nice! I’d love to know if AI-Ad / tracking / telemetry / etc blocking could be improved for MITM network layer filtering not just the browser.


Oh boy, that didn't take long. Just last year I made Butter https://butter.sonnet.io as an excuse to talk about this:

> This project is a half-serious, half-assed attempt to demonstrate that in the next few years the process of blocking this type of content could be almost entirely automated. Yes, it would be wasteful from a computational and human potential perspective, and otherwise completely unnecessary, but hey, more money would change hands!



The instructions are on their GitHub page

https://github.com/SKKU-SecLab/AdFlush/tree/main?tab=readme-...

But since the first webpage I tried still had huge ads, I turned uBlock back on ;)


How realtime is this? Or at least fast enough to not be noticeable while browsing?


I'd be okay with a hybrid approach: lists for real-time blocking and machine learning for passive analysis to augment the lists over time.


This can be a Copilot+PC's killer feature :-)


> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84).

... Has anyone even heard of these ad blockers before?


These are all academic research projects.



