This is an interesting read, although there are more tracking mechanisms than pixels. Sure, you can configure your email client not to load remote content automatically, but most clients will still leak information through various HTML/CSS elements.
A while ago, I used https://www.emailprivacytester.com/ to test several well-known iOS email clients, and most of them leaked _something_, even without loading remote content. In the end, I found Fastmail and Apple's built-in iOS mail client to be top-notch in terms of privacy (Fastmail leaked nothing except their server-side DNS resolver via DNS prefetch[1][2], which has nothing to do with the client. Apple is slightly worse, but still far better than other email clients like Outlook, Spark, Edison...)
> Sure, you can configure your email client not to load remote content automatically, but most clients will still leak information through various HTML/CSS elements.
I believe MailMate does this by default? I've been using MailMate for a little over a year now and I've fallen completely in love with it.
Long time user of MailMate and was just about to ask this! I love MailMate for this privacy feature and ability to compose in markdown (P.S. - this is also my first HN comment ever)
I don't really see it this way. For many people, the only company of consequence that is tracking them is ultimately Google (and/or Facebook). Trackers that other companies install in their emails or websites are just sending the data back to Google in the end anyway. It's a redundant way for Google to capture information to build a profile of you with, but if you're using Gmail anyway, they don't need the extra tracking, they still get the same information.
That’s not true. Many advertisers and even newsletters try to figure out which of their emails you actually read and when, so they can optimize subject and date/time for better effect - e.g. emoji in subjects on Sunday get better hits with person X, finance data on Thursday evening with person Y.
They used to be able to tell where you were reading it from geoip, but google killed that by proxying all images through their servers as of a few years ago.
There was no setting in the app last time I checked, but after this article I switched on the ask-before-loading-pictures option in Gmail's web interface, and lo and behold, the Gmail iOS app now asks me if I want to load pictures.
I probably should have added a disclaimer when posting this:
- "A while ago" is about a year ago. When choosing a new email client, run your own test and don't take my stale test results. emailprivacytester.com is fantastic.
- It is not an apples-to-apples comparison. I used Apple's native client as a pure IMAP client fetching directly from my email server, while I think many other apps pre-process your emails on their own servers so they can provide timely email notifications without eating your smartphone's battery with background activity.
In case anyone is using protonmail and is curious about this: by default only DNS prefetch with the server's IP is leaked. Opting to load remote content leaks the reader's IP when grabbing CSS.
I think it depends on the software's target user group. This is okay, and probably the preferred behavior if your users are all tech-savvy. But it is hard to explain to non-technical users why this ugly text email is better than that email with beautiful pictures, or even what HTML is.
The pictures aren't in the email. The email contains instructions saying “phone Steve and ask for the images, then put them in this gap”, but if your computer follows those instructions then Steve knows when you're reading your emails, and where.
Who is Steve? Nobody knows, but he's in the “knowing who's reading emails and when” business. It's a shady business. Don't let your computer phone Steve.
My email client / provider leaked only DNS prefetch... nothing else... Before I even opened the message! I reckon it was my provider, as the IP address reported was wrong for me.
Thanks. Tried with Postbox on macOS with my e-mail address and nothing gets leaked, unless I enable loading of pictures. This is with HTML e-mail on by default (which is why it's surprising to me). FWIW, I prefer HTML e-mail off by default, but I lost that battle some 10 odd years ago when I quit using Mutt.
Those tracking links are so annoying. They make it hard to see where the link is actually going. A newsletter could be linking to Wikipedia, but if you open the message in Gmail, there could be two or more layers of trackers in that URL.
Example: The Frontend Focus newsletter in Gmail
The link of the first news headline is something like
So this is nice but doesn't do what I'm looking for, which is: given an email with links like OP's (https://www.google.com/...), go over it, follow each link to its final redirect and then replace the href with that (in this case, https://www.slashgear.com/...), letting me (for example) preview the links on mouse-over before clicking.
I thought it might be possible to build something like this with curl -Ls -w %{url_effective}, but it seems that only follows HTTP-level redirects (3xx responses with a Location header), whereas this site seems to be redirecting with JavaScript, so you probably need a real crawler.
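For the HTTP-level part of this, a minimal Python sketch of walking a redirect chain hop by hop might look like the following. `resolve_redirects` and `get_location` are hypothetical names: `get_location` stands in for whatever issues the request and returns the `Location` header (or `None` when there is no redirect), and the `fake_hops` table and its `.example` URLs are made up for illustration. It does nothing for redirects performed in JavaScript.

```python
from typing import Callable, List, Optional
from urllib.parse import urljoin


def resolve_redirects(
    url: str,
    get_location: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> List[str]:
    """Return the chain of URLs visited, starting with `url`."""
    chain = [url]
    for _ in range(max_hops):
        location = get_location(chain[-1])
        if location is None:
            break  # no further redirect: chain[-1] is the final URL
        # Location may be relative, so resolve it against the current URL.
        next_url = urljoin(chain[-1], location)
        if next_url in chain:
            break  # redirect loop: stop rather than spin
        chain.append(next_url)
    return chain


# Stand-in for real HTTP requests: maps a URL to its Location header.
fake_hops = {
    "https://t.example/c/abc": "https://r.example/out?id=1",
    "https://r.example/out?id=1": "/story",
}
chain = resolve_redirects("https://t.example/c/abc", fake_hops.get)
final_url = chain[-1]
```

The last element of the chain is what you would put back into the href for the mouse-over preview.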
URL expanders may also be useful here, where expanding encoded URLs isn't sufficient.
I've found https://urlex.org/ useful (top DDG search result). You end up with the disambiguated link in most cases (Twitter, Bitly, and similar shorteners).
I've not looked to see how many levels of redirection/misdirection it will resolve.
What does that actually achieve though? You've still given an 'opened' hit, even if urlex expands it on the server instead of client-side (which would be truly useless).
You're disguising location and device information, but that's about it?
- The "hit" comes from the resolver rather than your own IP. So long as there's no referrer pass-through of personal information, your location is minimised.
- Such links often come through other social media, in my case, rather than email. In the specific case of email this practice is of little use in protecting privacy. However if you're sanitising links pulled off social media shares or the like, you're at least preventing downstream contamination.
- Another practice is to randomly scramble any visible identifiers. This presumes longer URLs, rather than shortened ones.
- In practice, I scrub any "utm_medium" or similar URL query parameters as a matter of course. URLEX is helpful for expanding shortened links ... which I've not encountered so much in email, though truth be told, I've largely abandoned email for numerous reasons, the present topic included.
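The parameter scrubbing in the last point can be sketched in a few lines of Python. This is only an illustration: `strip_tracking_params` is a hypothetical helper, and it assumes the tracking parameters all share the `utm_` prefix; other trackers use other names, which you would add to `prefixes` yourself.

```python
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking_params(url: str, prefixes: Tuple[str, ...] = ("utm_",)) -> str:
    """Drop query parameters whose names start with any of `prefixes`."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(prefixes)
    ]
    # Rebuild the URL with the remaining parameters; path and fragment are untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_tracking_params(
    "https://example.com/a?id=1&utm_medium=email&utm_source=news#frag"
)
```

Note that this does nothing about opaque per-recipient tokens in the path or in non-`utm_` parameters.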
At the least, a log hit from a different IP, I suppose. They're right, I totally forgot to mention the unshortener services, which are what I actually mostly use my Python routine (shared upthread) on. It's largely for self-amusement, admittedly.
For the Google links, I actually use an extension to automatically restore the original URL links.
I just had to deal with an annoying tracking link to unsubscribe from an unsolicited mailing list. uBlock even blocked the link click, I had to temporarily allow the tracker to unsubscribe.
Unsubscribe links have to have your identifier in them so the sender knows whom to unsubscribe when you click the link.
We used to ask people for their email address to unsubscribe them, but then they accused us of using a dark pattern to keep them subscribed. So letting people unsubscribe easier with fewer hoops to jump through seems like the lesser of two evils.
And that is in the emails. Now every social network / search engine modifies the link on click, so that you see the right link on hover but get a tracking link once you click. Browsers should disallow this.
This investigation into email tracking attempts to deconstruct tracking links and pixels and highlight the data that is being collected. It covers Mailchimp, ConvertKit, Substack and other Mailgun retailers.
There's also some attempted (albeit unsuccessful) reverse-engineering of an opaque token in the Substack section (If you like reading stuff about reverse-engineering).
As far as insider info, most larger companies I've been at use a variety of confidentiality levels for their data, the highest of which cannot be emailed or put in the cloud. I believe that most corporate governance professionals are well aware of the risks and options for how to work with such things. But to be fair, your average office worker is not, so compliance with such policies becomes a cultural and education concern.
> Have you considered what Salesforce, HubSpot, and the like have? They use the BCC to record entire email chains and users...
But that's usually done to add "state" to emails so they can be tied to one thread in the support system and people can reply to either the email chain or via some web interface. I don't think you necessarily want to interfere with that.
There are privacy considerations on the unknown users side. Have I consented to HubSpot (et al) having PII and my email contents? (I don't know how this works today, with the GDPR, any future privacy laws)
I have more experience with the sales product than the support ones, where typically more sensitive information can be discussed.
There's also Litmus, which uses a really advanced set of multiple pixels to give data on how long a user is reading an email. Presumably, they insert delays into how long it takes to load each pixel, and if any of the requests get cancelled they can get an idea of how long the email was open for.
The Litmus pixels are usually dropped into another ESP's template, so the data you get would be used to supplement the normal tracking pixel email.
Is it done with the "loading" attribute[1] for the img tag? (i.e. lazy loading)
(in which case I assume it's only useful in some instances, since viewports might be of various sizes and there aren't that many emails that are long enough[2] to involve much scrolling for example.)
Presumably the server just delays the response for x seconds, with the assumption that any in-flight network requests are cancelled by the email client when the user closes the window or app.
In general, email clients are really, really, really dumb. Everything gets loaded at once. So unless it was an HTML attribute that was available in the 90s, it's better to assume the magic is happening server side.
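If the delayed-response guess is right, the arithmetic on the server side is simple. Below is a minimal Python sketch of it, not a description of how Litmus actually works: `estimate_open_seconds` is a hypothetical function, and it assumes each pixel's response is held back by a known delay and that the server can tell whether the request completed or was cancelled.

```python
from typing import Dict, Optional, Tuple


def estimate_open_seconds(completed: Dict[float, bool]) -> Tuple[float, Optional[float]]:
    """Bound how long an email stayed open.

    `completed` maps each pixel's artificial delay (seconds) to whether its
    request finished (True) or was cancelled before the response (False).
    Returns (lower_bound, upper_bound); upper_bound is None when no pixel
    with a longer delay was cancelled.
    """
    finished = [delay for delay, ok in completed.items() if ok]
    lower = max(finished, default=0.0)
    cancelled = [delay for delay, ok in completed.items() if not ok and delay > lower]
    upper = min(cancelled, default=None)
    return lower, upper


# Pixels delayed by 1s, 5s and 15s loaded; the 30s one was cancelled.
bounds = estimate_open_seconds({1.0: True, 5.0: True, 15.0: True, 30.0: False})
```

Here the email was open for at least 15 seconds and less than 30, so more pixels with finer-grained delays would give a tighter estimate.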
Here we are talking as if it's the big companies that are the problem.
The problem is their clients.
Your mom-and-pop store down the street, sending out the weekly newsletter that helps keep their business alive, is the one sending the mail that annoys you so.
The mail sending companies offered the feature of knowing when a subscriber opened an email and when they clicked on something.
So that tiny blogger who sends a weekly update on Substack to subscribers eagerly awaits her click and open stats.
It’s hard for the likes of Mailchimp to pull back those features because their customers so rely on them.
How do I know? I write this kind of sending software all the time for thousands of these small customers.
We are talking husband and wife operations here. People who know nothing about email sending or what goes on behind it.
But take away their click and open tracking and you lose their business the next day. That part they know and want.
Add in the part of them knowing who opened and who clicked on what and it’s gosh darned magic for most small business owners.
PSA: (a) Disable automatic loading of images in Gmail if you don't want to be tracked. (b) Don't ever click links from e-mails, Google for the content instead.
Settings -> General -> Images -> Ask before displaying external images
(I've also been debating sending an auto-reply back to users of such e-mail apps (e.g. Superhuman) with an autoresponse to the effect of "Due to the use of tracking pixels your e-mail has been de-prioritized. If you would like a faster response please send me a plain text e-mail" to discourage people from using these privacy invasions.)
Here's a question... Suppose I'd like to send emails that include images. The images are content, I don't care about tracking. Is there any way to do that in a way that's privacy friendly?
Are there any other options? The only other option I can see would be to use SVG images and then sort of "compile" the SVG into the html source. However, given how email clients have limited html support, this doesn't seem workable either...
It's frustrating that these tracking pixels have made genuine content images so unreliable.
Gmail proxies images. If you send everybody the same image, you will get very little information about who is grabbing the image and when (i.e. you'll be able to tell when Google (re)populates the cache, which gives some small indication that your email is being opened).
This indeed prevents me from tracking. I should have been more clear that my "real" goal is that privacy-sensitive readers will be able to see images. I think these people won't know that the image isn't unique, and so won't load the images.
Tracking pixels and tracking links only work because there are unique identifiers in the URL. So if you just reference the image's direct link in the HTML of the email there's really no information to be gleaned outside of the normal email server handshake.
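To make the "unique identifier in the URL" point concrete, here is a minimal Python sketch of how a sender could personalize a pixel URL and map a later request back to a recipient. Everything named here is an assumption for illustration: `make_pixel_urls`, `who_opened`, the `t` query parameter and the `img.example.com` host are all hypothetical, not any particular ESP's scheme.

```python
import secrets
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def make_pixel_urls(
    recipients: Iterable[str],
    base: str = "https://img.example.com/open.gif",  # hypothetical host
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (recipient -> pixel URL, token -> recipient)."""
    urls: Dict[str, str] = {}
    tokens: Dict[str, str] = {}
    for recipient in recipients:
        token = secrets.token_urlsafe(8)  # opaque per-recipient identifier
        tokens[token] = recipient
        urls[recipient] = f"{base}?t={token}"
    return urls, tokens


def who_opened(requested_url: str, tokens: Dict[str, str]) -> Optional[str]:
    """Map an incoming image request back to a recipient, if it carries a token."""
    token = parse_qs(urlsplit(requested_url).query).get("t", [None])[0]
    return tokens.get(token)


urls, tokens = make_pixel_urls(["a@example.com", "b@example.com"])
opener = who_opened(urls["a@example.com"], tokens)  # "a@example.com"
anonymous = who_opened("https://img.example.com/open.gif", tokens)  # None
```

A request for the bare image URL, with no token, tells the server nothing about which recipient made it.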
However, when Google proxies the image in an email, there is no way for the user to know the original URL and see if it has a unique identifier or not.
The email validation page is incorrect (possibly due to being out of date). Apple Mail on iPhone can render embedded images just like Safari can. I use them in a few personal projects.
That article doesn’t impress me. Their remarks about CID image embedding are fairly incoherent and suggest they have largely just copied stuff from https://www.campaignmonitor.com/blog/email-marketing/embedde... and added a few bits of their own to avoid being dinged by Google for plagiarism, but didn’t really understand what they were adding.
> The impairment of this process upturns the email size while attaching an image that affects deliverability.
Well this is some nice word soup. I think that by the “impairment of this process” they mean “crafting the MIME message so the cid: URLs are all right”, which honestly isn’t that complex, and libraries tend to help you with it. For the rest of the sentence, I think what they’re trying to say is “using attachments for images makes the email bigger and may cause it to be rejected”. That’s… not a particularly reasonable claim. Also they don’t point out similar on inline data: URI images, which is poor of them. The fact of the matter is that the cid: and data: approaches will both use base64 or similar encoding.
> The CID email embedding method is not well applicable for browser-based email.
This is somewhere between mostly and entirely false.
If they mean webmail can’t read and display the <img src=cid:…> approach, they’re flat-out 100% wrong. It’s completely robust, supported absolutely everywhere that supports HTML markup.
If they mean webmail can’t author the <img src=cid:…> approach, well, that’s a bit more of a mixed bag. Some can, some can’t—and in some cases it depends on how you add the image (via an “insert image” toolbar button, via {dragging and dropping/copying and pasting} {an image/a remote image reference/some rich text including an image/some rich text including a remote image}, and several more—there are many ways, and some clients don’t intercept them all).
No, the real problem of the CID approach is that the image is an attachment, and although the client will almost certainly respect the `Content-Disposition: inline` header on the attachment and/or observe the fact that it’s used in the markup, and not show it in the list of attachments (or show it separately in some way), for mailbox search purposes it’ll almost certainly be included, and so queries like `has:attachment` will match the email. This makes the tempting idea of using this to put an image in your signature extremely problematic, because now it’ll be impossible to search for emails where you attached something, because every email has an attachment.
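As an illustration of libraries helping with the cid: wiring, here is a minimal sketch using Python's standard `email` package. The addresses, subject and `example.com` domain are placeholders, and `build_message` is a hypothetical helper; the resulting image part carries `Content-Disposition: inline` and a `Content-ID` that the HTML references.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_message(png_bytes: bytes) -> EmailMessage:
    """Build a multipart/alternative message whose HTML part embeds an image by CID."""
    msg = EmailMessage()
    msg["Subject"] = "Inline image example"
    msg["From"] = "sender@example.com"  # placeholder address
    msg["To"] = "reader@example.com"    # placeholder address
    msg.set_content("Plain-text fallback for clients that do not render HTML.")

    # make_msgid returns "<id@domain>"; the cid: URL uses it without the brackets.
    cid = make_msgid(domain="example.com")
    msg.add_alternative(
        f'<html><body><img src="cid:{cid[1:-1]}" alt="logo"></body></html>',
        subtype="html",
    )
    # Attach the image to the HTML part, turning that part into multipart/related.
    html_part = msg.get_payload()[1]
    html_part.add_related(png_bytes, maintype="image", subtype="png", cid=cid)
    return msg


message = build_message(b"\x89PNG\r\n\x1a\n")  # stand-in bytes, not a real image
image_parts = [p for p in message.walk() if p.get_content_type() == "image/png"]
```

The image still travels as a MIME part of the message, which is why mailbox searches may count it as an attachment.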
After using Firefox's HTTPS-only mode, I have noticed that, quite disturbingly, a lot of these auto-injected tracking links redirect through HTTP. I have seen nearly a dozen websites that have this for password reset links.
It makes me wonder if it could be a viable attack to set up a WiFi hotspot, block login attempts so that some users think that they forgot their password (the error won't be right, but many users may try resetting their password anyways). Then you just intercept the HTTP tracking link and reset their password for them. Now you have stolen their account.
Of course you could just do this passively but prompting it by trying to fail login attempts would get you more hits.
One interesting thing I noticed with LinkedIn emails is that they dynamically fetch the unread notification count. For example, if someone views your profile, there will be a notification on the website. If you go to your mail and open an old LinkedIn email before you check the notification on the website, you will see a little red 1 on the corner of the LinkedIn logo. Later, if you go to the website, clear the notification, and then open the same email, you will see that the notification counter is gone. I find it quite interesting that Gmail allows this behaviour.
I'm assuming the server is just responding with a different image depending on a query param embedded in the image URL (an old technique)? What should Google do? Any remote image URL could respond with a new image in an old email; it's just rare that it happens.
It used to prefetch external images [1]. Another option would be asking whether to download external images. I think one can enable this in settings, default is always display external images.
Yeah I always have all images disabled by default and turn them on on a per email basis if it's absolutely necessary. 90% of emails don't need them or just contain tracking pixels.
The image is dynamically generated at request time, so there isn't much Gmail can do, aside from eagerly preloading all images as soon as the email comes in.
As far as I remember, Gmail used to prefetch images to prevent senders from learning if and when the recipient opens an email, but if this behaviour changed, I didn't know that.
All Gmail does (or ever did) is proxy the image file so the server hosting it cannot do reverse IP lookup to collect client metadata like geolocation. The server hosting the image sees a Google IP address request the image, not (for example) your phone’s IP address.
But the image request still happens at the time you open the email. Google does not prefetch the images in unopened emails.
And if the image URL is personalized, it can still be correlated with your email address by the sender to record an open. Google does not try to guess which part of the URL they can dump without breaking the image.
No browser disables JavaScript by default, and disabling it is never a first-class feature: you have to manually figure out when it’s broken things and decide what to do with it.
Meanwhile, there are comparatively major webmail and desktop clients that disable remote image loading by default (e.g. Fastmail’s webmail and I think Thunderbird on the desktop), and all significant clients at least support disabling loading remote images. And in such cases, if any remote image is blocked, the client will put a “remote images blocked” banner with a button to load remote images. This is a first-class feature of email clients.
My impression is that Gmail prefetches ALL email images, and then serves them to the reader via their CDN. (Checking a random email in my inbox demonstrates this, https://ci3.googleusercontent.com/proxy/...)
As a result, I thought there was no signal for tracking pixels? I might be wrong though
They only know when google fetches the image, which can be any time between you receiving it and opening it. I highly doubt it's on the fly right when you open it.
All Gmail does is proxy the request to hide your IP from the server hosting the image file. Gmail does not change the timing of the request, the URL, or the image file.
Yeah. Something I did not expect when I became a mail administrator was meeting a lot of people who actually read those marketing newsletters I spend so much time trying to avoid.
I've got a Constant Contact sender (a local chamber of commerce) in my tickets right now who sends exclusively pictures of text.
If you were a large email service and you really wanted to mess with this sort of tracking, could you:
- fetch the images at the point the mail is accepted for delivery
- cache the result
- rewrite the URLs transparently in the UI to point to your cached copy
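The three steps above can be sketched in Python roughly as follows. This is a toy under stated assumptions, not how any provider does it: `prefetch_and_rewrite`, the `fetch` callable and the `imgcache.example` host are hypothetical, the cache is a plain dict keyed by a hash of the full URL, and the regex only handles double-quoted `src` attributes.

```python
import hashlib
import re
from typing import Callable, Dict

# Matches <img ... src="http(s)://..."> with a double-quoted src attribute only.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(https?://[^"]+)(")', re.IGNORECASE)


def prefetch_and_rewrite(
    html: str,
    fetch: Callable[[str], bytes],
    cache: Dict[str, bytes],
    cache_base: str = "https://imgcache.example/",  # hypothetical cache host
) -> str:
    """Fetch every remote image once at delivery time and point the HTML at the cache."""

    def swap(match: "re.Match[str]") -> str:
        url = match.group(2)
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = fetch(url)  # the sender sees this hit at delivery, not at open
        return f"{match.group(1)}{cache_base}{key}{match.group(3)}"

    return IMG_SRC.sub(swap, html)


# Stand-in fetcher that records what the sender's server would see.
seen = []
cache: Dict[str, bytes] = {}
rewritten = prefetch_and_rewrite(
    '<p><img src="https://s.example/a.png?u=1"></p>',
    lambda url: seen.append(url) or b"image-bytes",
    cache,
)
```

Because the cache key is the full URL, every personalized URL gets its own entry, while an image shared across many emails is fetched and stored once.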
The majority of emails are never opened. So why would an email service greatly increase their complexity and costs by downloading images no one would otherwise ever see, storing them indefinitely, and rewriting their customers’ email content? The risk/reward ratio is way off on that.
I wonder how many customers would welcome the feature announcement “we are now programmatically altering the content of emails you receive through us.” Look how well everyone loved it when ISPs injected content into unencrypted web pages they delivered.
> The majority of emails are never opened. So why would an email service greatly increase their complexity and costs by downloading images no one would otherwise ever see
If gmail and some of the other large providers started doing this, people would just stop using tracking pixels because they would no longer work. So less stuff for gmail to proxy.
Then emails would only contain "legit" images, which would be shared across many emails. E.g., you send 100,000 emails with an image that has no tracking information, and Gmail only needs to download it once. And why would a sender choose to serve 100,000 copies of the same image from slightly different URLs, when they can just serve it up once?
The gains are obvious and would be large if you ask me. The scale of the costs, debatable, imo.
> why would a sender choose to serve 100,000 copies of the same image from slightly different URLs, when they can just serve it up once?
To provide open tracking, which is a core metric that all of their customers demand and rely on.
There is nothing special about a tracking pixel, it’s just a tiny image file with a personalized URL. Email marketing platforms could easily personalize the URLs of other image files or even all image files.
The costs are asymmetric. The sender only needs one copy of the image file, and a tiny bit of code to map the personalized URLs to that file. But the receiving platform would have to cache every copy of the image separately since they would all have different URLs. Or run some sort of deduping scheme across all inboxes and emails, which would also be expensive.
We're talking about a situation where all images are fetched immediately on delivery regardless of the email being opened.
In that situation it does not "provide open tracking" any more. You send 100,000 emails with 100,000 slightly different URLs, then you get 100,000 images fetched. You get zero information about if the emails were opened or not.
So at that point, you stop putting tracking information in the image URLs, as it's no longer giving you any information, and just means you have to serve the same image 100,000 times instead of just once.
Now Google only has to do 1 HTTP request and store 1 image. It doesn't have to do 100,000 HTTP requests, and store 100,000 images in its cache.
Email inbox providers would have to incur 100% of the cost in a very coordinated way, and then hope that doing so bullies the senders into turning off their open tracking. It’s not going to happen.
Major inbox providers like open tracking because it is a tool for senders to improve their products and clean their lists, which ultimately reduces email volume and makes email recipients happier.
The people at big senders and big recipients talk to each other. If there is going to be a change around open tracking, it will probably be along the lines of a negotiated feedback loop like they have set up for spam complaints. Possibly with the inboxes charging the senders for the privilege of getting that feedback.
> Email inbox providers would have to incur 100% of the cost in a very coordinated way
Yes, and in the long term, providers like google for example, will probably end up saving a tonne of money by not having to proxy all these tracking resources.
> and then hope that doing so bullies the senders into turning off their open tracking. It’s not going to happen.
It's got nothing to do with bullying. Their "open tracking" would immediately become useless. The sender can leave it turned on, collecting no information, and using their bandwidth. Or they can turn it off, as they should never have been doing it in the first place.
> Major inbox providers like open tracking because
I don't care what mail inbox providers like. We shouldn't be taking that into consideration. Perhaps the post office would like it if people who put letters through my letterbox knew how much time I spent reading those letters. I don't care. They're not owed that information. Luckily they haven't found a way to abuse the postal mail system in the same way that email senders have.
I don't know how we get the big email providers to get rid of this plague of open tracking. Perhaps they will take it upon themselves at some point, due to pressure from their users, who want privacy. Gmail's already most of the way there now they've set up their proxying system.
No it's not. Gmail fetches images when you open an email to read it. You can test this yourself using https://www.emailprivacytester.com.
The only thing Gmail does is hide your IP when it fetches the image. It doesn't hide the fact that you've opened the email. Which frankly, is the most useful piece of information to the tracker.
The email privacy tester is seeing Google fetch the image. In the HTML of the email they sent me, the image URL points to a Google proxy link.
On subsequent opens of the email, the detector is not seeing the image being requested again.
Unless you were proposing the email server should download and proxy ALL images, even before the email is delivered. Some anti-spam clients already do a version of this, although it should be noted that giving an email sender the signal that you are eagerly reading all of their emails may produce unintended consequences.
> Unless you were proposing the email server should download and proxy ALL images, even before the email is delivered.
That is precisely what the OP proposed, and which you then stated is the way that gmail works.
I then pointed out that gmail does not work that way. And you have now confirmed that.
> giving an email sender the signal that you are eagerly reading all of their emails may produce unintended consequences.
That's the whole point of this. The moment the big providers implement "fetch on delivery", there is no signal any more. The spammers won't suddenly think, "oh look, our spam campaign is going swimmingly. 100% of our Gmail, Hotmail and Yahoo recipients are now opening our email", and then continue along oblivious of this new major change from all the main email providers, thinking that all of their email is being opened.
I don't think they block link tracking clicks. I don't see how they could possibly do that.
(Unless they take what I've discovered and incorporate it into their system. Even then, it wouldn't be 100% coverage. Some tracking links, ie. Mailchimp, can't be avoided.)
1. https://www.emailprivacytester.com/testDescription?test=dnsL...
2. https://www.emailprivacytester.com/testDescription?test=dnsA...