This is a follow-up to Gmail classifying PayPal security notification emails as spam (https://news.ycombinator.com/item?id=19099887), and Google stating that it's a problem with Gmail and that they've put in a temporary fix specific to PayPal.
Just now, 2 months later, the same issue happened with a security login email from Stripe.
Honestly, the first time I saw a PayPal email (something along the lines of “check out your account activity”) I thought for sure that it was a phishing attempt. Not one link in their emails points to PayPal; they all point to something like paypal-communications.com. You know, the type of domain name a phisher would come up with. Even the link talking about security doesn’t point to the PayPal website! That, and asking me to check my recent account activity when I know I haven’t used PayPal (and PayPal should also know that), is basically PayPal training their customers to fall for scams.
One of the reasons I think spammers have been so successful has to do with the companies themselves.
Companies send so many spam emails, promotions, etc. that they have to come up with alternate domains to send them from. On the customer side, it's very difficult to tell which mail is genuine.
That feels like people are marking PayPal/Stripe emails as spam to make them go away in their mailboxes. At least I've seen a lot of less tech savvy people use the "Mark as spam" button as a "Make this go away" button and train the model into throwing these kinds of emails away.
I'm pretty tech savvy, but if any company sends me unwanted emails that don't have an unsubscribe link at the bottom which I can access without logging in, I mark it as spam, because that's what it is. If enough people do this, Gmail learns and applies the filter accordingly soon enough.
PayPal is sending regular "HEY YOU HAVE MONEY IN YOUR ACCOUNT" e-mails. Apparently, there are legal requirements to send them, so you can't opt out. I'm not surprised if people send the stuff to spam, because it looks exactly like the typical "hey pay attention to me" re-engagement notification spam.
And I don't see why you'd need to be less tech savvy to use "mark as spam" on automated e-mails sent to you that you didn't ask for and don't want to receive.
Gmail has become ridiculously aggressive at spam filtering, and not in a helpful way at all. As I posted here some time ago, they're spam filtering government agency communications sending authentic mail from .gov -- which is really frustrating. It's out of control.
It's not just Google, sadly. We have a noticeable level of problems with big name mail services and major ISPs silently dropping messages, including messages we are legally required to send. This remains the case even though we're sending through reputable mail servers, SPF is set up, etc.
The likes of Google and Microsoft have applied embrace-extend-extinguish to possibly the most important communications medium of modern times, successfully making it less reliable than it has ever been for sending legitimate, actually important stuff. They really should be strongly and publicly criticised for that. False positives in spam detection are, for the most part, more damaging than false negatives.
>Just now, 2 months later, the same issue happened with a security login email from Stripe.
It just happened for the same user, but it seems to be a long standing issue. In the old thread another user was complaining about Stripe emails getting the same treatment: https://news.ycombinator.com/item?id=19100629
The "via" next to the sender address in your screenshots (https://support.google.com/mail/answer/1311182?hl=en) suggests that there might be a DMARC problem (on their side or yours) that could be causing this filtering.
As mentioned there, the "via" most likely comes from the fact that the recipient address is a Google Groups address that multiple people can receive.
Also, as mentioned in the GitHub issue, Google's G Suite security staff has checked the headers (in the PayPal case) and come to the conclusion that nothing is wrong with them, and that it's a Gmail issue (and that the fact that the recipient is a Groups address should make no difference for spam filtering).
They provide the headers of the classified message and it seems to have passed through some kind of group expansion; the message provided did not come _directly_ from Stripe. Initially it was received from a62-28.smtp-out.us-west-2.amazonses.com but then it went through a listserv, through another SMTP hop via gappssmtp.com, then delivered to the spam folder of the recipient.... There could be a million reasons why this happened but it's equally correct to say that "Gmail spam-filters Google Groups" as it is to say "Gmail spam-filters Stripe" based on this evidence.
I have almost never found anything of interest in spam for many years. Maybe nothing; I can't remember a single instance.
In the early years, yes, there were a few.
I was in the same boat (and checked periodically just in case), but I recommend that you re-check. In this past year, something must have changed substantially, because there are a lot more false positives. I just went and looked, and about 1/3 of what was there wasn't spam, but stuff like e.g. Kickstarter campaign notifications, and even a car manufacturer defect notice.
This has been my experience as well, but even with only 2 accounts it takes several clicks and quite a few seconds of waiting to check all of my email because it's also fragmented with the 'social', 'advertisement' and 'main' tags.
> even with only 2 accounts it takes several clicks and quite a few seconds of waiting to check all of my email because it's also fragmented with the 'social', 'advertisement' and 'main' tags.
I've kept those "tags" turned off on every Gmail account I control — I'd rather not risk having something important enter "social" or "promotions".
The overwhelming majority of spam rejected by Gmail is rejected at SMTP time, so checking your Spam label isn't much of a solution if you don't have confidence in their classification. A very small fraction of spam is delivered to that label.
Even if you did there would be so much junk mail you would never get through it all. That's why they have the extra "remove immediately" tier in the first place.
Hm, I have a personal domain with an email record pointed to Google. Can I route emails sent to those addresses through something before forwarding to Google so I can see the firehose?
Of course. Point your MX record wherever you want. Gmail does not accept mail based on whether the MX record points to Gmail, it accepts mail based on whether the domain is a customer of Gmail or not.
What really annoys me is that (with the web version at least), you need to click a "More" link to even see how many spam emails you have sitting unread, so I don't get a visual reminder to check it occasionally :/
Gmail has never been usable without turning off the spam detection. They make it a bit cumbersome: you have to create a filter that overrides it.
I mean a lot of companies are mixing spam and critical messaging -- Square is an example (every place I paid with Square is now sending me marketing).
It kind of sucks that we still use email for password reset and billing stuff. It's like the digital equivalent of certified mail with most of the same problems.
This is the problem. Both of these companies are filthy spammers and are using the same domains and networks for transactional mail as for spam. They've created their own reputation issues. The fact that someone from Gmail spam ops kindly offered to reach in and manually tweak the reputation for PayPal doesn't mean this is a problem with Gmail.
I just checked my spam folder and there is an email from YouTube in there. (It is a notification that I subscribed to telling me that one of my favorite channels posted a new video.)
I'm confused. How is it in 2019 that financial services companies, in particular ostensibly internet savvy companies, aren't using DMARC? How in the world can authenticated legitimate emails ever end up in spam?
And surely by now Google's spam filtering has a pretty good idea which emails are likely to come from a "financial institution", and could put those in spam if they don't conform to an agreed-upon DMARC policy?
Or even better, a mandatory agreed upon tag that indicates the content is 'financial institution account security notification' related (i.e. not marketing spam from that institution), but any emails that use that tag but don't also use the agreed upon DMARC policy always go to spam.
And they do. For a time I got a large amount of spam from dozens of cheap domains registered at Namecheap (which sold .bid domains for 0.99€). Additionally, each had a valid DKIM signature.
I notified Namecheap, including the DKIM signatures, but they told me they couldn't do anything about most of them because "they are hosted with another company". ¯\_(ツ)_/¯
Though if the registrar deleted these domains quickly AND didn't make them so ridiculously cheap, maybe it would become too expensive for spammers.
I also don't really get why we don't apply spam filtering retroactively a few seconds/minutes after it arrived. At first you don't know the domain yet, but at some point you can go "oh, this domain now reached more than 1000 different inboxes for the first time, and of the 50 users who saw the message, 45%* marked it as spam. Must be spam, let's move these and future messages to spam folders for all who haven't downloaded/opened the message yet." There is so much spam that is (nearly) identical and reaches a large percentage of people, but it's just left in the inbox for years by outlook/gmail/etc., even if I don't log into the account for months at a time.
If the spammers can only reach an initial small sample and the domain is next to useless after that, even 99ct domains should not be worth it.
* Or whatever is a normal number. I know lots of people just leave the message as 'read' and don't bother marking it. I don't know how many users do this. Maybe one could also keep track of users who regularly mark something as spam and only count the percentage among those.
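To make the idea above concrete, here is a minimal sketch of that retroactive rule. The thresholds are just the numbers floated in the comment (1000 inboxes, 50 viewers, 45%), and `DomainStats` and `should_reclassify` are hypothetical names; nothing here reflects how any real provider does it.

```python
from dataclasses import dataclass

# Assumed thresholds, taken from the numbers suggested above.
MIN_INBOXES = 1000   # domain must have reached more than this many inboxes
MIN_VIEWERS = 50     # and at least this many users must have seen the message
SPAM_RATIO = 0.45    # share of viewers who marked it as spam


@dataclass
class DomainStats:
    """Counters a provider might keep for a previously unknown domain."""
    inboxes_reached: int = 0
    viewers: int = 0
    spam_reports: int = 0


def should_reclassify(stats: DomainStats) -> bool:
    """Return True once there is enough reach and enough complaints."""
    if stats.inboxes_reached <= MIN_INBOXES or stats.viewers < MIN_VIEWERS:
        return False  # not enough evidence yet, leave the mail where it is
    return stats.spam_reports / stats.viewers >= SPAM_RATIO
```

When this returns True, the provider would move unopened copies (and future messages) to spam. Counting only users who regularly mark spam, as the footnote suggests, would just change what feeds `viewers` and `spam_reports`.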
Gmail and Yahoo will do retroactive classification to some extent, but not broadly. It's a little more common to defer delivery by varying amounts based on historical sender reputation, but again that impacts a minority of legitimate email - maybe 10-20%-ish, off the top of my head - and mostly only by a few minutes.
And yes, every major consumer email provider tracks complaint and response rate metrics (as well as many other metrics and indicators) and uses those as part of their filtering. A spam ratio of > 2% is often enough to cause filtering - that's actually toward the very top end of the complaint rate spectrum for messages that are delivered to the inbox.
Spam filtering already relies on throttling. If you're a domain with unknown reputation, you're generally forbidden [by the recipient] from sending a lot of email until a reputation is established.
This suggests that a weakness of the global email system is being exploited by spammers: receiving mail servers aren't gossiping enough information about new mail sending domains.
Presumably mail server operators are reporting obvious spammers to (centralised) blacklists, but it would perhaps be possible to better tune a heuristic (and increase the cost to spammers) by sharing information on the number of non-spam messages received.
This could actually be done in a provable and relatively privacy-preserving way, if sending mail servers included signatures of the hashes of the emails they were sending. Every email that was received by a domain of unknown reputation could have its hash+sig sent to a public distributed log somewhere.
If this was combined with some sort of good-behaviour bond that domain registrars required (for domains that send email) and which was paid back after a reputation was established, it would make cheap domains much more expensive for spammers.
This is tricky though. You don't want to make these heuristics public, or spammers will just access them to switch domains, and if you say "you can't see our reputation list unless you're Gmail, Yahoo, or outlook", that can start to look a lot like collusion.
You're right, but my hope is that forcing spammers to switch domains will increase their costs to the point that spamming is no longer financially viable.
If the cost of domains is already such a significant expenditure that they need to look for sub-one dollar registrations, then requiring, say, a $10 bond on all domains with an MX record might erase their profit margins completely.
(There is a question of what constitutes "good behaviour" and whether that can be gamed by having spam domains reporting each other as sending legitimate email, but if these ratings are public then people can choose which ratings to trust. Domain age would probably be a good heuristic there too.)
I think that's how spam filtering started for large providers (in addition to classic keywords and such). But the temptation to do something smarter is strong, and as usual, "smart" things end up being wrong.
An issue with this method is that company A may subscribe to company B's mailing list en masse, and then tag its mails as spam, causing all mails from B to be classified as spam. (Maybe that's what happened to get PayPal and Stripe banned from Gmail, who knows.)
You're right; complaint rate and similar metrics tracking were implemented by major consumer email providers in the early 00s, if I recall correctly. The technique has been around long enough for those providers to have systems in place to control for edge cases like the one you mentioned - there are enough subtle behavioral differences between an average user reporting something as spam, vs. a deliberate 'complaint brigade', in order to be able to discern reasonably well between the two - or at least, to reduce the impact of the latter. Users marking 'this is not spam' for messages that mistakenly land in spam is a key metric as well, and is part of how a lot of senders recover from short-term spam folder delivery.
Whether or not additional complexity is justified is something that can be measured in a case like this - and that's exactly what Gmail does, using a variety of different metrics, which allows them to make an informed decision about whether to take on the additional complexity cost.
Should they weight false positives for this message type more heavily in their accuracy metrics? Definitely, these are particularly critical to classify correctly.
That's one of the reasons why greylisting used to be effective; when a mail-delivery is attempted you give it a soft-fail, disconnecting the sender.
The expectation is that by the time they try to deliver a second time you'll get "spam" results if you query DNS-based blacklists, etc, as other people have reported it already.
These days greylisting doesn't seem to be so useful, as >50% of the spam I receive is sent from gmail/yahoo/similar. Hosts too big to block (sigh).
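For anyone who hasn't run greylisting, a rough sketch of the mechanism described above: the first delivery attempt for a given (client IP, sender, recipient) triplet gets a temporary failure, and a retry is accepted only after a delay. The `Greylist` class, the 300 second delay and the reply strings are illustrative assumptions; the DNS blacklist lookup on the retry is left out.

```python
import time

RETRY_DELAY_SECONDS = 300  # assumed minimum wait before a retry is accepted


class Greylist:
    """Temp-fail the first attempt per triplet, accept later retries."""

    def __init__(self, delay=RETRY_DELAY_SECONDS, clock=time.time):
        self.delay = delay
        self.clock = clock
        self.first_seen = {}  # triplet -> timestamp of first attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply line for this delivery attempt."""
        key = (client_ip, sender, recipient)
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.delay:
            # Soft-fail: a legitimate server queues the mail and retries.
            return "451 4.7.1 Greylisted, please try again later"
        # By now a blacklist query would have a better chance of a hit.
        return "250 OK"
```

This only helps against senders that never retry or that get blacklisted within the delay window, which is why it does little against spam relayed through large providers.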
It's less about the presence of DMARC in general than about the fact that full DMARC alignment (even in the absence of a formal DMARC record) allows ISPs to make more confident assessments of that sender's reputation.
It makes it much harder to subscribe to new services [1]. Or when existing services mess up their config [2], your mailbox will silently drop the messages.
[1] e.g. today I got a mail from IKEA from an address @ods2.net instead of the usual @ikea.com one; I had no idea I would have had to whitelist it
[2] Which happens and will continue to happen as long as most mailboxes are not configured to drop messages without a valid DMARC. Chicken and egg problem.
DMARC already allows a domain owner to specify a recommended action (no action, quarantine, or reject) to take in the event an ISP receives a message that fails DMARC. This DMARC policy can be published at the subdomain level, and best practice is for organizations to use separate IPs/subdomains for sending different types of mail traffic (security notifications vs. marketing, for example), so security notifications can have a significantly stricter policy than marketing mail.
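As an illustration of the per-subdomain policy point, here is a small sketch that reads the `p=` tag out of a DMARC TXT record (the record published at `_dmarc.<domain>`). The helper names and the example records are made up, and the parser is deliberately simplified; it does no DNS lookup and ignores most of the tags the spec defines.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC record such as 'v=DMARC1; p=reject' into its tags."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        name, value = part.split("=", 1)
        tags[name.strip().lower()] = value.strip()
    return tags


def policy_for(record: str):
    """Return 'none', 'quarantine' or 'reject', or None if not a usable record."""
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1":
        return None
    policy = tags.get("p")
    return policy.lower() if policy else None


# Hypothetical records for two sending subdomains of the same organization.
security_record = "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"
marketing_record = "v=DMARC1; p=none; rua=mailto:dmarc@example.com"
```

With records like these, a failing message claiming to be a security notification would be rejected outright, while the marketing stream would only be reported on.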
In the PayPal example, the 'via' in the from address line is an indicator the message is not DMARC aligned (but doesn't tell us whether or not the actual originating domain publishes a formal DMARC record - I can't tell that since it's blacked out in the GH post).
So for the PayPal example, you're right, it looks like they're most likely not following best practices. It may be possible that the 'via' domain is simply a different legitimate PayPal domain, which is an edge case that DMARC isn't designed to cover - I think we'll see a technical standard emerge in the next 5 years to handle that kind of thing; maybe built on some of what BIMI is doing - https://www.ietf.org/id/draft-bkl-bimi-overview-00.txt
I can confirm that there are some big problems with Google's spam filter. Even emails from Google G Suite support ended up marked as spam. That started happening about 4 months ago.
We had a problem with the following emails:
- all emails from sns.amazonaws.com are in spam
- all Stripe emails about "Suspected fraudulent payment on your Stripe account" are in spam
They all say "Similar messages were used to steal people's personal information. Avoid clicking links, downloading attachments, or replying with personal information." Anyway, if you depend on G Suite email to receive important notifications from PayPal, Stripe, or Amazon, you will need to monitor your spam folder. It seems like there are some keywords (invoice, payment; I really do not know) which trigger the spam filter.
A good chunk of emails I get from YouTube go to spam. I have notifications on by email for when some youtubers post new videos: https://i.imgur.com/8E0oGrP.png
However some of them do not get spam filtered, only a chunk of them do. It's weird because these come from YouTube, so it's a similar problem where Gmail is filtering Google emails....
This is a huge problem with Gmail. They've been filtering legitimate receipts and things from Gumroad in the last couple of months, and I've heard similar things from founders of several other companies.
Please check your Spam folder, and mark the legitimate emails as such.
Hopefully it'll help the folks (or AI) over at Gmail get better!
Just the other day, I posted a similar question [1] here, after discovering that even gmail-to-gmail messages I send to my close friends get categorized as spam. There has been a significant regression in Gmail's spam filter recently (both false positives + negatives).
Google recently filtered some emails into spam for us. Those emails were sent by Google Docs, in response to us adding comments to our own private doc. This was all within the same organization.
I've also repeatedly seen them suddenly spam filter individual messages in long threads of conversation. Like they weren't taking into account that a message was part of a two way conversation at all, even though they have In-Reply-To and References headers linking them to those non spam filtered conversations.
They also randomly spam filter transactional emails that we send from our servers that are from our own domain, to the same domain as hosted by Gmail, that is fully validated by SPF, DKIM and DMARC. That should never happen. We clearly trust our own domain, so if mail from that domain is authenticated, let the bloody email through 100% of the time!
It's like they spent all their time working on how to blacklist email, and zero time working on when to whitelist it.
The most important part of a spam filter is the whitelisting.
I guess Stripe is big enough for them to notice. But imagine a tiny website. Is there any chance of reaching Gmail developers in case of a similar failure? Gmail just makes e-mail an unreliable transport.
It's not the best solution in cases of clear false positives, but they do provide a help document outlining what kind of practices result in reduced risk of these kinds of problems. (It's oriented toward high volume senders, but many of the recommendations apply at any scale.)
Their postmaster tools don't show anything up at low volumes. I'm a freelance consultant, and my mail volume is very low. Usually setting up contracts, and sending out invoices. Not much else. My mail volume is too low to show up there.
I don't send out bulk or promotional emails, I use SPF and DKIM, and I've still seen a lot go to spam. My mail is hosted by FastMail, not a home server.
I recently got a paper statement from my bank (I'm signed up for online statements only) because the bank claimed my gmail inbox rejected its notification email. (I checked spam, and it didn't land there either.)
There's an ever-growing list of stuff you need to set up to be considered a legitimate mail server. It started long ago with a valid reverse DNS entry, and now you need DKIM, SPF and a bunch of other stuff: https://support.google.com/mail/answer/81126?hl=en
Given that banks seem unable to send e-mail that doesn't look like phishing (using non-https links and often separate domains), I don't have high confidence that they can get the basics right.
I am a freelance consultant, and a number of times I have seen my business email (hosted at FastMail) go to spam in a Google hosted mailbox. As a single consultant, I don't send a huge volume of mail because a number of my clients set me up with their internal email for business communication. The mail that I do send is invoices, and setting up contract signing; that sort of thing. The mails are always expected communication. I have SPF and DKIM configured and validated by FastMail. Mail somehow still goes to spam.
I set up Google's postmaster tools, but they show nothing because I don't send a big enough volume of emails for them to care.
I set up DMARC to get reports when mail is delivered under my domain to a Gmail or G Suite hosted address. Every single DMARC report (it's easy for me to keep on top of them because my mail volume is that low) is "spf: pass; dkim: pass".
I really don't like that the only way to get this looked at is through publicly shaming Google like this; I have not seen any way I can talk to somebody at Google to solve the problem. FastMail support said they were unable to help me solve it either, suggesting I need to send mail into the Google garden and have the recipient click "not spam" until their filter learns.
I don't think it comes down to "SPF and DKIM configured". I believe it's because you have some "bad" keyword (invoice, money, etc.) in the body of the email.
I assume that is the problem because when I asked Google support "why is this Stripe email in spam" and provided them with the email body, the reply from Google G Suite support went to spam as well (since it had the email body in it).
And of course the fact that you use FastMail does not help.
For a few years now I have Gmail’s spam filter disabled completely on my accounts.
I’m not sure you can do that with a setting, but there is a workaround where you create a filter that excludes messages matching some random UUID, and tick the “Never send to spam” action.
So far to me the trade-off has been worth it (YMMV). Not a huge burden in terms of junk mail influx in my inbox, but one does have to be on alert for high-quality phishing when reading mail (check headers, for example).
That is exactly what I do (and what results in the "This message was not sent to spam because of a filter you created" message in my screenshot): Create a filter that matches all and selects "Never send to spam" as action.
Is this a Gmail issue or a G Suite issue? You'd think they're similar but they haven't been in my experience. G Suite spam filtering has always sucked for me and others I know. I've gotten completely legitimate replies classified as spam from completely normal and legitimate people in the same domain as far back as 2013, if not earlier. In the case of someone else I know this happened to, it was an extremely critical email that got classified as spam.
P.S. If any Gmail folks are here: There's a bug in Gmail/G Suite that might as well be equivalent to marking an email as spam, since it prevents you from seeing emails: last time I checked, the "Important" section of Priority Inbox seemed to sort based on the timestamp of the most recent important email in the conversation, not the most recent email in the conversation. So if a subsequent email comes that isn't Important, the conversation doesn't pop up to the top -- so if it's not on your screen already, you don't see it as unread.
A lot of large enterprises using G Suite route their mail through external systems that provide various dubious services and which often screw up the spam classification by having dirty VIPs or being open spam relays etc. Perhaps your organization was infected by such a thing. Otherwise having a threaded message or a sender in your address book is a virtually guaranteed saving throw against spam classification.
You made me go back and check the email I had in mind. I'm not sure to be honest. What I do see is that it soft-failed SPF:
> Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning {user@domain} does not designate {IP} as permitted sender)
So I guess you can use that to blame the domain owner, but I feel like the fact that it was a reply to my own email should've been enough to allow it through? Moreover, there are other emails around that time that weren't replies but still softfailed SPF, and they were still let through.
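If you want to pull these results out of a batch of saved messages rather than eyeballing them, a rough sketch follows. It extracts the method=result pairs from an Authentication-Results header value; `auth_results` is a hypothetical helper, the sample header uses placeholder addresses, and the parsing is a simplification that ignores nested comments and quoted strings.

```python
import re


def auth_results(header_value: str) -> dict:
    """Map each method in an Authentication-Results value to its result."""
    # Drop parenthesised comments such as "(google.com: domain of ...)".
    without_comments = re.sub(r"\([^()]*\)", "", header_value)
    results = {}
    # The first ';'-separated field is the authserv-id (e.g. mx.google.com).
    for clause in without_comments.split(";")[1:]:
        match = re.match(r"\s*([a-z0-9-]+)\s*=\s*([a-z]+)", clause, re.I)
        if match:
            results[match.group(1).lower()] = match.group(2).lower()
    return results


sample = (
    "mx.google.com; spf=softfail (google.com: domain of transitioning "
    "user@example.com does not designate 192.0.2.1 as permitted sender) "
    "smtp.mailfrom=user@example.com; dkim=pass header.i=@example.com"
)
```

Running it over both the filtered and unfiltered messages would at least show whether the softfail correlates with what landed in spam.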
... in order to send outbound mail from Gmail effectively. Many people don’t realize that Gmail sends mail to itself over SMTP, not some special magic protocol, and it doesn’t exempt itself from SPF checks and whatnot.
It's a Google product. Of course it limits your communication with Google competitors. If you used Google Pay, like you're supposed to, it would work fine. It's a free email system.
They have to make money off you somehow. What did you expect?
Isn't that what content scanning is supposed to check for? I mean, it's not like they're not scanning the email body for keyword extrapolation, anyways, right?
To be fair, though, Stripe should change their SPF record to hard-fail (-all).
With a proper SPF record, we can verify whether or not they came from the right servers. Google verified that PayPal's e-mails were legitimate.
You can't spoof those addresses convincingly unless you compromise a legitimate mail server listed in the SPF, or hijack BGP to impersonate a legitimate mail server in the SPF.
Google is filtering known trustworthy senders for dubious reasons.
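For reference, the difference between soft-fail and hard-fail is just the qualifier on the record's `all` mechanism. A small sketch that reads it from an SPF record string; `spf_all_policy` and the example record are assumptions for illustration, and it does not follow `include:` or `redirect=` the way a real SPF evaluator would.

```python
# SPF qualifiers and the result they produce when the mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_all_policy(record: str):
    """Return the result for senders not listed in the record, or None."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return None  # not an SPF record
    for term in terms[1:]:
        term = term.lower()
        if term == "all":
            return "pass"  # a bare 'all' has an implicit '+' qualifier
        if len(term) == 4 and term[1:] == "all" and term[0] in QUALIFIERS:
            return QUALIFIERS[term[0]]
    return None  # no 'all' mechanism present


# Hypothetical record; the include target is a placeholder.
example_record = "v=spf1 include:_spf.example.com ~all"
```

Changing `~all` to `-all` in a record like this tells receivers to treat mail from unlisted servers as a hard failure rather than a soft one.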