I also don't really get why we don't apply spam filtering retroactively a few se...

massaman_yams · on March 31, 2019

Gmail and Yahoo will do retroactive classification to some extent, but not broadly. It's a little more common to defer delivery by varying amounts based on historical sender reputation, but again that impacts a minority of legitimate email - maybe 10-20%-ish, off the top of my head - and mostly only by a few minutes.

And yes, every major consumer email provider tracks complaint and response rate metrics (as well as many other metrics and indicators) and uses those as part of their filtering. A spam ratio of > 2% is often enough to cause filtering - that's actually toward the very top end of the complaint rate spectrum for messages that are delivered to the inbox.
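To make the complaint-rate idea concrete, here is a minimal sketch. It assumes a single hypothetical cutoff based on the "> 2%" figure above; the names `SenderStats`, `should_filter` and `COMPLAINT_THRESHOLD` are invented for illustration, and real providers combine this with many other signals rather than one hard threshold.

```python
from dataclasses import dataclass

# Hypothetical cutoff taken from the "> 2%" figure above; real providers
# do not publish their thresholds and weigh many other signals too.
COMPLAINT_THRESHOLD = 0.02


@dataclass
class SenderStats:
    delivered: int = 0   # messages delivered to the inbox
    complaints: int = 0  # "report spam" clicks on those messages

    def complaint_rate(self) -> float:
        # Avoid dividing by zero for senders with no delivered mail yet.
        return self.complaints / self.delivered if self.delivered else 0.0


def should_filter(stats: SenderStats, threshold: float = COMPLAINT_THRESHOLD) -> bool:
    """Return True when the sender's complaint rate exceeds the threshold."""
    return stats.complaint_rate() > threshold
```

For example, `should_filter(SenderStats(delivered=1000, complaints=25))` is true (2.5%), while a sender with 10 complaints per 1000 delivered messages (1%) passes.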

jcranmer · on March 31, 2019

Spam filtering already relies on throttling. If you're a domain with unknown reputation, you're generally forbidden [by the recipient] from sending a lot of email until a reputation is established.
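Roughly, that kind of receiver-side throttling could look like the sketch below. The limits, the class name `ReputationThrottle`, and the idea of counting "good" messages to establish reputation are all assumptions for illustration, not how any particular provider does it.

```python
from collections import defaultdict

# Hypothetical numbers, chosen only to illustrate the mechanism.
NEW_DOMAIN_LIMIT = 100             # messages/day from an unknown domain
ESTABLISHED_LIMIT = 10_000         # messages/day once reputation exists
GOOD_MESSAGES_TO_ESTABLISH = 1_000


class ReputationThrottle:
    def __init__(self):
        self.sent_today = defaultdict(int)    # domain -> accepted today
        self.good_history = defaultdict(int)  # domain -> non-spam messages seen

    def limit_for(self, domain: str) -> int:
        if self.good_history[domain] >= GOOD_MESSAGES_TO_ESTABLISH:
            return ESTABLISHED_LIMIT
        return NEW_DOMAIN_LIMIT

    def accept(self, domain: str) -> bool:
        """Return False when the receiver would defer (a temporary failure)."""
        if self.sent_today[domain] >= self.limit_for(domain):
            return False
        self.sent_today[domain] += 1
        return True

    def record_good(self, domain: str, count: int = 1) -> None:
        self.good_history[domain] += count

    def new_day(self) -> None:
        self.sent_today.clear()
```

An unknown domain gets its first 100 messages of the day accepted and the rest deferred; once enough non-spam mail has been recorded for it, the larger limit applies.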

dane-pgp · on March 31, 2019

This suggests that a weakness of the global email system is being exploited by spammers: receiving mail servers aren't gossiping enough information about new mail sending domains.

Presumably mail server operators are reporting obvious spammers to (centralised) blacklists, but it would perhaps be possible to better tune a heuristic (and increase the cost to spammers) by sharing information on the number of non-spam messages received.

This could actually be done in a provable and relatively privacy-preserving way, if sending mail servers included signatures of the hashes of the emails they were sending. Every email that was received from a domain of unknown reputation could have its hash+sig sent to a public distributed log somewhere.

If this were combined with some sort of good-behaviour bond that domain registrars required (for domains that send email) and which was paid back after a reputation was established, it would make cheap domains much more expensive for spammers.
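As a rough sketch of the hash+sig part of the idea: the receiver logs only a digest of the message plus the sender's signature over that digest, so the log does not reveal message contents. Everything here is hypothetical (the function names, the log entry layout), and HMAC is used purely as a stand-in because it ships with Python. HMAC needs a shared secret, so it is NOT publicly verifiable; a real scheme would need a public-key signature that anyone can check.

```python
import hashlib
import hmac


def message_hash(raw_message: bytes) -> str:
    """Digest of the raw message; only this, not the content, gets logged."""
    return hashlib.sha256(raw_message).hexdigest()


def sign_hash(digest_hex: str, sender_key: bytes) -> str:
    # STAND-IN ONLY: HMAC with a shared key. A real design would use a
    # public-key signature so third parties can verify the log entry.
    return hmac.new(sender_key, digest_hex.encode("ascii"), hashlib.sha256).hexdigest()


def log_entry(sending_domain: str, raw_message: bytes, signature: str) -> dict:
    """What a receiving server might append to the (hypothetical) public log."""
    return {
        "domain": sending_domain,
        "hash": message_hash(raw_message),
        "sig": signature,
    }


def verify_entry(entry: dict, sender_key: bytes) -> bool:
    """Check that the signature matches the logged hash."""
    expected = sign_hash(entry["hash"], sender_key)
    return hmac.compare_digest(expected, entry["sig"])
```

The sender would compute `sign_hash(message_hash(msg), key)` and attach it to the message; the receiver builds the log entry from what it actually received, so a mismatched hash fails verification.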

joshuamorton · on March 31, 2019

This is tricky though. You don't want to make these heuristics public, or spammers will just access them to switch domains, and if you say "you can't see our reputation list unless you're Gmail, Yahoo, or outlook", that can start to look a lot like collusion.

dane-pgp · on April 1, 2019

You're right, but my hope is that forcing spammers to switch domains will increase their costs to the point that spamming is no longer financially viable.

If the cost of domains is already such a significant expenditure that they need to look for sub-one dollar registrations, then requiring, say, a $10 bond on all domains with an MX record might erase their profit margins completely.

(There is a question of what constitutes "good behaviour" and whether that can be gamed by having spam domains reporting each other as sending legitimate email, but if these ratings are public then people can choose which ratings to trust. Domain age would probably be a good heuristic there too.)

progval · on March 31, 2019

I think that's how spam filtering started for large providers (in addition to classic keywords and such). But the temptation to do something smarter is strong, and as usual, "smart" things end up being wrong.

An issue with this method is that company A may subscribe to company B's mailing list en masse, and then tag its mails as spam, causing all mails from B to be classified as spam. (Maybe that's what happened to get Paypal and Stripe banned from Gmail, who knows.)

massaman_yams · on March 31, 2019

You're right; complaint rate and similar metrics tracking were implemented by major consumer email providers in the early 00s, if I recall correctly. The technique has been around long enough for those providers to have systems in place to control for edge cases like the one you mentioned. There are enough subtle behavioral differences between an average user reporting something as spam and a deliberate 'complaint brigade' to discern reasonably well between the two, or at least to reduce the impact of the latter. Users marking 'this is not spam' for messages that mistakenly land in spam is a key metric as well, and is part of how a lot of senders recover from short-term spam folder delivery.

Whether or not additional complexity is justified is something that can be measured in a case like this - and that's exactly what Gmail does, using a variety of different metrics, which allows them to make an informed decision about whether to take on the additional complexity cost.

Should they weight false positives for this message type more heavily in their accuracy metrics? Definitely; these are particularly critical to classify correctly.

stevekemp · on March 31, 2019

That's one of the reasons why greylisting used to be effective; when a mail-delivery is attempted you give it a soft-fail, disconnecting the sender.

The expectation is that by the time they try to deliver a second time you'll get "spam" results if you query DNS-based blacklists, etc, as other people have reported it already.
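The flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not a real MTA hook: the `Greylist` class, the 300-second delay, and the `is_listed` callback (standing in for a DNS blacklist lookup) are all assumptions, and the return values are just the SMTP reply codes commonly used for a temporary failure (451), a rejection (550) and acceptance (250).

```python
import time


class Greylist:
    def __init__(self, min_delay_seconds=300.0, is_listed=lambda ip: False):
        self.min_delay = min_delay_seconds
        self.is_listed = is_listed  # hypothetical blacklist lookup callback
        self.first_seen = {}        # (ip, sender, recipient) -> first attempt time

    def check(self, client_ip, mail_from, rcpt_to, now=None):
        """Return an SMTP-style reply code for this delivery attempt."""
        now = time.time() if now is None else now
        key = (client_ip, mail_from, rcpt_to)
        # Remember when this triplet was first seen.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return 451  # soft-fail: a legitimate server will retry later
        # On a sufficiently late retry, blacklists have had time to catch up.
        if self.is_listed(client_ip):
            return 550
        return 250
```

A first attempt and any retry inside the delay window get 451; a retry after the window is accepted unless the sending IP has been listed in the meantime.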

These days greylisting doesn't seem to be so useful, as >50% of the spam I receive is sent from gmail/yahoo/similar. Hosts too big to block (sigh).

teddyh · on March 31, 2019

If a measure becomes a target, it ceases to be a good measure. (Goodhart’s law)