Head of Let's Encrypt here. Our automated mail system had a bug that accidentally exposed about 1.9% of subscriber email addresses to the same 1.9% of recipients.
Our sincerest apologies for this mistake. We will be doing a thorough postmortem to determine exactly how this happened and how we can prevent something like this from happening again.
It's good to see that the new EU data protection regulation is already influencing the discourse, even though it will only apply from May 25, 2018. More importantly, even though Let's Encrypt is a US-based organization, the regulation applies to it directly or indirectly.
For those interested, I can highly recommend the documentary Democracy: Im Rausch der Daten (2015) [0]. For 9 months the documentary follows MEP rapporteur Jan Philipp Albrecht and his policy advisor Ralf Bendrath [1] (hacked a C64 in his youth, frequent attendee at C3) and gives a rare insight into the negotiations and process in Brussels. They even recorded the final negotiations in the backrooms, and for those knowledgeable on the topic it is very interesting to see how some of the deals were made. After the screening at IDFA the director said he hadn't set out to make a documentary on the EU data protection reform, and that Jan Philipp's rapporteur topic was chosen as an example of the EU process. Regretfully, this documentary isn't exemplary of the EU process, because most of the time the lobbyists of the large corporations have far more influence. The documentary clearly shows that the final text is the result of the persistence and integrity of the 3 main characters: Albrecht, Bendrath and Reding. And as we're experiencing now, this has great influence on data protection well beyond Europe.
In case you're referring to the General Data Protection Regulation, it does not apply until May 2018. The penalty that would be applied would probably be this one, given that the number of prior offenses, the gravity and duration of the offense, the number of affected users and the damage caused, etc. need to be taken into account:
> Each supervisory authority shall have all of the following corrective powers:
> [...]
> (b) to issue reprimands to a controller or a processor where processing operations have infringed provisions of this Regulation;
Most people would consider their possibly/definitely having HIV to be a more personal piece of information than their email address.
If I'd been affected by this Let's Encrypt email thing I frankly wouldn't care, other than to question why it happened. (Just speaking for myself, I'm sure some people would care.) But if my data were in a leak revealing a serious personal medical issue, I absolutely would care.
Data protection is already regulated by the EU (only by means of a Directive rather than a Regulation) so the same principles apply across the EU/EEA, if not the exact rules.
For sharing a few email addresses? $0 I'd imagine. It's not like it's anything important like credentials for shopping/banking or other details which could be used in identity theft. Worst case scenario: Google's spam filters have to work a little harder. You'd not even notice. Yes, some people have chosen to run their own mail servers for some reason and those people might conceivably get a bit more spam for a while.
You guys should all switch to disposable email addresses. It solves so many problems. If your address is leaked like that, no big deal, you just delete that alias. If a password is leaked along with it, having website-specific emails will again make it very difficult to correlate your credentials with another website. If you start receiving spam on that address you just delete it. And so it ceases to be personally identifiable information.
I actually start to think you should, without joking. At least the name.
If you look at the past 5 years, there is hardly a single major website that hasn't been hacked and hasn't leaked personal data. Not only do I see no sign of improvement, it is actually accelerating. Leaking information on 30m+ people is now becoming common and barely makes the news outside of a few specialized websites like HN.
If you have a better alternative than feeding garbage data to websites who want to collect data they won't need (why would an online retailer give a shit that they are shipping a product to someone called Mr X rather than Mr Y?), I want to hear it.
The reason for this screw up was guessed by a Twitter user, and his theory was confirmed by Josh from Let's Encrypt [1].
The whole mess was caused by the Python `email` package, and specifically the behavior of the `MIMEMultipart` object [2]. When you reuse the same `MIMEMultipart` object for multiple emails, each destination address is appended. The same problem takes place when you use Python 3 [3].
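For anyone who wants to see it, here is a minimal sketch of the behaviour described above, assuming a mailer that reuses one `MIMEMultipart` object in a loop. The function names and addresses are made up for illustration and are not Let's Encrypt's actual code. With the default legacy policy, assigning to `msg["To"]` adds another `To` header rather than replacing the existing one:

```python
from email.mime.multipart import MIMEMultipart


def to_headers_when_reused(recipients):
    """Reuse one message object; return the To headers present at each send."""
    msg = MIMEMultipart()
    snapshots = []
    for addr in recipients:
        msg["To"] = addr  # appends a header, does not overwrite the old one
        snapshots.append(msg.get_all("To"))
    return snapshots


def to_headers_when_reset(recipients):
    """Same loop, but drop any existing To header before setting it."""
    msg = MIMEMultipart()
    snapshots = []
    for addr in recipients:
        del msg["To"]  # removes every To header; does nothing if there is none
        msg["To"] = addr
        snapshots.append(msg.get_all("To"))
    return snapshots
```

Building a fresh message object per recipient would avoid the problem just as well as the explicit `del`.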
I see that it's confirmed, but I find it a bit odd that they had originally said "prepended between 0 and 7,618 other email addresses to the body of the email.", since this way it would just be a lot of "To" headers.
Since both users mentioned were the last in the list of addresses for the email they received, my money's on a trivial mistake like:
getEmailBody(users[:i])
instead of
getEmailBody(users[i])
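To make the difference concrete, here is a small Python sketch. `get_email_body` and the addresses are hypothetical stand-ins, not the real mailer. Note that in Python `users[:i]` stops just before index `i`, so for the recipient's own address to come last in the list, the slice would have to be `users[:i + 1]`:

```python
def get_email_body(recipients):
    """Hypothetical helper: renders whatever it is given into the body."""
    if isinstance(recipients, str):
        recipients = [recipients]
    return "\n".join(recipients)


users = ["a@example.com", "b@example.com", "c@example.com"]
i = 2

sliced = get_email_body(users[:i])                 # everyone before index i
sliced_inclusive = get_email_body(users[:i + 1])   # everyone up to and including i
single = get_email_body(users[i])                  # only the intended recipient
```

A helper that quietly accepts both a string and a list is what lets a slip like this go unnoticed.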
I typically prefer a high level of polymorphism in my code/APIs (sensibly handling single inputs vs. arrays) but this is a great counter-example even if not the actual root cause. Every feature is also a liability. Double edged sword. Etc.
Nothing about your suggested solution requires or is exclusive to a statically-typed language. "Replace a send-to-arbitrary-number-of-addresses function with a send-to-exactly-one-address function" is possible in dynamically-typed languages, too, and as you openly admit even a statically-typed language is likely to model email sending in a way that accepts multiple recipient addresses.
So your "type system can really help" is really just an irrelevancy you've come up with to try hide your attempt to shove your preferred programming paradigm onto other people.
> So your "type system can really help" is really just an irrelevancy you've come up with to try hide your attempt to shove your preferred programming paradigm onto other people.
Not exactly, but it is a weaker advantage than I thought when writing it. The advantage is that languages like Haskell encourage specializing functions in that manner, which avoids that specific bug.
However, so does test-driven development, which is just as possible in dynamic languages.
The advantage the statically typed language has over even the dynamically typed language plus TDD is that the program won't compile, whereas a dynamic language plus TDD relies on programmer discipline.
I thought of that as well. I went and looked at my mailing list program in Racket, and the mail procedure takes many recipients, but in the form of rest arguments, like: (define (send-mail msg . recipients) ...)
Passing it a list makes the recipients arg a list containing a list of recipients. To actually send to multiple recipients I explicitly have to use the apply procedure, which passes all list elements as arguments.
As you said, not a statically typed language. Types would have made an eventual error easier to debug, though.
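Python's `*args` behaves much like the rest argument described above, so the same effect can be sketched there. `send_mail` is a hypothetical stand-in that only returns the recipients it would send to:

```python
def send_mail(msg, *recipients):
    """Hypothetical stand-in: returns the recipients it would send to."""
    return recipients


addresses = ["a@example.com", "b@example.com"]

# Passing the list directly yields ONE recipient: the list itself.
as_one_argument = send_mail("hello", addresses)

# Sending to each address requires explicit unpacking (like Racket's apply).
unpacked = send_mail("hello", *addresses)
```

Having to opt in to the multi-recipient case is the point: an accidental list shows up as one odd-looking recipient instead of silently fanning out.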
Well, using the type system, weak or strong, to force correct code either at compile time or runtime would help. I can write a function that uses Java reflection for polymorphism as easily as I can use JavaScript reflection to enforce an API contract.
// JavaScript
if ( Array.isArray(address) ) {
    throw new Error("Must only pass a single email address.");
}

// Java
void sendEmail(Object recipientOrRecipients) { ... }
There are times when polymorphism is low-risk, and there are times when it's better safe than sorry. Best to know your risk model (and your libraries) and act accordingly.
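The same kind of runtime guard can be sketched in Python. The function name, the return value and the separator check are assumptions for illustration only, and nothing is actually sent:

```python
def send_email(recipient, body):
    """Hypothetical single-recipient sender that refuses anything list-like."""
    if not isinstance(recipient, str):
        raise TypeError("Must only pass a single email address.")
    if "," in recipient or ";" in recipient:
        # Crude check for several addresses packed into one string.
        raise ValueError("Must only pass a single email address.")
    # A real implementation would hand the message to a mail library here.
    return {"to": recipient, "body": body}
```

It is only a runtime check, so it still relies on the bad call actually being exercised in testing.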
I fixed a very similar bug in an ASP.NET site a few weeks back. Email generation used a factory class, and each time it constructed an email it would set From = this, Subject = that, Body = the other. But To was a list, so the code called To.Add(user.Email) without a To.Clear() call at the start. So the first email went out to one person, the second to two, the third to three… Someone else had built the site. When the problem was first reported I looked through the code briefly and missed it; when another client complained about the addresses, and about having received sixteen emails, I looked again and realised what was going on.
As far as I know, all emails starting with 0-9, A-Z and at least part of 'a' were exposed. I did not get one starting with 'g', so it's somewhere between 'a' and 'g' that it got stopped.
Edit: "7,618 out of approximately 383,000 emails" were sent out
Was just able to confirm, it's up to and including your email address. Mine starts with m so I see 3,761 email addresses. But for me, none lexicographically after my email address are exposed.
Edit: Just want to add that I've made a similar mistake before (with a smaller user base). So I understand how easily these bugs occur. Given all their progress in the last few years, I still believe that the privacy and security of such a large portion of the Web could not be in better hands. Props to the LE team for a quick, responsible response.
You mean M, not m. (it's in ASCII order). Also giving out the number of addresses you see will allow someone after yours to connect your username with your email address if you weren't aware.
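A quick Python sketch of what ASCII order means here, with made-up addresses: digits sort before uppercase letters, and uppercase before lowercase. It also shows why quoting the number of addresses you saw gives away your position in the list:

```python
addresses = [
    "mallory@example.com",
    "Mike@example.com",
    "73cats@example.com",
    "alice@example.com",
    "Zed@example.com",
]

# Python compares strings by code point, which matches ASCII order here.
ordered = sorted(addresses)

# If each recipient saw every address up to and including their own,
# the count they report is their 1-based position in the sorted list.
visible_to_mike = ordered.index("Mike@example.com") + 1
```

So "Mike@" lands ahead of "alice@", while "mallory@" would come after both.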
Or, perhaps it's just been too long since I've used it actively? Sounds like parent did in fact have their email at least in the body somewhere, even if they didn't get one sent to them. Perhaps I'm the same.
It sucks this happened but I don't really care. You guys are providing such an amazing and sorely needed service that I have no problem cutting you some slack. I hope others will too. Of course those working for companies whose lunch you're eating will likely run with this as far as they can.
Interesting that since the list of addresses was sequentially prepended to (if I understand the wording of the notice correctly), anyone who anonymously shares the list will incriminate themselves, though to a smaller and smaller pool of peer customers.
A simple solution to this would be to chop off an arbitrary number of addresses prior to disclosure. The first person can leak any number of emails, and the last can only leak one.
The list of addresses was prepended to the email, but the addresses were added to the end of the list itself. Thus, every recipient saw their own address as the last item in the list.
The Hyatt hotel in Switzerland did a similar thing a few weeks ago. They sent a mail shot to everyone using the CC function, not BCC. I complained and their response was that they'd recalled the mail so 'that was that'. Of course a recall means nothing to the hordes of Gmail addresses, etc. that the mail shot was sent to. It's a common problem and a big incentive to use throwaway addresses.
The new head of the IIA (Irish Internet Association) did a cc on the entire membership just a few days ago announcing her arrival. Felt pretty sorry for her. She actually did the bcc correctly the first time but forgot the attachment; then, when correcting that, she did a cc.
I once asked to be removed from a list and suggested they use BCC instead of CC. I accidentally did so by way of Reply-All. Boy did I feel stupid. For days. While everyone kept replying to me.
For the curious this was the content of the email. Pretty generic.
"Dear Let's Encrypt Subscriber,
We're writing to let you know that we are updating the Let's Encrypt Subscriber Agreement, effective June 30, 2016. You can find the updated agreement (v1.1) as well as the current agreement (v1.0.1) in the "Let's Encrypt Subscriber Agreement" section of the following page:
Planned email blast accidentally cc'd other recipients, allowing users to see each other's email addresses. They caught it after <8,000 emails went out and are fixing the problem.
Isn't the CC header essentially an instruction for the local MTA? So their local MTA could have been working relatively slowly through the CC list (contacting each recipient mail server in turn). I'm not saying this is what actually happened.
No, MTAs don't look at the To or Cc headers at all. The addresses to deliver to are listed in the SMTP rcpt to command. The MUA can provide a completely different list to what's in the headers.
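To illustrate the split between headers and envelope, here is a rough Python sketch. All addresses and the `build_delivery` helper are hypothetical, and nothing is sent. The envelope recipients are kept in a separate list from whatever the `To` header says:

```python
from email.message import EmailMessage


def build_delivery(sender, header_to, envelope_recipients, body):
    """Return (envelope recipient list, message) without sending anything."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = header_to          # what the reader sees
    msg["Subject"] = "Example"
    msg.set_content(body)
    return list(envelope_recipients), msg


envelope, message = build_delivery(
    "news@example.org",
    "subscribers@example.org",
    ["a@example.com", "b@example.com"],   # who actually gets it (RCPT TO)
    "Hello",
)

# With a real server the two would be handed over separately, e.g.:
#   smtplib.SMTP(host).send_message(message, to_addrs=envelope)
```

The message text never mentions the envelope addresses, which is also how BCC works.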
Ok, good if pedantic response. I think the substance of my point stands: MTAs can take some time to work through the CC list (which, as you say, is passed to them with the "rcpt to" command by a well-behaved MUA)
In other words, delivering a message to copious CC: recipients is not an uninterruptible operation even after the MUA has finished its job. They might have had to/been able to stop the local MTA to interrupt the rogue emails.
Sorry, to clarify why this is not just pedantic, the MUA is likely to split up a long list of recipients rather than try to send an arbitrarily long list in one rcpt to command and potentially fail after a very long time processing and sending data. The fact that they don't have to be the same means that the list in the message can be arbitrarily long regardless of how the MUA batches it for the MTA.
In that event, it is interruptible.
Also, the MTA will batch its own outgoing sends to individual servers (giving them either just one rcpt to or all the recipients whose MX map to that server) and it could be interruptible there as well.
The point is that when sending to many, many delivery addresses, things get batched at various stages and become more interruptible than if your CC were the master list of how a message was routed.
In a scenario where the entire list is in CC, this could matter. But when your scenario is leaking a small fraction of the list members, you would have to be batching CC. Once you're batching CC, then it doesn't matter if all the emails in a batch are sent out at the same time.
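As a rough sketch of the batching idea, assuming recipients are grouped by domain as a stand-in for "same MX" (a real MTA resolves MX records, and the batch size of 100 is arbitrary), each batch becomes a natural point at which a send could be stopped:

```python
from collections import defaultdict


def batch_by_domain(recipients, max_per_batch=100):
    """Group addresses by domain, then cut each group into fixed-size batches."""
    by_domain = defaultdict(list)
    for addr in recipients:
        domain = addr.rsplit("@", 1)[1].lower()
        by_domain[domain].append(addr)

    batches = []
    for domain, addrs in by_domain.items():
        for start in range(0, len(addrs), max_per_batch):
            batches.append((domain, addrs[start:start + max_per_batch]))
    return batches
```

For example, `batch_by_domain(["a@x.com", "b@y.com", "c@X.com"], max_per_batch=1)` gives two batches for x.com and one for y.com.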
If you're sending to/from Gmail/Exchange the limit is somewhere around 100 addresses (I believe it's technically the byte-size of the field not strictly the number of addresses).
Right. SMTP header lines can't exceed 998 characters, but RFC 2822 now allows multi-line headers[1], so there's no limit except one that might be imposed by an MTA (and that would generally only be imposed on sending, not receiving).
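Header folding is easy to see with Python's `email` package. This is just a sketch with generated example addresses: a `To` header far longer than 998 characters is serialized as several short lines, each continuation starting with whitespace:

```python
from email.message import EmailMessage


def folded_to_header(addresses):
    """Serialize a message with a long To header; return its header lines."""
    msg = EmailMessage()
    msg["To"] = ", ".join(addresses)
    # Headers end at the first blank line of the serialized message.
    header_block = msg.as_string().split("\n\n", 1)[0]
    return header_block.splitlines()


lines = folded_to_header([f"user{i}@example.com" for i in range(100)])
```

Unfolded, those 100 addresses would be roughly 2,000 characters on one line.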
I think it's positive that they own up to it and actually apologize.
One would also think that most subscribers of this newsletter have a positive attitude towards the general concepts of privacy and security, so I'm also optimistic that a list of these disclosed addresses will never see the light of day (hoping I'm not too naive).
I received one of these emails (most likely because my address begins with 73 and the emails are sorted alphanumerically). It looks like this: http://pastebin.com/vpPU5sLj
It probably won't be very exciting considering the emails existed only in the Body of the email. The emails themselves were only addressed to individuals. You can see this in the linked screenshot.
Good to know, but really I don't care if they send my email address to every other registrant. I run a public web server, I already receive junk email that must be filtered, so I see no problems. It has zero impact on the free certificate service they provide.
It's disappointing to see this level of incompetence from a group responsible for such great leaps in web security. Let's Encrypt should take appropriate steps to ensure this never happens again lest they erode users' trust any further.
Things like this happen all the time. Give them a break. They already did what needed to be done. It's a bad bug, yes, but lots of people here could have done it.
Gmail still drops the ball, because you have to give your realaddress+marker rather than being able to request a marker.
The correct behavior is to be able to request a marker when signed into any email account, and on my side set the tag that it gets tagged with in that inbox as a result. The link between marker and inbox should remain secret.
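A small sketch of the difference, with everything in it hypothetical (the `MarkerBook` class, the domain and the label names are made up, and this is not how any provider works). A plus-tag can be stripped by anyone to recover the real address, while an opaque marker only means something to whoever holds the private mapping:

```python
import secrets


def strip_plus_tag(address):
    """Recover the base address from a realaddress+marker style address."""
    local, domain = address.rsplit("@", 1)
    return local.split("+", 1)[0] + "@" + domain


class MarkerBook:
    """Hypothetical private mapping from opaque markers to inbox labels."""

    def __init__(self, domain):
        self.domain = domain
        self._labels = {}  # marker -> label, never sent to the website

    def new_marker(self, label):
        marker = secrets.token_hex(8)  # random, unrelated to the real address
        self._labels[marker] = label
        return f"{marker}@{self.domain}"

    def label_for(self, address):
        return self._labels.get(address.rsplit("@", 1)[0])
```

Here `strip_plus_tag("jane+shop@gmail.com")` returns `"jane@gmail.com"`, whereas an address from `new_marker("shop")` carries no trace of the inbox or the label.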
I can't see a better example of Google dropping the ball than you paying for some other service so that you can then consume it from gmail. :)
500 is a reasonable limit I think, in case spamming would be some reason for them not to do this. I don't have an opinion about whether people should be able to send from marker@gmail.com or if it should just end up in a real inbox but without the ability to send from that address.
The original thinking was to make it harder for casual snoopers. If FastMail gets compromised, then they'd need to be compromised over time and someone would need to review a lot of my email with a shelf-life of one-hour to understand who I was - I use auto-generated credentials, paid via Bitcoin, and login once a year via VPN to generate more addresses.
If Gmail gets compromised, you'd need to be looking for a bunch of Fastmail accounts in To: addresses to link my primary email with those emails.
If you wanted to track me down from an email address, you'd need a warrant in Australia (for FastMail), and the US (to find the account using those FastMail credentials), so I'd need to have actually done something wrong (which I haven't), and you'd need to convince judges in two jurisdictions of that. As I said, the threat model is against casual snoopers, rather than a determined state actor with proof of wrong-doing, as I don't think I'm even slightly interesting and I don't think I've done anything that would make me interesting.
As it turns out, you could probably just read enough of my FastMail email as it came in (before it gets deleted by Gmail) to figure out who I am, so this is imperfect.
Yes, just last week. And parts of the site were down too for some hours. Is MITM something new? No. What is the point? Amazon worked fine since 1994/95. The whole HTTPS-only movement looks very orchestrated.
HTTPS everywhere is a good thing, and if you don't understand why that's fine, I don't have the time to explain to you why you are wrong, but you are. Good luck deploying Internet connected services in the past.
Directly below the apology for leaking email addresses, I get this message prominently displayed:
> "Hey there! Looks like you're enjoying the discussion, but you're not signed up for an account.
> When you create an account, we remember exactly what you've read, so you always come right back where you left off. You also get notifications, here and via email, whenever new posts are made. And you can like posts to share the love."
There is a preliminary report on the issue here:
https://community.letsencrypt.org/t/email-address-disclosure...