Hacker News new | past | comments | ask | show | jobs | submit login

The SINGLE biggest e-mail scaling mistake we made at AOL was to insist that the client stay in a wait state until we were Absolutely. Positively. Sure. that the message was guaranteed to have been delivered.

It ended up turning what could be a shared-nothing transaction (like every other ISP) into a network-wide two-phase commit requiring (at the time) millions of dollars of fault-tolerant hardware.

Meanwhile, guess what? You have no idea if that mail is queued at your ISP because the destination is down on a volume with a writeback cache on a RAID drive with a dead battery. You can never be 100% confident unless you have delivery status notifications, and those are pretty much dead these days.




There's a huge difference between "we have received your email and will make every effort to deliver it to the recipient" and "the recipient has received your message in their inbox". For purposes of "can I close my browser window?", the former is fine! Sounds like AOL decided on going for the latter?


Yep. It wasn't even so much a conscious initial decision as a conscious decision not to change the semantics once we got big.

At first, all mail was local, and so naturally was stored in a database, like most non-Internet mail servers of the day. It was what Dave Crocker called "rock mail" - I stick a message under a rock, and you know to look under that rock for your message.

The bigger we got, the harder that was, in the days where horizontal scaling by moving I/O to another machine was just as expensive (because the network was even slower than the disk). But we were sure that our distinction was important, and Internet-style queued mail was widely considered flaky (due in no small part to our own poor Internet delivery, no doubt). So we kept it, to the point of storing user mailboxes on Tandem NonStop machines that did multi-site replication with SQL implemented at the drive controller level.

Many of our scaling challenges were due to decisions that made sense early on and that we consciously refused to ditch; I wrote a bunch up here:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/...


Beautiful! Perhaps this should be a thread on its own.


Would it be bad form/karma whoring for me to submit it? I'm new..




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: