
From a client standpoint, POP is not actually trivial.

The main problem with POP is that unless you do something clever, determining the changes in the mailbox from time t0 to time t1 is both conceptually difficult and computationally expensive. This is because generic POP has no concept of a message UID, so there's no principled way to diff between states. In a very real sense, this means that POP is somewhat broken for the most important use case for the client: syncing to server changes.

The UIDL extension adds a "UID" but it's just an MD5 hash of the contents -- meaning that multiple copies of a message appear to be the same message. You can ask for just the headers (with TOP), which gets you the Message-ID header, but this is still very expensive to do repeatedly (say, every 5 minutes) on a 100,000 message POP store. What you can't do is ask for the Message-ID header alone, which would fix the problem.
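For illustration only, here is roughly what pulling the Message-ID out of a headers-only fetch looks like with Python's standard poplib. The host and credentials are placeholders and the helper names are made up for this sketch. It costs one TOP round trip per message, which is exactly the expense described above.

```python
import email
import poplib


def message_id_from_top_lines(lines):
    """Parse the header lines returned by TOP n 0; return the Message-ID or None."""
    msg = email.message_from_bytes(b"\r\n".join(lines))
    return msg["Message-ID"]


def fetch_all_message_ids(host, user, password):
    """Hypothetical helper: issues one TOP per message, so cost grows with mailbox size."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        ids = []
        for n in range(1, count + 1):
            # TOP n 0 returns the headers and zero body lines.
            _resp, lines, _octets = conn.top(n, 0)
            ids.append(message_id_from_top_lines(lines))
        return ids
    finally:
        conn.quit()
```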

Even if you have valid UIDs -- which you won't -- you still have to run a diff algorithm. Typical dynamic programming algorithms are O(N^2), which obviously sucks big time for a 100,000 message POP store.

For Inky (http://inky.com) we use a clever linear-time diff algorithm based in part on [Myers 86], "An O(ND) Difference Algorithm and Its Variations". [Burns & Long 97], "A Linear Time, Constant Space Differencing Algorithm", is also a good treatment. But I know Outlook and Thunderbird both use non-linear-time algorithms to diff, so "leave messages on server" gets increasingly (non-linearly) expensive as the mailbox size grows on the server.
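To make the linear-time point concrete, here is a minimal sketch. It is not the Myers or Burns & Long algorithm and not Inky's code; it is a simpler hash-based technique that treats each snapshot as an unordered multiset of UIDL strings. Counting copies instead of using plain sets means duplicate UIDs (the content-hash case above) still produce a sensible diff, in expected O(N) time. The function name is an assumption for this example.

```python
from collections import Counter


def diff_uid_listings(old_uids, new_uids):
    """Return (added, removed) as multisets between two UIDL snapshots.

    Ordering is ignored; only how many copies of each UID exist matters.
    """
    old_counts = Counter(old_uids)
    new_counts = Counter(new_uids)
    # Counter subtraction keeps only positive counts.
    added = new_counts - old_counts
    removed = old_counts - new_counts
    return added, removed
```

For example, going from `["a", "b", "b", "c"]` to `["b", "c", "d", "d"]` reports two copies of `d` added, and one `a` plus one `b` removed. What it cannot tell you is which of two identical-looking copies went away, which is the underlying problem with content-hash UIDs.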

A few other points on comments made in this thread:

- POP is still widely used. In the US, for example, Comcast has finally migrated to IMAP, but Verizon is still POP only.

- POP is, from the server standpoint, a very simple protocol, and it is highly amenable to automated testing, as others here have pointed out. For our own testing we generate both patterned and random mailbox modification sequences, then have the test client cooperate with the test server to ensure that the client has (independently) correctly determined what's happened to the (test) mailbox. This is a perfect example of a situation where investing significant effort into automated testing pays off -- and where a TDD approach to development would also work well.
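A toy version of the randomized testing idea above might look like the following. This is a sketch under stated assumptions, not our actual harness: the "server" is just an in-memory list, UIDs are assumed unique and never reused, and all function names are invented for the example. The server side records the ground truth of what changed; the client side sees only the before and after snapshots and has to reach the same answer.

```python
import random


def apply_random_ops(mailbox, rng, steps, next_id):
    """Mutate the server-side list in place; return ground-truth (added, removed, next_id)."""
    added, removed = set(), set()
    for _ in range(steps):
        if mailbox and rng.random() < 0.5:
            uid = mailbox.pop(rng.randrange(len(mailbox)))
            if uid in added:
                # Added and deleted between two client polls: net no change.
                added.discard(uid)
            else:
                removed.add(uid)
        else:
            uid = f"uid-{next_id}"
            next_id += 1
            mailbox.append(uid)
            added.add(uid)
    return added, removed, next_id


def client_diff(old_snapshot, new_snapshot):
    """What the client infers from two snapshots alone (assumes unique UIDs)."""
    old, new = set(old_snapshot), set(new_snapshot)
    return new - old, old - new


def run_trial(seed, rounds=20, steps=10):
    """Return True if the client's inferred changes match ground truth every round."""
    rng = random.Random(seed)
    mailbox, next_id = [], 0
    for _ in range(rounds):
        before = list(mailbox)
        truth_added, truth_removed, next_id = apply_random_ops(mailbox, rng, steps, next_id)
        if client_diff(before, mailbox) != (truth_added, truth_removed):
            return False
    return True
```

Seeding the generator makes any failing sequence reproducible, which matters more than the randomness itself.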




> The UIDL extension adds a "UID" but it's just an MD5 hash of the contents -- meaning that multiple copies of a message appear to be the same message.

That sounds like a server problem; to quote the RFC,

> The server should never reuse an unique-id in a given maildrop, for as long as the entity using the unique-id exists.

Why would you hash the contents? The arrival time should be unique, assuming no two messages could arrive at exactly the same time. That doesn't require any hashing.
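A rough sketch of what arrival-time UIDs could look like on the server side. The class name is made up, and the tie-breaking bump is an addition for the case where two messages do land on the same clock tick. It only guarantees uniqueness within one running process; a real server would also have to persist the last value per maildrop.

```python
import time


class ArrivalUidGenerator:
    """Issue strictly increasing UIDs derived from message arrival time."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock
        self._last = 0

    def next_uid(self):
        now = self._clock()
        # If the clock did not advance past the last issued value, bump by one.
        self._last = max(now, self._last + 1)
        # Hex digits are printable ASCII and stay well under the 70-character UIDL limit.
        return format(self._last, "x")
```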

I think POP could've been far better designed, without growing into the complexity of IMAP, with just a few little changes like this.


I know you know this, but RFC != what servers actually do. :)


> POP is somewhat broken for the most important use case for the client: syncing to server changes.

That's why we've had IMAP for a couple of decades now (and available anywhere for over a decade). POP simply wasn't designed for this use case.



