
I agree that they probably aren't doing complex user identification...there are probably no cookies or JavaScript or anything like that on their CDN.

But IP alone would be enough to follow the trail of most people (I know all the caveats about IP != individual, but that data is still far from worthless), and collecting IP trails would be absolutely trivial and practically free from a performance perspective. The difference between a high-performance webserver environment with logging and one without is less than one percent...probably much less.

With a bit of clever data juggling, and logging of user agents and other information about the client, they could compare that to past visits from the same IP on properties where Facebook has more data (from cookies and logged-in activity), and identify users by name with pretty good accuracy.
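To make the idea concrete, here is a rough sketch of that kind of join. Everything in it is made up for illustration: the record layout, the sample IPs and user agents, and the helper names `build_index` and `guess_user` are my own assumptions, not anything Facebook is known to run. It just shows that matching anonymous hits against logged-in history on (IP, user agent) takes very little code.

```python
from collections import Counter, defaultdict

# Hypothetical logged-in history: (ip, user_agent, user_name).
known_visits = [
    ("203.0.113.7", "Firefox/3.6 Linux", "alice"),
    ("203.0.113.7", "Firefox/3.6 Linux", "alice"),
    ("203.0.113.7", "Safari/5.0 Mac", "bob"),
    ("198.51.100.2", "Chrome/9.0 Windows", "carol"),
]

# Hypothetical anonymous CDN log lines: (ip, user_agent), no cookies.
anonymous_hits = [
    ("203.0.113.7", "Firefox/3.6 Linux"),
    ("203.0.113.7", "Safari/5.0 Mac"),
    ("192.0.2.55", "Opera/11 Windows"),
]


def build_index(visits):
    """Count how often each user was seen for every (ip, user_agent) pair."""
    index = defaultdict(Counter)
    for ip, user_agent, user in visits:
        index[(ip, user_agent)][user] += 1
    return index


def guess_user(index, ip, user_agent):
    """Return the most frequently seen user for this pair, or None if unseen."""
    candidates = index.get((ip, user_agent))
    if not candidates:
        return None
    user, _count = candidates.most_common(1)[0]
    return user


index = build_index(known_visits)
guesses = [guess_user(index, ip, ua) for ip, ua in anonymous_hits]
print(guesses)
```

A real system would presumably weight recency, handle shared IPs and NAT, and use more client signals than these two fields, but the basic shape is a lookup keyed on whatever the logs already contain.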

In short: performance is not a factor for logging, and Facebook wouldn't need a large amount of additional cooperation from the client, like cookies or JavaScript bugs, to track users who aren't logged in. They just need to combine the already available log data in useful ways. A big part of the reason all these fancy distributed key/value stores and BigTable imitators exist (and why Facebook has developed their own in-house) is for processing exactly this kind of data.

I'm extremely confident that Facebook logs everything, though I have no idea what sorts of things they do with the resulting data.



