Migrating Messenger storage to optimize performance (facebook.com)
104 points by tagx on June 26, 2018 | 33 comments



They say that they brought some new search features to mobile, though I would prefer if they fixed their search on desktop (or in general) first, as it only returns results from recent messages.

Until sometime in 2016/2017 it was possible to search for specific things from any period (e.g. 2011), but now even things from less than three years ago don't return anything.


I don't see the ability to search within the app. Have they not rolled it out yet?


On iOS I see a search bar at the top of the home screen. Type in it and it searches contacts, but there's a "search in messages" item you can tap and it shows matching chats and messages.


One solution is to download an archive and grep the JSON it contains.
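As a rough illustration of the grep-the-archive approach, here is a minimal Python sketch that searches one file from the export. The `messages`/`sender_name`/`content` keys reflect the JSON layout such archives have used; treat them as assumptions and adjust to whatever your download actually contains.

```python
import json

def search_messages(path, needle):
    """Return (sender, text) pairs from a Facebook export JSON file
    whose message text contains `needle` (case-insensitive).

    Assumes a top-level "messages" list of objects carrying
    "sender_name" and "content" keys; adjust if your archive differs.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    needle = needle.lower()
    return [
        (m.get("sender_name", "?"), m["content"])
        for m in data.get("messages", [])
        if needle in m.get("content", "").lower()
    ]
```

Unlike the in-app search discussed above, this lets you match anything in your history, however old.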


Whatever you may think of social networking, Facebook takes great pains to ensure reliability. They moved all the data from spinning disks to flash while also migrating to a new database, with no downtime at all.

I'm a bit unclear on how this process resulted in a 90 percent reduction in storage use.


"The simplified data schema directly reduced the size of data on disk. We saved additional space in MyRocks by applying Zstandard, a state-of-the-art lossless data compression algorithm developed by Facebook. We were able to reduce the replication factor from six to three, thanks to differences between the HBase and MyRocks architectures. In total, we reduced storage consumption by 90 percent without data loss, thereby making it practical to use flash storage."
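The quoted factors compound multiplicatively. The split below is illustrative: the post only states the replication change (six to three) and the overall 90 percent figure, so the combined schema-plus-Zstandard factor is an assumed number chosen to make the arithmetic work out.

```python
# Back-of-the-envelope arithmetic for the quoted 90 percent reduction.
replication = 3 / 6          # replication factor halved: x0.5 (stated)
schema_and_zstd = 0.2        # assumed combined effect of the simplified
                             # schema plus Zstandard compression

remaining = replication * schema_and_zstd
print(f"storage remaining: {remaining:.0%}")  # 10% left, i.e. a 90% cut
```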


Facebook has been using RocksDB more and more (now Messenger, but also Instagram's flavor of Cassandra). I wonder if Google has a similar penetration of LevelDB (and we just don't hear about it because Google's infrastructure work isn't open source).


There seems to be a pattern with Google: they have internal infrastructure that every ex-Googler seems to miss when they leave the company, because it was so good it felt like living 10 years in the future.

As far as I understand, Google doesn't have a bunch of tools that merely work together; they have one huge system with different bits that _live_ together, so much so that separating and open-sourcing them is cool but won't give you the same thing as being on the inside:

- They use Blaze, a build system that integrates directly with the object store. They open-sourced Bazel as a rough equivalent, but the build system won't shine unless you also have an integrated object store and an integrated VCS client.

- They have open-sourced Kubernetes, a successor of what they were using internally for cluster management.

- They have open-sourced LevelDB, a successor of the fundamental building block they use for BigTable.

So in a way LevelDB isn't used as-is inside Google, but its spirit is in use at a fundamental level by pretty much everyone


No, it does not. Generally speaking, Google "converges" a layer up (BigTable, Spanner, Colossus). There is no layer like LevelDB that is common to all of these.


During the dual-write phase, what happens if one request succeeds while the other doesn't?


Iris retries the failed request


Can't you potentially land in an inconsistent state if the failed request keeps failing while the other DB request succeeds? Is the dual-write on the synchronous request path? I assume there's a timeout or a fixed number of retries; once that is exhausted, what happens?
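To make the question concrete, here is a hypothetical dual-write sketch (not Iris's actual mechanism): retry the failed side a bounded number of times, and surface the failure once retries are exhausted so some asynchronous repair process can reconcile the stores.

```python
import time

def dual_write(primary, secondary, key, value, retries=3, backoff=0.0):
    """Hypothetical dual-write with bounded retries on one side.

    `primary` is a dict-like store assumed to succeed; `secondary` is
    any object with a .write(key, value) method that may raise IOError.
    Returns True if both sides were written, False if retries ran out,
    at which point the stores have diverged and the caller must record
    the failure for later repair rather than block forever.
    """
    primary[key] = value                       # assume this side succeeds
    for attempt in range(retries):
        try:
            secondary.write(key, value)
            return True                        # both sides consistent
        except IOError:
            time.sleep(backoff * (2 ** attempt))   # exponential backoff
    return False                               # exhausted: divergence!
```

The False branch is exactly the state asked about above: without an out-of-band repair log, the two databases stay inconsistent.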


What about giving me an easy option to export my messages while you are at it?


You can use the Download Your Information tool to download all your data on FB, including all your messages: https://www.facebook.com/help/212802592074644

A Brooklyn-based artist did this recently and made a book of his conversation history (and covered an apartment with some of the pages). You can find a bit more information and pictures at https://www.halfspace.uk/messages-2011-2017


I said "easy". And by the way, the second link is dead.


Works fine for me.


Yeah, thanks. It was cloudflare not liking dillo.


Unfortunately, the feature to export your messages is only available to Facebook's data partners.


This sort of shallow dismissal does nobody on any side of any argument any good.

We've asked you many times to stop posting unsubstantive comments to HN. If you can't or won't stop, we will ban you.

https://news.ycombinator.com/newsguidelines.html


your ignorance adds nothing of value to this conversation. facebook data partners do not have a download your data button. that's not how any of this works.


based on his previous submissions, he did not come here to discuss database technology or be factually correct.


TL;DR - Moved from HBase to the MyRocks engine on MySQL, sitting on top of NVMe storage via their Lightning server [1], a JBOF (just a bunch of flash) setup using PCIe x16.

[1] https://code.facebook.com/posts/989638804458007/introducing-...


Anyone know why Facebook moved off HBase? (article doesn't address this)

I get that MyRocks is truly amazing, but I'm wondering what issues they were facing with HBase. I heard HBase was picked over Cassandra (developed at FB) because it was strongly consistent vs Cassandra's eventual consistency.


At the time they picked HBase because it was easier to repair missing replicas: the HDFS namenode just schedules block replication. Cassandra (at least the version they were considering) had to work a lot harder to replace a missing replica. Remember, they had a version that didn't have workable Merkle trees.
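For context on the Merkle-tree point: anti-entropy repair has replicas compare hash trees so that only differing ranges need to be re-synced, instead of shipping all the data. A minimal sketch of computing such a root (not Cassandra's actual implementation):

```python
import hashlib

def merkle_root(chunks):
    """Merkle root over an ordered list of byte chunks.

    Two replicas holding the same data produce the same root; any
    difference changes the root, and comparing subtree hashes narrows
    the mismatch down in O(log n) exchanges. Minimal sketch only.
    """
    if not chunks:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        if len(level) % 2:                   # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```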

I also heard they had an issue with network flaps causing unrepairable inconsistency. I never got the full scoop on this.

I worked with the engineers directly responsible for the HBase choice back then. All water under the bridge now, it seems.


My guess is operational experience, though the article mentions it was necessary to take advantage of Lightning. I can't think of another business on the planet that has pushed MySQL (and RDBMSes in general) further than FB.


+ app server / schema rework


Have they published a paper on this on Facebook Research?


doesn't really qualify much as research - just lots of engineering to cover all corners


[flagged]


Yes they do/have research. Facebook had 12 papers at the last computer vision conference I attended (CVPR '17) and won the awards for best paper and best student paper.


They have a very strong research arm covering a significant number of research areas. https://research.fb.com/


You are aware that Facebook is the largest website on the planet?


Great, now they can serve video ads in Messenger more easily.


This has nothing to do with video though.



