They say that they brought some new search features to mobile, though I would prefer if they fixed their search on desktop (or in general) first, as it only returns results from recent messages.
Until sometime in 2016/2017 it was possible to search for specific things from any point in time (e.g. 2011); now even things from less than three years ago don't return anything.
On iOS I see a search bar at the top of the home screen. Typing in it searches contacts, but there's a "search in messages" item you can tap that shows matching chats and messages.
Whatever you may think of social networking, Facebook goes to great pains to ensure reliability. They moved all the data from spinning disks to flash while also migrating to a new DB, with no downtime at all.
I'm a bit unclear as to how this process resulted in 90% decreased storage use.
"The simplified data schema directly reduced the size of data on disk. We saved additional space in MyRocks by applying Zstandard, a state-of-the-art lossless data compression algorithm developed by Facebook. We were able to reduce the replication factor from six to three, thanks to differences between the HBase and MyRocks architectures. In total, we reduced storage consumption by 90 percent without data loss, thereby making it practical to use flash storage."
Facebook has been using RocksDB more and more (now Messenger, but also Instagram's flavor of Cassandra). I wonder if Google has a similar penetration of LevelDB (and we just don't hear about it because Google's infrastructure work isn't open source)
There seems to be a pattern with Google: they have internal infrastructure that every ex-Googler seems to miss after leaving the company, because it's so good it feels like living 10 years in the future.
As far as I understand, Google doesn't have a bunch of tools that merely work together; they have one huge system whose different bits _live_ together, so much so that separating and open-sourcing them is cool but won't give you the same thing as being on the inside:
- They use Blaze, a build system that integrates directly with the object store. They open-sourced Bazel as a kind of equivalent, but the build system won't shine unless you have an integrated object store and an integrated vcs client
- They have open-sourced Kubernetes, a successor of what they were using internally for cloud management
- They have open-sourced LevelDB, a successor of the fundamental building block they use for BigTable
So in a way LevelDB isn't used as-is inside Google, but its spirit is in use at a fundamental level by pretty much everyone
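For anyone who hasn't touched it: LevelDB (and RocksDB, which forked from it) is an embedded, sorted key-value library you link into your process rather than a service, which is why it can sit underneath something like a tablet server. A minimal sketch using the plyvel Python bindings (the path and keys are made up):

    import plyvel  # Python bindings for LevelDB

    # LevelDB is just an embedded, sorted key-value store.
    db = plyvel.DB("/tmp/leveldb-demo", create_if_missing=True)

    db.put(b"thread:42:msg:0001", b"hello")
    db.put(b"thread:42:msg:0002", b"world")

    # Keys come back in sorted order, which is what makes prefix/range scans cheap.
    for key, value in db.iterator(prefix=b"thread:42:"):
        print(key, value)

    db.close()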
No, it does not. Generally speaking, Google "converges" a layer up (BigTable, Spanner, Colossus). There is no layer like LevelDB that is common to all of these.
Can't you potentially end up in an inconsistent state if the failed request keeps failing while the other DB's request succeeds? Is the dual write on the synchronous request path? I'm assuming there's a timeout or some fixed number of retries; once that's exhausted, what happens?
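Roughly the pattern I have in mind (an entirely hypothetical sketch; none of these names are from the article, and the old/new store objects are stand-ins):

    import time

    reconcile_queue = []  # stand-in for an async backfill/verification job

    def dual_write(record, old_db, new_db, max_retries=3):
        old_db.write(record)                 # old store stays authoritative
        for attempt in range(max_retries):
            try:
                new_db.write(record)         # best-effort shadow write to the new store
                return
            except Exception:
                time.sleep(2 ** attempt)     # simple exponential backoff
        # Retries exhausted: park the key for an offline reconciler instead of
        # failing the user-facing request, so the two stores converge eventually.
        reconcile_queue.append(record["key"])

If the shadow write keeps failing, the stores do diverge until that reconciliation pass runs, which is exactly the window I'm asking about.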
A Brooklyn based artist did this recently and made a book of his conversation history (and covered an apartment with some of the pages). You can find a bit more information and pictures at https://www.halfspace.uk/messages-2011-2017
Your ignorance adds nothing of value to this conversation. Facebook data partners do not have a "download your data" button; that's not how any of this works.
TL;DR - They moved from HBase to the MyRocks engine on MySQL, sitting on top of NVMe storage via their Lightning Server [1], which is a JBOF (just a bunch of flash) setup using PCIe x16.
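Concretely, "MyRocks engine on MySQL" means applications keep speaking ordinary SQL and only the storage engine underneath the table changes. A hypothetical sketch (connection details, table, and columns are all invented; this is not Facebook's schema):

    import mysql.connector  # assumes MySQL Connector/Python and a MyRocks-enabled server

    # Hypothetical demo: the client-side change is just ENGINE=ROCKSDB on the table.
    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="messenger_demo")
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            thread_id BIGINT NOT NULL,
            msg_id    BIGINT NOT NULL,
            body      MEDIUMBLOB,
            PRIMARY KEY (thread_id, msg_id)
        ) ENGINE = ROCKSDB
    """)
    conn.commit()
    conn.close()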
Anyone know why Facebook moved off HBase? (article doesn't address this)
I get that MyRocks is truly amazing, but I'm wondering what issues they were facing with HBase. I heard HBase was picked over Cassandra (developed at FB) because it was strongly consistent vs Cassandra's eventual consistency.
At the time they picked HBase because it was easier to repair missing replicas: the HDFS namenode just schedules block replication. Cassandra (at least the version they were considering) had to work a lot harder at replacing a missing replica. Remember, they had a version that didn't have workable Merkle trees.
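To illustrate the Merkle-tree style repair Cassandra leans on (a toy sketch, not Cassandra's actual code): replicas hash their rows into deterministic buckets and compare digests, so only the divergent ranges need to be re-streamed.

    import hashlib

    def range_digests(rows, buckets=4):
        """Hash each row into one of `buckets` deterministic buckets (the tree's leaves)."""
        digests = [hashlib.sha256() for _ in range(buckets)]
        for key in sorted(rows):
            idx = int(hashlib.sha256(key.encode()).hexdigest(), 16) % buckets
            digests[idx].update(f"{key}={rows[key]}".encode())
        return [d.hexdigest() for d in digests]

    def divergent_ranges(replica_a, replica_b, buckets=4):
        """Compare leaf digests and return the buckets that need repair."""
        a = range_digests(replica_a, buckets)
        b = range_digests(replica_b, buckets)
        return [i for i in range(buckets) if a[i] != b[i]]

    healthy = {"k1": "v1", "k2": "v2", "k3": "v3"}
    stale   = {"k1": "v1", "k2": "OLD", "k3": "v3"}
    print(divergent_ranges(healthy, stale))  # only the bucket containing k2 differs

Without something like this working reliably, replacing a missing or stale replica means re-streaming far more data than necessary, which is the extra work being referred to above.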
I also heard they had an issue with network flaps causing unrepairable inconsistency. I never got the full scoop on this.
I had worked with the engineers directly responsible for the HBase choice back when. All water under the bridge now it seems.
My guess is operational experience, though the article mentions it was necessary to take advantage of Lightning. I can't think of another business on the planet that has pushed MySQL (and RDBMSs in general) further than FB.
Yes, they do. Facebook had 12 papers at the last computer vision conference I attended (CVPR '17) and won the awards for both best paper and best student paper.