Hacker News new | past | comments | ask | show | jobs | submit login
Compression algorithms used by Google (feedblog.org)
67 points by helwr on May 17, 2010 | hide | past | favorite | 6 comments



In-memory compression is another technique not to be forgotten. If you can efficiently compress your in-memory representations of data you would otherwise have to hit disk or network for, you can dramatically increase the amount of data you can cache in memory, especially if you're aware of the structure and general patterns of the data.

Once I had to cache file paths, millions of them. I got a fairly easy 6x improvement by storing them in a Patricia trie, with each node's string stored as a range of UTF-8 encoded bytes, and indexed by arrays of bitfields - i.e. bit-packed arrays. Child nodes were similarly indexed using bitfield arrays. Turning what would be a 32-bit (or worse, 64-bit) pointer into (e.g.) a 28-bit integer really starts to add up when you have millions of them.


db2 and oracle both implement similar compression algorithms. There's nothing terribly special about google's compression algos.


That they are used by Google makes them somewhat special, in that one would assume they have put some thought and experimentation into choosing them. Also, that would make for pretty good field-testing of them.


Any company building enterprise level products puts thought and experimentation into choosing their algorithms. Google just advertises their research publicly as part of their branding.


A few months ago I extracted bmdiff from hypertable to use standalone. Give it a try: http://bitbucket.org/mattsta/bmz/src


Thanks for extracting bmz into a standalone. I actually went searching for it after reading the article's comments last night and came across your link. One question I have is how up-to-date is this version of bmz? It says that it's from a Hypertable commit dated Jan 25, 2009. Would I be able to find a newer version in Hypertable's source?




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: