Let's face it - it's not the language that is the problem. It would help you, bu...

pufuwozu · on July 30, 2010

Isn't Pigz a parallel implementation of gzip which is itself based upon LZ77?

http://www.zlib.net/pigz/

Hasn't there been a few parallel implementations of bzip2?

http://bzip2smp.sourceforge.net/

I understand that parallelism isn't suited for a lot of algorithms but from what I can tell, there's been heaps of successful work in compression. Could you give us some more information?

It would really help me out - I'm doing an undergraduate class where I have to manually make an open-source project parallel. I've been looking into compression algorithms because I thought they were well suited. Please help me out if I'm going down the wrong path!

wmf · on July 30, 2010

The key to malkia's strawman is that pigz does not give the same results as the sequential "c" version of gzip. AFAIK gzip is not parallelizable as-is because every symbol depends on the previous symbol; pigz breaks this dependency to get parallelism but gives up a little compression efficiency. In practice this does not matter, which is why LZ is not a good example.

astrange · on July 31, 2010

Compression algorithms are poorly suited to parallelism, because they remove everything that isn't a data dependency in the input, and parallelism is nothing but a lack of data dependencies.

The trick is to start at the largest chunk possible and go down until you find where they have left in some, uh, non-dependencies - like bzip2 which has independent x*100KB blocks, and video which (usually) has independent frames. You should be able to get 2-4 separate tasks out of that, which is good enough for CPUs.

kanak · on July 31, 2010

> Let's face it - it's not the language that is the problem. It would help you, but it would help you only with some percentage of the problems out there.

Let's not underestimate the help that a language can provide; writing interpreters for languages is almost trivial on a lisp because you can reuse pretty much every piece of machinery on a lisp for your ends. Similarly, there is an entire class of problems that is nearly trivial on prolog that is pretty difficult to get right on other languages simply because prolog makes it easy to express rules and specifications that need to be met. Just look at an implementation of a sudoku solver in prolog and compare it with some other language.

I feel that a language designed with concurrency in mind would make it much simpler to write an entire class of problems. These languages are just gaining traction, so we are yet to see bigger and more significant examples. However, the "ants.clj" demo that Rich Hickey has written in Clojure, and some of the erlang demos in Joe Armstrong's book have made me a believer.

wmf · on July 30, 2010

certain algorithms, hence protocols, data structures, standards are not suited for parallel processing that well.

Sure, but for problems that can be parallelized you want a language to be able to express that parallelism. That may sound obvious, but many popular languages cannot do it. Let's not just give up on parallelism because it can't be used everywhere.