An easy way to see that CRCs can be parallelized [1] is as follows. Note that CRC(a XOR b) = CRC(a) XOR CRC(b) for equal-length inputs to a raw (unconditioned) CRC, per http://en.wikipedia.org/wiki/Cyclic_redundancy_check#Data_in... This lets you write, e.g., CRC(first_half || second_half) = CRC(first_half || 000...0) XOR CRC(000...0 || second_half), and a bit of cleverness speeds up the parts that are mostly zeroes.
[1] I've never implemented this, and I'm not at all sure that the proposed implementation is the fastest possible - and it's definitely not the most general solution!
Yes. Or to go a bit further, taking advantage of the fact that CRCs are not just linear but also cyclic:
CRC(first_half || second_half) = CRC(first_half) * x^n mod p(x) XOR CRC(second_half), where n is the bit length of second_half and p(x) is the generator polynomial.
The number of parts you'll want to split the data into will depend on the throughput:latency ratio of your CRC reduction (subject, of course, to asymptotic optimizations not being useful for small inputs).
Both the article and cperciva exploit the algebraic structure of CRCs, but the article's trick still runs front-to-back: you need a "combine chunks" operation if you want to outsource part of the work to another processor.
Such a combine operation exists as well, but isn't mentioned in the article, which only cares about single-process speed-up.
The other "merge CRCs" solution starts out resembling:
/* Return the CRC-64 of two sequential blocks, where crc1 is the CRC-64 of the
first block, crc2 is the CRC-64 of the second block, and len2 is the length
of the second block. */
uint64_t crc64_combine(uint64_t crc1, uint64_t crc2, uintmax_t len2)