
> Not sure if we are talking about exactly the same thing

I think we are. The tweak that --rsyncable makes to the compression stream reduces the effect of small changes early in the input, but still only allows forward motion through the data when processing. If an index of those reset points were kept, it would act like the index you describe: you could randomly access the compressed output with source-block granularity and decompress only the blocks needed.
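
To sketch the idea: if each block is compressed as an independent gzip member and we record where each member starts, a reader can seek to an arbitrary uncompressed offset and decompress only the covering blocks. This is a minimal illustration of that principle, not how gzip --rsyncable or bgzip are actually implemented; the block size and function names here are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block granularity

def compress_indexed(data: bytes):
    """Compress data as independent gzip members, one per block,
    recording (uncompressed_offset, compressed_offset) for each."""
    out = bytearray()
    index = []
    for off in range(0, len(data), BLOCK_SIZE):
        index.append((off, len(out)))
        c = zlib.compressobj(wbits=31)  # wbits=31: full gzip member
        out += c.compress(data[off:off + BLOCK_SIZE])
        out += c.flush()
    return bytes(out), index

def read_at(compressed: bytes, index, pos: int, n: int) -> bytes:
    """Return n bytes starting at uncompressed offset pos,
    decompressing only the blocks that cover that range."""
    result = bytearray()
    for i, (uoff, coff) in enumerate(index):
        if uoff + BLOCK_SIZE <= pos or uoff >= pos + n:
            continue  # block entirely outside the requested range
        cend = index[i + 1][1] if i + 1 < len(index) else len(compressed)
        block = zlib.decompress(compressed[coff:cend], wbits=31)
        result += block[max(0, pos - uoff):min(len(block), pos + n - uoff)]
    return bytes(result)
```

The cost is a slightly worse compression ratio (each block resets the compressor's history), which is exactly the trade --rsyncable-style resets make in exchange for locality.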

The original post I replied to mentioned gzip, though you mention bzip; I'm not sure if the latter is a typo or if you mean bzip2. bzip2 might make sense for data that is compressed once and read many times (it is slower, but on that sort of data it is likely to achieve far better compression). It has no equivalent of gzip's --rsyncable option, though pbzip2's method of chunking the input may have the same effect.




Sorry, indeed I made a typo - I meant bgzip:

http://www.htslib.org/doc/bgzip.html



