
This seems kind of wasteful. Would it be better to estimate entropy across random blocks in a file and then compress with $algo?
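(For reference, one way to read that suggestion: a byte-frequency Shannon entropy estimate over a few randomly sampled blocks, handing the file to $algo only if the estimate looks compressible. The block size, sample count, and cutoff below are made up for illustration; this is not how ZFS decides.)

    /* Sketch of sampling-based entropy estimation as suggested above.
     * All constants are arbitrary illustrations, not from ZFS. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SAMPLE_BLOCKS 8
    #define BLOCK_SIZE    4096

    /* Byte-frequency Shannon entropy in bits per byte (0..8). */
    static double block_entropy(const unsigned char *buf, size_t len) {
        size_t counts[256] = {0};
        double h = 0.0;
        for (size_t i = 0; i < len; i++)
            counts[buf[i]]++;
        for (int b = 0; b < 256; b++) {
            if (counts[b] == 0)
                continue;
            double p = (double)counts[b] / (double)len;
            h -= p * log2(p);
        }
        return h;
    }

    /* Returns 1 if the sampled blocks look compressible enough to bother. */
    int looks_compressible(FILE *f, long file_size) {
        unsigned char buf[BLOCK_SIZE];
        double total = 0.0;
        int sampled = 0;

        for (int i = 0; i < SAMPLE_BLOCKS; i++) {
            long range = file_size > BLOCK_SIZE ? file_size - BLOCK_SIZE : 0;
            long off = (long)(((double)rand() / RAND_MAX) * range);
            if (fseek(f, off, SEEK_SET) != 0)
                continue;
            size_t n = fread(buf, 1, BLOCK_SIZE, f);
            if (n == 0)
                continue;
            total += block_entropy(buf, n);
            sampled++;
        }
        if (sampled == 0)
            return 0;
        /* Arbitrary cutoff: near 8 bits/byte means essentially random data. */
        return (total / sampled) < 7.5;
    }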



Not random, right? ZFS compression doesn't (at present; there have been unmerged proposals to try otherwise) look at other records when deciding whether to try compressing something, and it would be a much more invasive change to alter what it compresses with, rather than just making a go/no-go decision.

As far as wasteful - not really? It might be possible to be more efficient (as I kept saying in my talk about it, I really swear this shouldn't work in some cases), but LZ4 is so much cheaper than zstd-1, which in turn is so much cheaper than the higher levels of zstd, that the extra passes barely cost anything. I tried making a worst-case dataset out of records that experimentally passed both "passes" but then failed the 12.5% compression check, and even then, the time spent compressing was within noise levels compared to running without the first two passes, because those passes are just that much cheaper.
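A minimal sketch of that kind of early-abort gating, written against the plain lz4/zstd library APIs - the 12.5% threshold is the one mentioned above, but the function names and structure are illustrative, not the actual ZFS code:

    /* Two cheap "predictor" passes (LZ4, then zstd-1) before paying for an
     * expensive zstd level. If neither cheap pass can shrink the record,
     * skip the expensive compressor entirely. */
    #include <lz4.h>
    #include <zstd.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Only keep a compressed record if it saves at least 1/8 (12.5%). */
    static size_t min_useful_size(size_t src_len) {
        return src_len - (src_len >> 3);
    }

    /* Returns the compressed length, or 0 if the record should be stored
     * uncompressed. dst must be at least src_len bytes. */
    size_t compress_record(const char *src, size_t src_len,
                           char *dst, int zstd_level) {
        size_t limit = min_useful_size(src_len);
        char *scratch = malloc(src_len);
        if (scratch == NULL)
            return 0;

        /* Pass 1: LZ4 is extremely cheap; if it can't shrink the record at
         * all, treat the data as incompressible and bail out. */
        int lz4_len = LZ4_compress_default(src, scratch, (int)src_len,
                                           (int)src_len);
        if (lz4_len <= 0 || (size_t)lz4_len >= src_len) {
            free(scratch);
            return 0;
        }

        /* Pass 2: zstd-1 is still far cheaper than higher zstd levels and
         * filters out records LZ4 was too optimistic about. */
        size_t z1_len = ZSTD_compress(scratch, src_len, src, src_len, 1);
        if (ZSTD_isError(z1_len) || z1_len >= src_len) {
            free(scratch);
            return 0;
        }
        free(scratch);

        /* Final pass: the expensive compressor, still subject to the
         * 12.5% savings check. */
        size_t out_len = ZSTD_compress(dst, src_len, src, src_len, zstd_level);
        if (ZSTD_isError(out_len) || out_len > limit)
            return 0;
        return out_len;
    }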

I tried this on Raspberry Pis, on the single-core, single-thread SPARC I keep under my couch, and on a lot of other things, and I couldn't find one where this made performance worse by more than the error bars across runs.

(And you can't use zstd-fast for this; it's a much worse first or second pass.)



