Hacker News

squashfs has horrible performance. Every request it makes to the block layer is 512 bytes. Other filesystems like ext4 issue much bigger requests and perform much better in the end, despite squashfs's compression leading to lower overall data volume.

Disclaimer: Measured 2 years ago on ARM32, emmc, with a 4.1(?) kernel.
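For anyone wanting to reproduce this kind of measurement: blktrace/blkparse reports the size of each request queued at the block layer. A minimal sketch below, using embedded sample lines so it runs anywhere; the field layout is an assumption based on blkparse's default output format, and the numbers are made up for illustration:

```shell
# Hypothetical blkparse-style output (assumed field layout):
# device cpu seq timestamp pid action rwbs sector + nsectors [process]
cat > /tmp/blkparse-sample.txt <<'EOF'
179,0  0  1  0.000000000  1234  Q  R  2048 + 1 [prog]
179,0  0  2  0.000100000  1234  Q  R  2049 + 1 [prog]
179,0  0  3  0.000200000  1234  Q  R  4096 + 256 [prog]
EOF
# A sector is 512 bytes, so nsectors * 512 is the request size in bytes.
awk '$6 == "Q" { print $10 * 512 " bytes" }' /tmp/blkparse-sample.txt
# prints 512, 512 and 131072 bytes for the three queued requests
```

On real hardware you would feed blkparse from a live trace instead, e.g. `blktrace -d /dev/mmcblk0 -o - | blkparse -i -` (device path is an example); lots of 512-byte `Q R` entries would match the behaviour described above.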




Try using the SQUASHFS_4K_DEVBLK_SIZE config option next time

By default Squashfs sets the dev block size (sb_min_blocksize) to 1K or the smallest block size supported by the block device (if larger). This, because blocks are packed together and unaligned in Squashfs, should reduce latency.

This, however, gives poor performance on MTD NAND devices where the optimal I/O size is 4K (even though the devices can support smaller block sizes).

Using a 4K device block size may also improve overall I/O performance for some file access patterns (e.g. sequential accesses of files in filesystem order) on all media.

Setting this option will force Squashfs to use a 4K device block size by default.

If unsure, say N.
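For reference, that help text belongs to CONFIG_SQUASHFS_4K_DEVBLK_SIZE in fs/squashfs/Kconfig; it is a compile-time choice, so enabling it is just a .config fragment (CONFIG_SQUASHFS=y assumed):

```
# .config fragment: build squashfs with a 4K device block size default
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
```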


I'm quite sure I tried all the options available in the kernel I used back then without achieving performance comparable to ext4. My project manager was convinced that squashfs makes things faster (as I initially hoped too, because the overall data volume is smaller), so I had a hard time convincing him to just drop that "optimization" from the project plan. (He was one of those who prefer checkmarks over technical merit.) I don't remember the 4K option for sure, but if it existed, we tried and measured it. What size does ext4 read from the block device? I'm reading this on holiday on my phone, so I cannot easily fire up blktrace, but I'd guess it's 128K or even 256K. Still far from 4K.
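The 128K guess is plausible: the big sequential requests seen under ext4 typically come from the page cache's readahead window rather than from ext4 itself, and the stock window is 128 KiB. It's visible in sysfs; a sketch (device path is an example, with a fallback to the usual default so it runs anywhere):

```shell
# read_ahead_kb caps the readahead window; the stock default is 128 (KiB).
# Fall back to that default when the example device doesn't exist here.
ra_kb=$(cat /sys/block/mmcblk0/queue/read_ahead_kb 2>/dev/null || echo 128)
echo "readahead window: ${ra_kb} KiB = $((ra_kb * 1024)) bytes"
```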


eMMC is not an MTD device. I measured only on eMMC.


A big part of the problem might be xz decompression; that's been my discovery, anyway.

https://bugzilla.redhat.com/show_bug.cgi?id=1717728
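Decompressor cost can be sanity-checked outside the filesystem. A rough sketch comparing gzip and xz on the same synthetic data; this uses the userspace tools rather than squashfs's in-kernel zlib/xz backends, so it only indicates relative CPU cost, not filesystem behaviour:

```shell
# Build a compressible 1 MiB blob, compress it both ways, then time decompression.
yes "squashfs block payload" | head -c 1048576 > /tmp/blob
gzip -kf /tmp/blob   # writes /tmp/blob.gz, keeps the original
xz   -kf /tmp/blob   # writes /tmp/blob.xz, keeps the original
time gzip -dc /tmp/blob.gz > /dev/null
time xz   -dc /tmp/blob.xz  > /dev/null
```

On a slow ARM core, a large gap between the two timings would point at the decompressor rather than the block layer.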


I tried various compression algorithms. I don't think there was a CPU bottleneck, at least not with the less aggressive compressors. blktrace showed the difference compared to ext4: squashfs issues all its reads one block at a time.

It was at a previous job, so I don't have access to the details anymore, and the kernel was not the newest. But squashfs already looked unmaintained back then, and that's what they're saying elsewhere in this discussion, so I fear nothing has changed.



