
There isn't really a Good Way(TM)* to avoid file fragmentation when using compression that allows arbitrary seeks. Thought experiment: take a large easily compressed file, and write 1MB of random data in the middle. That new data won't compress as well, and so it'll have to be stored elsewhere.

Filesystem-implemented deduplication is probably a much better long-term strategy than filesystem compression, but both online deduplication and online compression can impose hefty costs on frequently written data.

ZFS and BTRFS can "get away" with implementing better compression because they're copy-on-write by default. In that model, compression can actually reduce the cost of the read-modify-write, and clever use of journaling and caching can reduce the cost of fragmentation.

* - that I'm aware of; I won't pretend to know the state of the art.
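To make the thought experiment concrete, here is a minimal sketch in C, assuming a filesystem mounted with transparent compression (e.g. a btrfs mount with compress=zstd); the file name and sizes are placeholders. Run filefrag -v on the file before and after the random overwrite and watch the extent map change.

  /* Sketch: write 256 MiB of zeros (compresses extremely well), then
   * overwrite 1 MiB in the middle with random bytes (compresses badly).
   * Error handling and short-read handling are omitted for brevity. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  #define MIB (1024 * 1024)

  static char zeros[MIB];   /* all zero bytes */
  static char noise[MIB];   /* filled from /dev/urandom below */

  int main(void) {
      int fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
      if (fd < 0) { perror("open"); return 1; }

      for (int i = 0; i < 256; i++)
          write(fd, zeros, sizeof zeros);            /* 256 MiB of compressible data */
      fsync(fd);

      int rnd = open("/dev/urandom", O_RDONLY);
      read(rnd, noise, sizeof noise);                /* 1 MiB of incompressible data */
      close(rnd);

      pwrite(fd, noise, sizeof noise, 128L * MIB);   /* drop it in the middle */
      fsync(fd);
      close(fd);
      return 0;
  }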




These files are not often written to, and almost never randomly written to, so it's a bit of a premature optimization. Same with the text log files that I have compressed. And it negates part of the speed benefit of having less data to read.

Causing fragmentation to avoid fragmentation is a bit silly, especially when it means that none of the space saved is contiguous, leading to other files fragmenting too.


Compression means not only less data to read but also less data to write. It still really does matter, depending on workload of course.


Is fragmentation really an issue on SSD? Isn't access essentially O(1)?


Indeed. SSDs don't fragment. "Fragmentation" is a property of rotating devices.


Files will still be fragmented. However, you aren't affected by seek times.

The OS/file system still has to keep metadata about where stuff is stored, and many files fragmented into many tiny slices will bloat that metadata, meaning you may spend a little more time reading that metadata from disk (and a little more space storing it in the first place), and a little more time "re-assembling" these slices into the virtually contiguous view user land expects. More metadata may also mean disk caches fill up more quickly, leading to metadata being evicted more often. (You can see those extent counts directly; sketch below.)

This may all not matter on a beefy laptop/desktop/server, but your underpowered, memory-challenged ARM SoC NAS may notice a bit.
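For what it's worth, that per-file mapping metadata is easy to look at: the extent list that filefrag prints comes from the Linux FIEMAP ioctl, and a few lines of C will report how many extents a file has been chopped into. A minimal sketch (error handling kept to a minimum):

  /* Count a file's extents via FIEMAP -- the same data filefrag reports.
   * More extents means more mapping metadata to store, cache, and walk. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/fs.h>
  #include <linux/fiemap.h>

  int main(int argc, char **argv) {
      if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

      int fd = open(argv[1], O_RDONLY);
      if (fd < 0) { perror("open"); return 1; }

      struct fiemap fm;
      memset(&fm, 0, sizeof fm);
      fm.fm_start = 0;
      fm.fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
      fm.fm_flags = FIEMAP_FLAG_SYNC;     /* flush dirty data first */
      fm.fm_extent_count = 0;             /* 0 = only count, don't return extents */

      if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) { perror("FS_IOC_FIEMAP"); return 1; }
      printf("%s: %u extents\n", argv[1], fm.fm_mapped_extents);

      close(fd);
      return 0;
  }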


In addition to the answer from rndgermandude (which is also a reason why accessing a data structure in RAM, even in a world without cache lines, would be faster and more efficient if stored in a linear array rather than a tree), SSDs are themselves a complex data structure mapping data into erasure blocks in an attempt to minimize wear on the flash memory and increase efficiency. If the size of your fragments is smaller than the size of this block, you are going to take a performance penalty, and the size of this block is large: I did a search just now to find an article for you detailing this, and http://codecapsule.com/2014/02/12/coding-for-ssds-part-5-acc... mentions 32 megabytes. Anything below that leads to a second layer of data structures on the SSD, designed to map the high-level blocks to low-level blocks, and incurs erasure penalties.


Well, yes and no. I think the replies to you sufficiently expand upon why fragmentation is a concern for SSDs, but I want to add one more: caching.

Sequential access is faster even on random-access devices because the hardware itself can cooperate to predict what comes next and prepare the data. Read-ahead caching is common in enterprise devices and software (see: SQL products), and CPUs, SSDs, and RAM can all participate, but only if the next locations are well-known.

There are translation layers between virtual memory mappings and RAM or storage devices, sure, but to look at two examples in detail:

1. CPUs have prefetch instructions for pulling in upcoming bytes, and cache lines are often 8-16 words (32-64 bytes) in modern processors. Laying data out sequentially matters enough that it's a common optimization for game developers, for whom the latency of repeated cache misses can blow the CPU budget for a frame. (One source: http://gameprogrammingpatterns.com/data-locality.html; see also the sketch after this list.)

2. For solid state disks, the SSD writes in large page-sized increments, and though I understand it's possible to read smaller chunks, if the SSD has RAM on board (common), why not read the whole page into an LRU cache? Moreover, as SSDs themselves have a virtual mapping between LBAs and the actual data locations, why not retrieve the next LBA too? Reads aren't destructive, and SSDs are essentially implemented as RAID devices over multiple flash chips, for which multiple reads incur only a small additional cost, so sequential reads end up automatically cached and the SSD improves performance over random LBA access.
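On point 1, a minimal sketch of the data-locality effect (timings are illustrative only and depend entirely on your CPU and cache sizes): sum the same values once through a contiguous array and once by chasing pointers through nodes visited in a shuffled order.

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (1 << 22)    /* ~4M elements */

  struct node { long value; struct node *next; };

  static double now_sec(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec + ts.tv_nsec / 1e9;
  }

  int main(void) {
      /* Contiguous array: the hardware prefetcher sees a linear pattern. */
      long *arr = malloc(N * sizeof *arr);
      for (long i = 0; i < N; i++) arr[i] = i;

      /* Linked list visited in a random order: most hops are cache misses. */
      struct node *nodes = malloc(N * sizeof *nodes);
      long *order = malloc(N * sizeof *order);
      for (long i = 0; i < N; i++) order[i] = i;
      for (long i = N - 1; i > 0; i--) {            /* Fisher-Yates shuffle */
          long j = rand() % (i + 1);
          long t = order[i]; order[i] = order[j]; order[j] = t;
      }
      for (long i = 0; i < N; i++) {
          nodes[order[i]].value = order[i];
          nodes[order[i]].next = (i + 1 < N) ? &nodes[order[i + 1]] : NULL;
      }

      double t0 = now_sec();
      long sum1 = 0;
      for (long i = 0; i < N; i++) sum1 += arr[i];

      double t1 = now_sec();
      long sum2 = 0;
      for (struct node *p = &nodes[order[0]]; p; p = p->next) sum2 += p->value;
      double t2 = now_sec();

      printf("array: %.3fs  list: %.3fs  (sums %ld %ld)\n",
             t1 - t0, t2 - t1, sum1, sum2);
      return 0;
  }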

An interesting note: even before SSDs became the standard, Microsoft changed the defragmentation algorithm to ignore chunks larger than 64MiB. I can't say for sure they chose an "empirical best" option, but apparently even for spinning disk, that provided a large enough runway to get most of the benefits of sequential access. For SSDs, that size is almost certainly smaller - I would guess between 512KiB and 2MiB - but still relevant for performance.


Sequential access is still faster than random access.


Why would sequential access be faster on an SSD or in RAM? I thought that there was essentially no penalty for 'seeking' in both situations?


SSDs store in large blocks, say 64 KiB or 2 MiB. The seek to get to such a block is constant if it's not in cache; the read time is constant if not in cache; so 'SSD seeks are free' is basically correct. However, sequential IO is likely to land on the same block, already in cache, in which case the seek time is essentially 0.

In practice, on hardware I have tested on, HDDs have a 10x or more slowdown from random IO relative to sequential, while SSDs have a 3x slowdown. Your hardware is definitely not mine; take this with a Hummer-sized grain of salt.
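If you want to reproduce that kind of number yourself, here is a rough sketch (Linux-specific; the 4 KiB request size and read count are arbitrary choices, and the target should be a multi-GiB file on the device under test). O_DIRECT bypasses the page cache so it is the device being timed, not RAM:

  #define _GNU_SOURCE            /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <unistd.h>

  #define BLOCK 4096
  #define READS 100000

  static double now_sec(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec + ts.tv_nsec / 1e9;
  }

  int main(int argc, char **argv) {
      if (argc != 2) { fprintf(stderr, "usage: %s <big-file>\n", argv[0]); return 1; }

      int fd = open(argv[1], O_RDONLY | O_DIRECT);
      if (fd < 0) { perror("open"); return 1; }
      off_t size = lseek(fd, 0, SEEK_END);
      long blocks = size / BLOCK;

      void *buf;
      if (posix_memalign(&buf, BLOCK, BLOCK)) return 1;   /* O_DIRECT needs alignment */

      double t0 = now_sec();
      for (long i = 0; i < READS; i++)                    /* sequential 4 KiB reads */
          pread(fd, buf, BLOCK, (off_t)(i % blocks) * BLOCK);
      double t1 = now_sec();

      srand(42);
      for (long i = 0; i < READS; i++)                    /* random 4 KiB reads */
          pread(fd, buf, BLOCK, (off_t)(rand() % blocks) * BLOCK);
      double t2 = now_sec();

      printf("sequential: %.2fs  random: %.2fs\n", t1 - t0, t2 - t1);
      close(fd);
      return 0;
  }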


I don't know anything about SSDs, so I won't address that. (Though, it might be the same as RAM.)

With RAM, you don't just say "Give me the 4 bytes starting at 0x10A6E780". Instead, a whole block of memory around that address is loaded into the cache (pausing execution in the meantime), and then the value is pulled from there.

If the next value you read is in that same block of memory, you don't need to spend time loading a new block of memory into the cache.
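A small experiment that makes the block loading visible, assuming 64-byte cache lines (a common size, but check your CPU): touch the same number of bytes twice, once packed so that many touches share a cache line, and once spread out so that every touch lands on its own line. The second loop forces far more lines to be loaded for the same amount of useful data.

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define SIZE (256 * 1024 * 1024)   /* 256 MiB working set */
  #define LINE 64                    /* assumed cache-line size in bytes */

  static double now_sec(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec + ts.tv_nsec / 1e9;
  }

  int main(void) {
      volatile char *buf = malloc(SIZE);
      for (long i = 0; i < SIZE; i++) buf[i] = (char)i;   /* fault the pages in */

      long sum = 0, touches = SIZE / LINE;

      double t0 = now_sec();
      for (long i = 0; i < touches; i++) sum += buf[i];          /* packed: ~64 touches per line */
      double t1 = now_sec();
      for (long i = 0; i < touches; i++) sum += buf[i * LINE];   /* spread: one line per touch */
      double t2 = now_sec();

      printf("packed: %.3fs  one-line-per-touch: %.3fs  (sum %ld)\n",
             t1 - t0, t2 - t1, sum);
      return 0;
  }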


I worked with bare NAND flash on an embedded system. NAND flash doesn't work like RAM, where you have address pins that cover the entire space. For NAND flash, you have to send a read command with the address you want to read the data from. There are various kinds of read commands; one of them is auto-increment read, where you can keep reading data, and once the end of the block is reached it will load the next one.
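For the curious, a page read on a raw ONFI-style part looks roughly like this: a first command byte (0x00), the column and row address cycles, a second command byte (0x30), wait for the ready line, then clock the data out. The register addresses and the wait helper in the sketch below are hypothetical placeholders, not any real controller's interface.

  #include <stdint.h>

  /* Hypothetical memory-mapped controller registers (placeholders only). */
  #define NAND_CMD_REG   ((volatile uint8_t *)0x40000000)
  #define NAND_ADDR_REG  ((volatile uint8_t *)0x40000004)
  #define NAND_DATA_REG  ((volatile uint8_t *)0x40000008)

  extern void nand_wait_ready(void);   /* poll the R/B# line; platform-specific */

  void nand_read_page(uint32_t row, uint8_t *out, int page_size)
  {
      *NAND_CMD_REG = 0x00;                    /* READ: first command cycle */

      *NAND_ADDR_REG = 0x00;                   /* column address, 2 cycles */
      *NAND_ADDR_REG = 0x00;
      *NAND_ADDR_REG = row & 0xFF;             /* row (page) address, 3 cycles */
      *NAND_ADDR_REG = (row >> 8) & 0xFF;
      *NAND_ADDR_REG = (row >> 16) & 0xFF;

      *NAND_CMD_REG = 0x30;                    /* READ: second command cycle */
      nand_wait_ready();                       /* chip copies the page into its buffer */

      for (int i = 0; i < page_size; i++)      /* clock the page out byte by byte */
          out[i] = *NAND_DATA_REG;
  }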


Random access just means that each access has the same amount of latency. It turns out that 100 nanoseconds is quite a lot of time for a CPU. The CPU has to wait roughly 60-100 cycles doing nothing on a cache miss when it has to load stuff from RAM. If you have a predictable loading pattern like array[0], array[1], then the CPU can try to prefetch array[2] and array[3], since latency is the problem, not bandwidth (unless you go multicore).
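That prefetch can also be requested explicitly: GCC and Clang expose __builtin_prefetch for exactly this. A tiny sketch (the distance of 16 elements ahead is a guess, not a tuned value; for a plain linear scan the hardware prefetcher will usually do this on its own):

  #include <stddef.h>

  long sum_with_prefetch(const long *array, size_t n)
  {
      long sum = 0;
      for (size_t i = 0; i < n; i++) {
          if (i + 16 < n)
              __builtin_prefetch(&array[i + 16]);   /* hint: start loading it now */
          sum += array[i];
      }
      return sum;
  }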


Even RAM can be fragmented. Fragmentation is a property of anything where things are added and removed. The difference is that fragmentation on spinning rust also increases seek, read, and write times, but wasted storage is just as much an issue with RAM and solid-state storage.
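The wasted-storage part is easy to show with a toy allocator, entirely made up for illustration: a first-fit allocator over an 8-slot arena. After freeing every other slot, half the arena is free, yet a request for 2 contiguous slots still fails because no two free slots are adjacent.

  #include <stdio.h>
  #include <stdbool.h>

  #define SLOTS 8
  static bool used[SLOTS];          /* one flag per fixed-size slot */

  /* first-fit: find n contiguous free slots and mark them used */
  static int alloc_slots(int n) {
      for (int start = 0; start + n <= SLOTS; start++) {
          int run = 0;
          while (run < n && !used[start + run]) run++;
          if (run == n) {
              for (int i = 0; i < n; i++) used[start + i] = true;
              return start;
          }
      }
      return -1;                    /* no contiguous run big enough */
  }

  static void free_slot(int i) { used[i] = false; }

  int main(void) {
      for (int i = 0; i < SLOTS; i++) alloc_slots(1);   /* fill the arena */
      for (int i = 0; i < SLOTS; i += 2) free_slot(i);  /* free every other slot */

      /* 4 of 8 slots are free, but no 2 of them are adjacent: */
      printf("alloc of 2 contiguous slots -> %d\n", alloc_slots(2));  /* prints -1 */
      return 0;
  }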


Then explain fragmentation in memory managers.



