Windows file system compression had to be dumbed down (microsoft.com)
232 points by ingve on Nov 1, 2016 | 134 comments



> We live in a post-file-system-compression world.

You'd think so, but allow me to point at my steamapps folder saving 20% disk space even with the not-very-good compression that NTFS offers. If it could use a better algorithm and a block size of 2MB instead of 64KB, it could nearly double that.

Hard drives have always been growing in size. We have always been in a 'post-file-system-compression' world. But people want to do things like fit on an SSD, so compression continues to be useful.

I just wish it didn't ultra-fragment files on purpose.


There isn't really a Good Way(TM)* to avoid file fragmentation when using compression that allows arbitrary seeks. Thought experiment: take a large easily compressed file, and write 1MB of random data in the middle. That new data won't compress as well, and so it'll have to be stored elsewhere.

Filesystem implemented deduplication is probably a much better long-term strategy than filesystem compression, but both online deduplication and online compression can impose hefty costs on frequently written data.

ZFS and BTRFS can "get away" with implementing better compression because they're copy-on-write by default. In that model, compression can actually reduce the cost of the read-modify-write, and clever use of journaling and caching can reduce the cost of fragmentation.

* - that I'm aware of, I won't pretend to know the state of the art.


These files are not often written to, and almost never randomly written to, so it's a bit of a premature optimization. Same with the text log files that I have compressed. And it negates part of the speed benefit of having less data to read.

Causing fragmentation to avoid fragmentation is a bit silly, especially when it means that none of the space saved is contiguous, leading to other files fragmenting too.


Compression means not only less data to read but also less data to write. It still really does matter, depending on the workload of course.


Is fragmentation really an issue on SSD? Isn't access essentially O(1)?


Indeed. SSDs don't fragment. "Fragmentation" is a property of rotating devices.


Files will still be fragmented. However, you aren't affected by seek times.

The OS/file system still has to keep metadata about where stuff is stored, and many files fragmented into many tiny slices will bloat that metadata. That means you may spend a little more time reading that metadata from disk (and a little more space storing it in the first place), and a little more time "re-assembling" these slices into the virtually contiguous stream that user land expects. More metadata may also mean disk caches fill up more quickly, leading to metadata being evicted more often.

This may all not matter on a beefy laptop/desktop/server, but your underpowered, memory-challenged ARM SoC NAS may notice a bit.


In addition to the answer from rndgermandude (which is also a reason why accessing a data structure in RAM, even in a world without cache lines, would be faster and more efficient if stored in a linear array rather than a tree), SSDs are themselves a complex data structure mapping data into erasure blocks in an attempt to minimize wear on the flash memory and increase efficiency. If the size of your fragments is smaller than the size of this block, you are going to take a performance penalty, and this block is large: I did a search just now to find an article for you detailing this, and http://codecapsule.com/2014/02/12/coding-for-ssds-part-5-acc... mentions 32 megabytes. Anything below that leads to a second layer of data structures on the SSD, designed to map the high-level blocks to low-level blocks, and incurs erasure penalties.


Well, yes and no. I think the replies to you sufficiently expand upon why fragmentation is a concern for SSDs, but I want to add one more: caching.

Sequential access is faster even on random access devices because the hardware itself can cooperate to predict what-is-next, and prepare the data. Read-ahead caching is common in enterprise devices and software (see: SQL products) and CPUs, SSDs, and RAM can all participate, but only if the next locations are well-known.

There are translation layers between virtual memory mappings and RAM or storage devices, sure, but to look at two examples in detail:

1. CPUs have instructions for retrieving the next bytes, and cache lines are often 8-16 words (32-64 bytes) in modern processors. The optimization of putting data sequentially is significant enough that it's a common optimization for game developers, for which the latency of repeated cache misses can blow the CPU budget on a frame. (One source: http://gameprogrammingpatterns.com/data-locality.html)

2. For solid state disks, the SSD writes in large page-sized increments, and though I understand it's possible to read smaller chunks, if the SSD has RAM on-board (common) why not read the whole page into a LRU cache? Moreover, as SSDs themselves have a virtual mapping between LBAs and the actual data locations, why not retrieve the next LBA too? Reads aren't destructive, SSDs are implemented as essentially RAID devices over multiple flash chips for which multiple reads incurs only a small additional cost, so now sequential reads are automatically cached and the SSD improves performance over random LBA access.

An interesting note: even before SSDs became the standard, Microsoft changed the defragmentation algorithm to ignore chunks larger than 64MiB. I can't say for sure they chose an "empirical best" option, but apparently even for spinning disk, that provided a large enough runway to get most of the benefits of sequential access. For SSDs, that size is almost certainly smaller - I would guess between 512KiB and 2MiB - but still relevant for performance.


Sequential access is still faster than random access.


Why would sequential access be faster on an SSD or in RAM? I thought that there was essentially no penalty for 'seeking' in both situations?


SSDs store data in large blocks, say 64KB or 2MB. The seek to get to such a block is constant if it's not in cache; the read time is constant if not in cache; so "SSD seeks are free" is basically correct. However, sequential IO is likely to land on the same block, in cache, in which case the seek time is essentially 0.

In practice, on hardware I have tested on, HDDs have a 10x or more slowdown from random IO relative to sequential, while SSDs have a 3x slowdown. Your hardware is definitely not mine, so take this with a Hummer-sized grain of salt.
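
If you want to check this on your own hardware, here's a rough Python sketch (the path is a made-up placeholder; for honest numbers the test file should be larger than RAM, or the page cache dropped between runs):

    import os, random, time

    PATH = "testfile.bin"   # hypothetical large test file
    BLOCK = 4096
    # Keep the block count modest; tens of thousands of random seeks on an HDD take a while.
    nblocks = min(os.path.getsize(PATH) // BLOCK, 20_000)

    def bench(offsets):
        t0 = time.perf_counter()
        with open(PATH, "rb", buffering=0) as f:
            for off in offsets:
                f.seek(off * BLOCK)
                f.read(BLOCK)
        return time.perf_counter() - t0

    seq = list(range(nblocks))
    rnd = seq[:]
    random.shuffle(rnd)

    print("sequential: %.2fs" % bench(seq))
    print("random:     %.2fs" % bench(rnd))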


I don't know anything about SSDs, so I won't address that. (Though, it might be the same as RAM.)

With RAM, you don't just say "Give me the 4 bytes starting at 0x10A6E780". Instead, a whole block of memory around that address is loaded into the cache (pausing execution in the meantime), and then the value is pulled from there.

If the next value you read is in that same block of memory, you don't need to spend time loading a new block of memory into the cache.


I worked with bare NAND flash on an embedded system. NAND flash doesn't work like RAM, where you have address pins that cover the entire space. For NAND flash, you have to send a read command with the address you want to read the data from. There are various kinds of read commands; one of them is auto-increment read, where you can keep reading the data and once the end of the block is reached it will load the next one.


Random access just means that each access has the same amount of latency. It turns out that 100 nanoseconds is quite a lot of time for a CPU. The CPU has to wait roughly 60-100 cycles doing nothing on a cache miss when it has to load stuff from RAM. If you have a predictable loading pattern like array[0], array[1], then the CPU can try to prefetch array[2] and array[3], since latency is the problem, not bandwidth (unless you go multicore).
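
You can see the prefetcher's effect from user land with something as blunt as this (a rough numpy sketch; the exact ratio depends heavily on CPU, RAM, and array size):

    import time
    import numpy as np

    N = 20_000_000
    data = np.arange(N, dtype=np.int64)
    seq = np.arange(N)                 # sequential indices: prefetcher-friendly
    rnd = np.random.permutation(N)     # random indices: a cache miss on almost every access

    def timed(idx):
        t0 = time.perf_counter()
        data[idx].sum()                # gather, then reduce
        return time.perf_counter() - t0

    print("sequential gather: %.2fs" % timed(seq))
    print("random gather:     %.2fs" % timed(rnd))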


Even RAM can be fragmented. Fragmentation is a property of anything where things are added and removed. The only difference is that fragmentation on spinning rust causes increases in seek, read, and write times, but wasted storage is as much an issue even with RAM and solid state storage.


Then explain fragmentation in memory managers.


> my steamapps folder saving 20% disk space even with the not-very-good compression that NTFS offers

Wait, you run the Steam apps from a compressed folder? Doesn't that kill performance?

Regardless, I have long felt that the right way for Steam to make this work would be to have, in addition to the "Download" and "Delete" tools, an "Archive" button that moves it to a specified path (by default, on the install drive, but configurable to another drive or a NAS) and compresses it - with whatever compression they want.

I want my Steam apps uncompressed on my SSD. I don't have room on my SSD for hundreds of gigabytes of Skyrim textures that I haven't played in 6 months. But if I delete the app, then I have to wait hours (and cause Steam some expense) to download it again.

Not everyone has multiple drives or a NAS, but a local archive would definitely be useful.


> Wait, you run the Steam apps from a compressed folder? Doesn't that kill performance?

No. In fact, it may improve performance, by virtue of transferring less data from slower I/O devices. Yes, slower, even if it's a SSD.


> Yes, slower

I'm skeptical of this actually being the case in any real-world scenarios. There have been a number of tests of running games from a RAM disk vs an SSD, with precious little difference in load times.


It depends on the data generally, but algorithms like lz4 can decompress faster than most storage mediums can keep up, including NVMe drives. Compare ~1.8GiB/s reading raw data versus an effective ~3GiB/s or more when the data is compressed. This is on a Skylake i7 and 2 NVMe x4 drives. More CPU use, but honestly, it would be stalled waiting on I/O otherwise.

The key is that the data sent to the CPU and decompressed there makes up for the stall from hitting memory or I/O. Comparing RAM vs SSD is the wrong comparison to make; with both you're hitting stalls due to memory. You want to compare reads of uncompressed versus compressed data, with the note that (and I'm just making numbers up with this analogy as I'm about to sleep) 900KiB of compressed data in gives you 2MiB of data out. That's a 1.1MiB bonus, and yes I'm assuming huge compression, but for the times your CPU would otherwise be idle it makes perfect sense.

And yes, lz4 compression on things like movies still helps. I shaved off over 200GiB on my home nas with zfs.
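
If you want to sanity-check the decompression-throughput claim yourself, here's a rough sketch using the third-party lz4 Python bindings (pip install lz4); the sample payload is made up and far more repetitive than real data, so the numbers are only illustrative:

    import time
    import lz4.frame

    raw = (b"fairly repetitive sample asset data " * 1000) * 1000   # ~36 MB, made-up payload
    comp = lz4.frame.compress(raw)

    t0 = time.perf_counter()
    for _ in range(20):
        lz4.frame.decompress(comp)
    dt = time.perf_counter() - t0

    print("compressed to %.1f%% of original" % (100.0 * len(comp) / len(raw)))
    print("decompression: %.0f MB/s of uncompressed output" % (len(raw) * 20 / 1e6 / dt))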


I would love a benchmark comparing this with and without compression! Very interesting possibility to deal with the 4TB game library problem :p


You can use Steam's backup feature for that, at least for games that manage their updates through Steam: https://support.steampowered.com/kb_article.php?ref=8794-YPH...

Also useful if you don't have fast internet for your gaming PC, but access to it somewhere else.


No it does not kill performance. Decompression is a very fast operation. I have all of my drives compressed, it is fine.


In my case having it compressed on SSD would greatly improve performance... because my steamapps directory is on a large spinning disk due to size ;)


These apps will move your Steam games for you and use Windows Junctions (links) to do so: http://www.traynier.com/software/steammover http://www.stefanjones.ca/steam/


> We have always been in a 'post-file-system-compression' world. But people want to do things like fit on an SSD, so compression continues to be useful.

Note though that some controllers do compression to reduce write amplification (e.g. the SandForce controllers used to do this). So, by using filesystem compression, you might be increasing write amplification and shortening the SSD lifetime.


You have 2MB of data.

1. You compress it to 1MB, the SandForce controller fails to do anything because it's already compressed, writes it to flash, and you wear out 1MB of physical flash memory.

2. You don't compress it, the SandForce controller compresses it to 1MB, writes it to flash, and you wear out the same 1MB of physical flash memory.

How might (1) increase write amplification and shorten the SSD lifetime compared to (2)? Aren't they nearly equivalent?


Is that increasing write amplification to a meaningful degree? Compressing already compressed data is generally only slightly less efficient than just compressing the data once, isn't it?
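
A quick way to check with the Python stdlib (already-compressed data is approximated by random bytes here, since both are close to incompressible):

    import os, zlib

    incompressible = os.urandom(1_000_000)                 # stands in for already-compressed data
    compressible = b"hello compressible world " * 40_000   # ~1 MB of repetitive text

    for name, data in (("incompressible", incompressible), ("compressible", compressible)):
        once = zlib.compress(data, 6)
        twice = zlib.compress(once, 6)   # compressing the compressed output again
        print("%-15s %8d -> %8d -> %8d bytes" % (name, len(data), len(once), len(twice)))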


That's an interesting question. As far as I understand, yes. Say you buy a 256GB SSD backed by roughly the same amount of storage in physical cells. Such an SSD stores 256GB regardless of whether the data is compressed or not. Now suppose that we have a compressed filesystem and an uncompressed filesystem, and that both the filesystem and controller compression rates are 2:1.

Now let's compare the two scenarios:

1. With filesystem compression: you could store twice the amount of data (~512GB). However, since the controller is not able to compress the data, all cells are completely full.

2. Without filesystem compression: you can store 256GB data. However, since the controller can compress the data, only half of the actual cells are used.

Suppose now that in both setups, the SSD is nearly full and we rewrite or perhaps extend some data. In the former case, the controller has to scavenge for partially-used cells to combine and erase. In the latter case, the SSD controller still has 128GB of pristine cells to write the changed data immediately.

Of course, compression was always somewhat of a cheapskate solution (and one of the reasons the SandForce controllers are not liked much). Higher-end SSDs would just have more storage space than logically addressable to have some leeway when the SSD is almost full.


The fact that modern games don't even bother to compress audio and are thus 50GB in size doesn't help either :)


It takes CPU to decompress audio, and games can easily be playing 100+ SFX at once.


Additional context: the game he was referring to was probably Titanfall (the first one), which had ~50GB of uncompressed audio. This included audio for every language it was localized for, which was surely unnecessary at runtime.


Really, I thought all the basic decompression was handled on silicon by the "sound card" now, integrated or not.


My sound hardware seems to have the bare minimum of ADCs, DACs, and headphone detection pins to get the audio signals in and out. The drivers do all the heavy lifting, decoding, conversion, etc. on the CPU. Why do the decoding on comparatively expensive custom silicon, when you've already got a fast and flexible CPU?


> Why do the decoding on comparatively expensive custom silicon, when you've already got a fast and flexible CPU?

Not a reason, but I haven't paid attention to sound cards since my Creative SoundBlaster 16. I assumed that some amount of progress/feature-creep had been made on them.

If you told me I would've believed that every sound card these days had an embedded mp3 decoder as well as some 3d audio components.


You can still get discrete hardware that does a lot of that: offload of decoding compressed audio, adding effects, 3D positioning calculations, hardware equalization, etc. Most people don't bother; I think that having a "real" sound card is kind of a small niche market right now.

Going back about 15 years ago: https://en.wikipedia.org/wiki/Sound_Blaster_Audigy

I've got some version of that series of cards sitting in my closet. I feel like that's about when sound cards really peaked. The hardware was mature, CPUs weren't fast enough to generate all the effects games wanted on the fly, surround sound was popular, etc. It made a lot of sense to offload that stuff to an external card.

Compare that to now: I've got some Chrome tab doing audio decoding, but my task manager reports 0-1% CPU use for each of my tabs. Audio decoding is fast enough to be done easily on the CPU. Ditto for the effects and such that are in use now. The equation shifted.


Also, this is one of the cases where you can probably free up some memory bandwidth by transferring compressed samples around.


So load the clips from disk and pre-decompress them before firing the main engine up.


Or decompress them on level load or otherwise idle time.


This increases load times. Load times are already The Devil(tm).

Disk space is cheap. Player time isn't.


That disk space is still worth 18 bucks on an SSD, between 1/3 and 1/4 of the purchase price of the game new. That isn't an inconsiderable sum.

Also, the CPU time to decompress audio is microscopic; it might actually be faster, especially if the consumer has a hard drive, to read 1/10 of the data and decompress it.

Additionally, there is probably a smart middle ground between keeping the data as small as possible and 50GB of raw audio.


> That disk space is still worth 18 bucks on an SSD, between 1/3 and 1/4 of the purchase price of the game new. That isn't an inconsiderable sum.

To me it is. I delete games when I'm done with them. The disk is reusable.


Does it?

Load time is normally IO bound, not CPU. If your game has the typical IO bound loading, compressing stuff will make it load faster not slower.


It is IO bound, but it is also CPU and memory bound as well. I say this because most of the loading a game does is while it is running. This is especially true for large open world games (and others to varying degrees), where sections of the world (including geometry, textures, audio, music) are being loaded and unloaded all the time.

Given this is happening while a game is running, it means that the loading code has to compete for CPU and memory with all the rest that is going on in the soft realtime system that is a game.


The neat thing about audio is that you can stream it in. Decompressing the most important sounds right at the beginning of the load screen for a scene or before the last one ends, and then streaming in the rest, would probably work well enough. (and sandbox games have idle zones and low-CPU times, so that's when you can do that.) Players don't expect much from load times anyway.

And, no, my time as someone who plays video games is much cheaper than today's SSD storage. I wondered a few days ago why my Windows disk was so full. Video games. Literally 50% of it was video games. If I could save a couple GB on that, I'd be willing to give up a couple more seconds on every level reload.


> The neat thing about audio is that you can stream it in

You can stream music. You cannot stream effects; the timing of effects is such that you can't afford a frame's delay or people will notice it's off-kilter. And, as it happens, effects are precisely what are loaded uncompressed; almost every game engine's streaming-music feature uses MP3, OGG, or whatever.


Often you can predict what effects your scene will need. Sometimes you can't, and in that case you can't compress.

But it's entirely possible that you could decompress some of that audio into RAM beforehand if you're not confident every device can play that audio compressed immediately. Or store back a cache file temporarily. Or just not even worry about the CPU because the vast majority of users are GPU-limited rather than CPU-limited.


Sound effects are almost always streamed in - remember, most modern games are built for consoles as well, which have significantly less memory. Hence the lack of compression - it lets you stream in audio from a spinning BR directly without spending more CPU power.


That could be the case on consoles, but I've never seen an engine that didn't strongly recommend preloading triggered effects on platforms with a RAM budget.


> And, no, my time as someone who plays video games is much cheaper than today's SSD storage.

Flash storage is cheaper than it's ever been, and my time means more to me than it ever has. If I could spend a few dollars to have more gigabytes of fast storage, that sounds like a bargain.


To upgrade my storage capacity, I can... double my SSD space. Which costs $100 (from 256GB to 512GB). That's the equivalent of upgrading from an RX 480 8GB to an nVidia GTX 1070 8GB.

I don't know about you, but I have better things to spend money on than storage that could otherwise be slightly further compressed. I'd rather wait a couple seconds longer and burn my money on upgrading the graphical fidelity of my machine.

Better yet, I could buy 3-10 games on Steam on sale for that money. Do you still think that's a worthwhile upgrade? Or would you rather wait another 3 extra seconds for every level reload in your game?

If you have more money than time, you could always disable compression. My problem is that I can't enable any sort of good compression and I'm broke.


We're obviously at different places in life. I'll always go for the larger storage options, for reasons of time and convenience. The cost is reasonable, and I'll get the benefits for the entire time that I have the hardware.

3 seconds extra between levels, plus 20 seconds during the initial engine load, plus 5 seconds closing down the app, etc...Time adds up. If I've got half an hour available to play a game, I want to get in+out fast. My SSD can stream data faster than my CPU can decompress it. Why would I want to handicap the hardware that I paid good money for?

On the other hand, 15 years ago, I remember shifting data around to fit on my 20GB hard drive, doing the minimum install and running games from the CD to save disk space. My priorities would've been a lot different, then.


but downloading ~50GB of files would take hours or even days for some users.


It's very common to have compressed audio on e.g. Xbox One, and uncompressed audio on PS4 and PC.

While it might make sense for consoles where you need every last ounce of CPU power and have the audio stored on 50GB Blu-Ray, on PC it's usually just a leftover from the console world.


Some games go a bit too far in the other direction (though it is being fixed):

https://www.reddit.com/r/skyrimmods/comments/59u0iw/the_skyr...


The reason we are in a "post file system compression" world is because the format of every large file people have around is already compressed. Thus, there's nothing to gain in an extra layer of compression.

But it seems game developers didn't get the memo...


They didn't get the memo, they wrote the memo: http://www.radgametools.com/oodle.htm


> But it seems game developers didn't get the memo...

Except they totally did. You'll find that most game assets are already compressed, and compressed using an algorithm that allows for reasonably quick decompression using the CPUs of the day.


I'm left wondering whether the extension mechanisms in NTFS could be used to implement a better compression scheme with support from the application level (instead of the filesystem level).

Or if Microsoft could sit and define a new version of NTFS in which they change LZNT to LZNT2 with better compression ratios and different requirements for modern systems :)


The justification that hard drives need to work everywhere is a little weird. Most drives stay in their host machine until they die. Requiring some recent version of Windows to read the files wouldn't be a hardship. A little checkbox that saves you X% storage space and Y% CPU time in exchange for such a requirement would be useful.


> The justification that hard drives need to work everywhere is a little weird.

I think it's a pure anachronism at this point: it made more sense in the era where small drive sizes meant a lot of professional users relied on external drives and "just upload it" wasn't going to fly with a 9600bps modem, not to mention that hardware costing more, and failing more often, increased the need to move drives between machines simply to get back in service.


> Most drives stay in their host machine until they die.

Or until the host machine dies, in which case they have to be readable in another machine.

(Happened to me recently: the laptop's motherboard stopped working, I used its disk with an external case and an older operating system version while I waited for the warranty replacement.)


Readable in another machine, sure, but readable in another machine running an ancient version of Windows?


> The justification that hard drives need to work everywhere is a little weird. Most drives stay in their host machine until they die.

The article was pretty clear that the context they had in mind for this requirement was servers in a data center, not your home machine:

> Without that requirement, a hard drive might be usable only on the system that created it, which would create a major obstacle for data centers (not to mention data recovery).

Keep in mind they still thought they'd be targeting Alpha processors as late as the Win2K RC's.


How is that a major concern for data centers? I thought drives typically stayed in a data center machine until they died too. And even if you were swapping them around, you'd only have a requirement for a certain newer OS, not the exact same physical hardware.


Drives ideally, and maybe even typically, stay in a data-center server until they die... but MS could hardly take into consideration only what was 'typical' when it comes to things like this. And he's writing about engineering decisions made over 15 years ago here - around 1998 it certainly was more common to move hard drives around between machines. I probably did this at least 100 times just working for one company for a couple of years.

That said, it would still be one hell of a weird edge case to need to take a drive out of an x86 Win2k server, drop it into an Alpha Win2k server, and still care about its contents (vs wiping it for a newly provisioned host). But when you are writing OS filesystems, you have to care about edge cases... especially edge cases that may apply to thousands of racks worth of machines.


> it would still be one hell of a weird edge case to need to take a drive out of an x86 Win2k server, drop it into an Alpha Win2k server

I can imagine the reverse being more common though - I worked at a shop around that time where we had a handful of very expensive Alpha NT 4 (and VMS and...) boxes, and a lot of x86 NT boxes. I could imagine the magic smoke being let out of an Alpha and having to drop the drive into an x86 box for data recovery.


My point is that the way drives are typically used means that the original decision doesn't have to be set in stone. A new format could be added, one which requires a recent version of Windows to read and which therefore can take advantage of recent hardware, and that would work just fine.


I'm curious why they haven't moved compression to the disk hardware instead of leaving it up to the operating system. Is it because you only get decent compression if you know the context of the whole file, or is it maybe that the hard drive companies wouldn't be able to market a device like that?


Sandforce SSDs did implement compression; it allowed them to write and read less data to the flash, boosting performance. But the SSD still reported the full uncompressed size to the OS. This is because the abstraction that storage devices present to the OS is block-based, and the device can't present a varying number of total blocks depending on the data written to the drive.


Computing power.

Even the fastest compression algorithms like LZ4 need a Core i5-4300U @ 1.9GHz to reach 385MB/s [1]. You'd need a pretty powerful setup to keep up with SSD speeds, and you also need to be mindful of the heat generated. Also, it would be pretty useless if the volume's encrypted.

[1] https://github.com/lz4/lz4


Wait. Aren't the numbers for SSDs, particularly the bus speeds for the SATA connection, measured in gigabits (not bytes)?

Looking at the numbers you linked to (which seem to be in megabytes), it seems to me that decompression speeds could keep up. And I know write speeds on SSDs are a lot slower than the spec'd numbers, so the compression write speeds look plausible to me too.

The general idea is that modern CPUs tend to be instruction starved and sit idle because they are waiting on the slow memory buses that connect everything.

Am I missing something?


There was a window there where HDD performance was marginal enough that compressing data helped fetch it from disk faster since CPU decompression was quicker than waiting for the data to be fetched at full-size, especially on things like text where 10:1 compression isn't hard.

Now we're living with SSDs that can do 2GB/s and no CPU can decompress that quickly.


> Now we're living with SSDs that can do 2GB/s and no CPU can decompress that quickly.

I'd totally buy a machine with a state-of-the-art CPU paired with one or two FPGAs that can be programmed as accelerators for crypto, compression, etc.


Intel's working on bundling FPGAs with its Xeon systems, so maybe that will happen, but it's probably better addressed with a hardware decoder like is done for H.264.


Dedicated compression circuits would probably achieve higher throughput than doing it on the CPU's ALUs.


This is just a spin document with fake constraints made to justify poor coding and/or the inability to licence real code and/or to get more money.

Do you think winzip users consider those points cited by MS when zipping files?

More like a case of MS wanted to suckle the Fed's law enforcement wallet by introducing insecurity through convenience.


Interestingly, NTFS in Windows 10 introduces a new codec for compressed files. See http://www.swiftforensics.com/2016/10/wofcompressed-streams-...


> For the algorithm that was ultimately chosen, the smallest unit of encoding in the compressed stream was the nibble

Feels to me like nobody talks of nibbles anymore, maybe because we have ample memory and storage and can usually afford to waste some bits for the convenience of byte alignment. (It's half a byte, or 4 bits.)

Disk compression is interesting because Microsoft originally included it already in MS-DOS but lost a lawsuit brought by the company behind a popular utility called Stacker: http://articles.latimes.com/1994-02-24/business/fi-26671_1_s...

That was the first time Microsoft got in hot water for bundling features into DOS/Windows (the web browser would be the straw that broke the camel's back).


"Software patents remain controversial; critics have contended that the U.S. Patent Office does not understand the industry and issues patents that are too broad. Patent attorneys said Wednesday's decision will encourage small software firms to use patents as leverage against the industry's big players"

Oh if only they knew what would happen


"Patent attorneys said Wednesday's decision will encourage small software firms to use patents as leverage against the industry's big players"

The patent industry always wheels out the small inventor as PR when defending the system. As soon as no one's looking, they use the same system to keep small inventors out.


To be clear, Stacker sued MS over intellectual property concerns, not over monopolistic actions like bundling a web browser into the OS, which is what MS ran into later with IE/Windows 98.


As far as I remember, Stacker was a marvelous piece of software that compressed disks in a transparent manner. Everybody was using it (often pirated). Then Microsoft provided DoubleSpace for free with its new release of DOS. It was slightly inferior (more bugs) but had basically the same functionality. This was the murder of Stacker by the infamous Microsoft.


While I get the point he makes, and I'm certain that the team aren't idiots, I vehemently disagree. Amiga had a popular third-party library called XPK that let you install system-wide codecs on your system, and then any XPK-aware app installed could use any of those codecs to compress and decompress data on the fly. There were also patches on the filesystem so that the OS itself could detect a file's compression algorithm and decompress it transparently for apps that didn't know about XPK.

In short, in the early 90s Amiga had configurable per-file compression algorithms. There were CPU-optimized versions of almost all of those codecs, so someone using an ancient 68000 could interact with files compressed by a PPC. I could pull a drive out of my fast system and give it to a buddy with an old, slow CPU, and he could either 1) live with the reading speed penalty, 2) decompress each file one time and then use the unpacked versions, or 3) recompress each file with an algorithm more friendly to his system.

I don't think the Windows OS team is dumb by any stretch. I do think they might have been hampered by NIH syndrome, and weren't aware of (and likely couldn't care less about) how these problems were solved on other OSes.


>I don't think the Windows OS team is dumb by any stretch. I do think they might have been hampered by NIH syndrome

I think NIH qualifies as a form of stupidity.


Interesting timing. I just used Windows' built-in compression to free up 25 GB on a customer's full C: drive. First time I've used the feature in years. I noted that the compression ratio didn't seem to be very good.


It's never been worth using due to the slowdown and fragmentation it causes.


An interesting angle that I read a paper on in grad school a few years ago but don't have handy: because of the asymmetric growth of CPU speed vs. hard drive speed, you can actually get performance gains by enabling filesystem compression. It seems counter-intuitive, but it boils down to "can I compress this data faster than the disk can write it?"

If you're blasting highly compressible data to disk, compressing it on the fly can, in some circumstances, have a net bandwidth greater than just writing the data to disk (and greater than the disk alone is capable of). Yes, you incur more CPU load, but it's a net win. It's not universally true, YMMV etc.
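
A rough way to see this for yourself (stdlib-only Python sketch; the log-line payload is made up and extremely compressible, and the outcome depends entirely on your CPU and disk):

    import os, tempfile, time, zlib

    data = b"2016-11-01 12:00:00 INFO something happened in the system\n" * 2_000_000   # ~118 MB

    def write(path, payload):
        t0 = time.perf_counter()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        return time.perf_counter() - t0

    with tempfile.TemporaryDirectory() as d:
        t_raw = write(os.path.join(d, "raw.bin"), data)

        t0 = time.perf_counter()
        comp = zlib.compress(data, 1)          # fast compression level
        t_comp = time.perf_counter() - t0
        t_cw = write(os.path.join(d, "comp.bin"), comp)

        print("raw write:        %.2fs" % t_raw)
        print("compress + write: %.2fs (compressed to %d%% of original)"
              % (t_comp + t_cw, 100 * len(comp) // len(data)))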


One counter to this argument is that, while the CPU has thousands of cycles to do compression while an old 5400 RPM spinning rust drive seeks or writes, a modern SSD like the new Samsung 960 Pro can write data at 2100 MB/s, giving the CPU perhaps one cycle per byte, which makes compression difficult.

The counter to my counter, of course, is that specialized silicon for compression can easily keep up with even these speeds. In fact, Sandforce SSD controllers build in compression to boost read and write speeds!


Thanks for the reminder that while grad school doesn't feel that long ago, spinning rust was definitely the name of the game at the time. SSDs were a thing, but way too expensive and small for non-exotic purposes... :D


Sandforce does compression for minimizing writes, not for speed. They are slower than non-compressing controllers for some tasks because of that, but it should increase the device's lifetime.


I can say that's true from empirical experience - doubly so if you're targeting an environment where the data's likely to be stored on a NAS or some other fairly slow storage.

(And yeah, natch, insert recitation of the Liturgy of the Optimizer as a charm against the appearance of a certain Knuth quote.)


Slowdown in one set of tasks and speed-up in another. For example, boot time is usually shorter when system folders are compressed (with compact /CompactOS:always).


> For example, boot time is usually shorter when system folders are compressed

Reads, in general, will be "boosted" when the filesystem is compressed.

You get an (almost) free boost by reading compressed data off the disk and extracting it on the fly into memory/CPU, i.e. you read 4MB off the disk but it expands to 8MB (or more!) in memory, so read performance is elevated.

Writing of course, is slowed.

For some server loads, disk compression still makes a lot of sense - making the claim "We live in a post-file-system-compression world" a little dubious.


Why is writing slowed?


Because it must be compressed first, a CPU-intensive task, which slows the already slow disk I/O.


I think a cpu can compress data (provided it's in memory) orders of magnitude faster than it can write to disk. Now that I think about it though, the context would be necessary to compress well, so maybe you would need to read surrounding data.


Compress any larger piece of data on your own computer, and watch the CPU become the bottleneck.

Even on the fastest CPUs, writing a file that must be compressed first will always be slower than writing the same file on the same hardware without compressing it before it goes to disk.

It may not be orders of magnitude slower - depending on the data, hardware, and algorithm, of course - but it will incur some write penalty. So, with disk compression, you get a write penalty and a read boost.


You are correct that most compression algorithms are slower than their corresponding decompression, which means the writes tend to be a different tradeoff than the reads.

I do not agree with your statement that compress + write is always slower, though. The analysis for cost/benefit of compression is the exact same logic for both reads and writes, but with different formulas based on how fast you can compress/decompress and read/write blocks. Let's imagine that your disk takes 10ms to write 64KB and 20ms to write 256KB. Then if compression of a 256KB block down to 64KB takes 5ms, writing is faster with compression done first. On the other hand, if compressing it takes 50ms, then writing it raw is quicker.
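
Put as a toy calculation (using the made-up numbers above):

    def best_write_ms(compress_ms, write_compressed_ms, write_raw_ms):
        # Compressing first wins whenever compress + write(compressed) < write(raw).
        return min(compress_ms + write_compressed_ms, write_raw_ms)

    print(best_write_ms(compress_ms=5,  write_compressed_ms=10, write_raw_ms=20))   # 15ms -> compress first
    print(best_write_ms(compress_ms=50, write_compressed_ms=10, write_raw_ms=20))   # 20ms -> write raw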


Whether it's a penalty or boost for reading or writing is entirely dependent on the performance characteristics of reading, writing, compressing, and decompressing. I can think of a reductio ad absurdum that's a write boost and a read penalty: I/O is carving and reading text from a stone tablet, and compression is looking up shorthand in a dictionary.


I don't understand how that is technologically possible.

Most of Windows 10's boot, when Fast Startup is enabled, doesn't involve the typical system folders at all. Effectively when you shut down, they collapse the userspace, and store the kernel/services/etc into the hibernation file. When you "boot" the computer, the hibernation file is re-mapped into memory sequentially, and the user is prompted to login.

Compacting the OS definitely saves space. But I don't understand how it would reduce boot times from a technical perspective, since IO isn't even reading those folders during a default boot on Windows 10.


You're not always "fast booting" Windows 10. When you restart/reboot, for example, it must read all those files from the disk.

You'll get a performance boost by reading compressed data off the (slow) disk and expanding it in (fast) memory.

This can effectively increase read speeds greatly (reading a 2MB file off disk that expands to 3MB, for example - that third MB came almost "for free").


>Most of Windows 10's boot, when Fast Startup is enabled[...]

You are completely right for this case.


This case is the default for Windows 10.


Any version of Windows after (I think) XP does defragging automatically, at least to the point where the user is told it isn't necessary to do it manually.


I think this was a feature of NTFS over FAT32 - not an OS level feature. XP was the first home oriented OS to support NTFS natively.


When NTFS was first released Microsoft claimed it was so awesome it didn't need defragging. In later releases of Windows NT they added a defragger. By around the time of XP this was automatically scheduled to run in the background.

tldr: NTFS does a better job of avoiding fragmentation than FAT, but both need defragging.


This article makes a lot of assumptions and has some weird thoughts.

> Well, okay, you can compress differently depending on the system, but every system has to be able to decompress every compression algorithm.

But that's not how this works. You don't design an entirely new algorithm for every possible performance target; you create one or two algorithms and tweak them. Then make one or two decompressors that can handle any compression setting. Simple example: LZ77 (used in deflate/gzip), whose decompressor is extremely fast regardless of your compression settings.
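
For instance, with Python's stdlib zlib (DEFLATE), one decompressor handles output produced at any compression level, at roughly the same speed (a rough sketch; timings are illustrative):

    import time, zlib

    data = b"the quick brown fox jumps over the lazy dog " * 500_000   # ~22 MB, compressible

    for level in (1, 6, 9):
        comp = zlib.compress(data, level)
        t0 = time.perf_counter()
        for _ in range(10):
            zlib.decompress(comp)        # one decompressor, regardless of compression level
        dt = time.perf_counter() - t0
        print("level %d: %9d bytes compressed, 10x decompress in %.2fs" % (level, len(comp), dt))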

> Now, Windows dropped support for the Alpha AXP quite a long time ago

Why does this sound like "... and they changed nothing"?

> you can buy 5TB hard drive from [brand] for just $120.

Sure but we also have more data to store. If you wanted 100 games in the 80s you'd need what, 1MB storage? I'm just guessing, I'm too young to know that. Now that'd be what, 1TB? It might be cheaper but I'm just saying, storage prices going down does not change that compression is a good idea.

> many (most?) popular file formats are already compressed

Before, file formats were all about data accessibility and recovery on different/damaged systems, and compression doesn't help that. Now this is a good thing? Additionally, binaries and libraries are not compressed; many of my documents are just text files; and databases still benefit from this a lot. (But who has a database on their computer? Anyone who uses a web browser and email program.)

> We live in a post-file-system-compression world.

I still think it's a fine idea.

> Tags: History

Lol


I wonder how effective LZ4 would be as default compression algorithm for NTFS. It is extremely effective with ZFS.


Port it to Alpha and let's see. I'll loan you my 21164 over the weekend.


The article mentions that alpha was weak on bit-twiddling, which is something that almost certainly got fixed by introduction of BWX instructions on 21164.


Thanks for mentioning that. You sent me down a brief rabbit-hole of nostalgia looking for the EV4/EV5 differences. If anybody else wants to go down that rabbit-hole here's where I ended-up: http://alasir.com/articles/alpha_history/alpha_21164_21164pc...


Interestingly, LZ4 works mostly on nibble and byte level without much bit-twiddling.


Also, when was the last Alpha built? About ten years ago?

It's fine for OpenBSD or NetBSD to support it, but Microsoft has not supported it in a long time.


Was AXP even the weakest of the four Windows architectures for this task? PPC and MIPS can leave the bit twiddler scratching their heads, too.


The problem with the original Alpha is that accessing memory as anything other than aligned 32-bit or 64-bit words (yes, including bytes) counts as bit-twiddling, because the only load/store instructions it has operate on those two word sizes.

I can't remember any other non-niche architecture with an 8*2^n word size that shares this (mis-)feature. (I suspect that the Cray-1 and its derivatives also share it, but that probably counts as a niche architecture.)


Since the last released build of Windows for Alpha was Windows 2000 RC1 (although it looks like there's an RC2 build floating around), I don't think we need to support that processor for a new NTFS extension.


Why can't we just use different compression levels? It wouldn't be hard to build multiple compression algorithms into Windows, so that fast machines use a high compression level or CPU-demanding algorithms, and slow machines use the opposite. Slow machines could still decompress files from faster machines efficiently, because in decompression HDD I/O is the bottleneck [1].

[1] http://superuser.com/questions/135594/what-is-more-important...


When this stuff was being designed in the 1990s the hard disk wasn't the bottleneck.


I think there would be a strong case for the decompression logic being able to read whatever you throw at it while the compression logic would need to decide how hard it should think to meet the performance requirements depending on what the present machine can/should do considering its speed and/or present load.

Keeping the decompressor robust would easily solve the drive portability concerns, making the data readable (even if not at maximum speed) across machines and architectures.


Deduplication works well these days.


But of course all of that old code is still there and maintained. We live in a post-file-system-compression world with ubiquitous file-system-compression.


> However, we also live in a world where you can buy 5TB hard drive from Newegg for just $120

It's a lot more expensive to install that disk into an ultrabook/MBA.


In other words, if you have control over both software and hardware, you can deliver a much better user experience.


>We live in a post-file-system-compression world.

says a guy working for a company shipping a system with >6GB (often 10GB in 100K individual small files!) of redundant, NEVER EVER touched data inside WinSxS. Data that can't be moved off the main drive without serious hacks (hardlinking). This fits the general pattern of indifference to user hardware. Another one (my favorite) is the non-movable hiberfil.sys that MUST be on the primary drive; you can't even move it with hardlinks. There goes 16GB of your SSD for a file that gets used ONCE per day.

This is what happens when you hire straight out of college programmers and put them on top of the line workstations.


If you disable hibernation then the hiberfil will disappear. With the low power states in modern computers, sleep suffices (hibernate is a relic from old Windows):

> powercfg -h off


tbf... almost all of WinSXS is hardlinks, there is very little actual duplication. It's just that explorer doesn't understand that and reports it as duplication.


easily disproved:

C:\WINDOWS\system32>dism /Online /Cleanup-Image /AnalyzeComponentStore

Windows Explorer Reported Size of Component Store : 8.18 GB

Actual Size of Component Store : 7.78 GB


The next line of the output from the command you specified - which you didn't quote - is named "Shared with Windows", and on the machine in the example it comes to almost 5 GB.

https://technet.microsoft.com/en-us/library/dn251566.aspx

"This value provides the size of files that are hard linked so that they appear both in the component store and in other locations (for the normal operation of Windows). This is included in the actual size, but shouldn’t be considered part of the component store overhead."


1. You specifically said "WinSXS is hardlinks"; this is not true.

2. That still leaves up to 5GB of redundant garbage in WinSXS: things like multiple versions of random DLLs nobody ever uses, 360MB for ~11 versions of a 'getting started' package consisting of the same movie files, color calibration data for obscure scanners on a bare minimal install, etc.

Turning on drive compression omits this directory while happily compressing files in /system.


> you specifically said

No, you communicated before with somebody else.

> still leaves up to 5GB of redundant garbage

No, if you read that example, there is less than 1 GB which is stale (the diff between the first and the third number):

>> Windows Explorer Reported Size of Component Store : 4.98 GB

>> Actual Size of Component Store : 4.88 GB

>> Shared with Windows : 4.38 GB

As I've said, you intentionally didn't quote your machine's third line but I'm sure it's not a 5 GB difference.

Moreover, that difference can be purged if necessary. Which you can also do on your machine with just a single command, per link on the same page:

https://technet.microsoft.com/en-us/library/dn251565.aspx

There is a trade-off, the probable case why nobody does it unless really necessary:

"All existing service packs and updates cannot be uninstalled after this command is completed"


To be more precise, CBS is to blame for this.


So many comments being harsh to Microsoft for bad engineering by ignoring the real world.

It's Microsoft. Would you be critical of a pigeon for defecating in flight?




