SandForce SSD controller beats Intel X25-M by compressing data before storing (anandtech.com)
26 points by Alex3917 on Jan 2, 2010 | 18 comments



Whenever compression or dedupe is involved, it matters what data you're writing; unfortunately Anand doesn't specify that, so it's hard to tell how realistic these numbers are.


On the last but one page, there is some discussion of this issue.


And the conclusion is that it is slower than the original version of the Vertex, but it is still faster in writes than the G2: ~140MB/s vs 109MB/s, where the original Vertex SSD was able to get 185MB/s.

FTA: "Presumably the majority of your file writes aren’t going to be compressed files so your performance shouldn’t be gated by this issue, even then I’ve shown that you shouldn’t be any worse off than you would be with Intel’s X25-M."


Video files are compressed, and they are the biggest drive fillers.


Last but one = penultimate


Or "second to last".


To be fair, there's a good chance I'm missing something here, but 25 GB of data (and not super-redundant low-entropy ASCII, a lot of executable machine code) being compressed into 11 GB seems pretty amazing. Have any third parties verified these numbers?

Edited to add: Beyond that, it seems like all this tech would make random lookups abhorrently expensive, because a seek(30) doesn't necessarily jump to the (30 * A + B)'th place. But I'm not a file systems/storage guy at all, so I'd love some education here.


re: seeking

SSDs are block devices -- the operating system sees them as a pile of uniformly sized blocks on which it lays the filesystem. When you read() or seek(), the OS always fetches into memory the entire block that the read or seek falls in and takes it from there. Any compression on the SSD will necessarily happen per block, so seeking is not really any more difficult than on more conventional storage.
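To make that concrete, here's a rough Python sketch of per-block compression behind a block interface (the 4KB block size, zlib, and the dict-of-blocks store are my own illustrative choices, not anything SandForce has disclosed). The logical address space keeps its uncompressed size, so the seek arithmetic is unchanged; only the stored form of each block shrinks:

    import zlib

    BLOCK_SIZE = 4096  # illustrative block size

    store = {}  # logical block number -> compressed bytes (the "flash")

    def write_block(lbn, data):
        assert len(data) == BLOCK_SIZE
        store[lbn] = zlib.compress(data)  # compression happens per block

    def read(offset, length):
        """Read 'length' bytes starting at logical byte 'offset'."""
        out = bytearray()
        while length > 0:
            lbn, within = divmod(offset, BLOCK_SIZE)  # same arithmetic as uncompressed
            block = zlib.decompress(store[lbn])       # fetch + decompress one block
            chunk = block[within:within + length]
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

read(30, 10) still lands in logical block 0 at offset 30, exactly as it would on an uncompressed device.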


it seems like all this tech would make random lookups abhorrently expensive

SSDs are virtualized (like virtual memory), so there is a mapping table from virtual to physical addresses. This is needed for wear-leveling, so you might as well also use it for thin provisioning (aka trim), dedupe, etc. In the worst case, one read command may require reading several pages of metadata from flash to find the data. Because flash is fundamentally so fast, an SSD can tolerate a lot of overhead and still be faster than a hard disk.
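A toy version of that mapping layer, if it helps (the dict-based table, dedupe by content hash, and write-to-a-fresh-page policy are generic illustrations of the technique, not SandForce's actual design):

    import hashlib

    class ToyFTL:
        """Toy flash translation layer: every logical page goes through a map."""
        def __init__(self, num_physical_pages):
            self.l2p = {}                    # logical page -> physical page
            self.free = list(range(num_physical_pages))
            self.flash = {}                  # physical page -> data
            self.by_hash = {}                # content hash -> physical page (dedupe)

        def write(self, lpn, data):
            h = hashlib.sha1(data).digest()
            if h in self.by_hash:            # dedupe: point at the existing copy
                self.l2p[lpn] = self.by_hash[h]
                return
            ppn = self.free.pop(0)           # always program a fresh page (wear leveling)
            self.flash[ppn] = data
            self.by_hash[h] = ppn
            self.l2p[lpn] = ppn

        def read(self, lpn):
            return self.flash[self.l2p[lpn]]  # one mapping lookup per read

In the real thing the mapping table itself lives (at least partly) in flash, which is where the worst case of several metadata reads per lookup comes from.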


I just compressed the 1.6MB cgo application into 400kB using standard Mac OS X zip. A few other applications I tried compressed similarly well.


standard zip is far more CPU-intensive than anything you could afford to run on something where you (a) have very limited CPU resources and (b) need to add very, very little latency.


tar+gzip performed similarly well (1.6MB -> 450kB) and is a lot faster than normal zip. gzip was also able to compress my Mac's System folder from four gigabytes to two. Anyway, the point stands that executable files are quite compressible. If you'd like me to try it with an algorithm you consider more appropriate, please direct me to the download page.
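If anyone wants to reproduce the numbers, here's a small Python script using only the standard library (which binaries you point it at is up to you; the script name below is just an example):

    import gzip
    import sys

    # Print original vs gzip-compressed size for each file on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            raw = f.read()
        packed = gzip.compress(raw)
        print(f"{path}: {len(raw)} -> {len(packed)} bytes ({len(packed) / len(raw):.0%})")

Save it as, say, compress_ratio.py and run: python3 compress_ratio.py /bin/ls /usr/bin/ssh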



it would be interesting to compare that to a file system with transparent compression on an intel drive.

this is the best way i have found to think about what they are doing (it's a while since i read the article, so apologies if i am just repeating what is said, or get something wrong - iirc, anand hints at the below but doesn't state quite as much):

given an arbitrary, unreliable storage medium, you need to store both "raw" data and additional information for error recovery. it seems that until now there were technical / historical reasons that made it optimal / normal to store these separately, as i have described above (ie a disk stores a chunk of data and then has a relatively small checksum afterwards).

but there's no real reason why that need be optimal in all cases. for example, raid does something different. raid 1 stores two copies (ignore that each has error correction too, for the sake of argument). now raid has certain technical motivations (cheap disks fail, hopefully independently) that make that reasonable.

so what is new about ssds compared to spinning disks that is the enabling factor here? one guess is that since you can read from various chips at the same time you can do something like raid. for example, say you have 8 memory chips, then you could use a raid 5 style approach with one chip as parity, losing 1/8 (12.5%) of your space. or two chips, losing 1/4 of the space.
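as a toy illustration of that (just a raid 5 style xor in python, my own sketch, not anything from the article): any single dead chip can be rebuilt by xoring the surviving chips against the parity chip.

    from functools import reduce

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # 8 "chips": 7 hold data stripes, 1 holds their xor (1/8 = 12.5% overhead)
    data_chips = [bytes([i]) * 16 for i in range(7)]
    parity = reduce(xor, data_chips)

    # lose chip 3 and rebuild it from the six survivors plus the parity chip
    survivors = data_chips[:3] + data_chips[4:] + [parity]
    assert reduce(xor, survivors) == data_chips[3]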

my inference (and what i think anand implies at some point) is that when you sit down and do the maths, there's some number of parity chips (say) that allows you to start using cheaper chips (with higher error rates).

but that doesn't entail compression.

so either i am missing something, or the compression is optional - perhaps it is being used to hide the fact that they are having to use so much space for error correction? or perhaps it is just a marketing gimmick?

edit: or perhaps without compression it's actually too slow to sell? i think this may be it. and if so, putting compression on an intel drive will actually beat this.

or perhaps there's something about the approach that means the data have to be compressed anyway?


You're confusing two separate features. The improved error correction allows using cheaper flash. The compression/dedupe is what improves performance and endurance.


i am? i thought that is what i was saying...

what's confusing me is that this is being sold as a "high end" drive. it's not. it's a drive engineered to be as cheap as possible, using lower quality components, that's being sold as "fast" only because it has compression. put compression (eg with zfs - i don't know what other file systems have transparent compression) on an intel drive and it should be better.

but the review hints that the two are linked. if they are, then perhaps it's not as simple as that.


Even if they were aiming for low cost, I wouldn't blame them for marketing the thing as high performance given that it's currently #1.


sure, everyone knows that companies will spin things however they can (although i don't understand why some people - particularly americans - seem so fond of pointing this out; it's hardly the most positive aspect of capitalism).

but the review could have been a little more questioning. why not compare it to writing compressed data on an existing drive?

(and i'm sure i don't need to point out, to a connoisseur of free markets like yourself, that although they make a living convincing people to buy the latest product, and so work hand in glove with the manufacturers, they also have to compete for readers by reputation, which requires some level of integrity)



