
What's the lifetime of an optane or NVMe drive when used as a constantly thrashing cache? Weeks? Months?

Edit: Missed this the first time through:

> Calculating this is done by monitoring an SSD's tolerance of "Drive Writes Per Day". If a 1TB device could survive 5 years with 2TB of writes per 24 hours it has a tolerance of 2 DWPD. Optane has a high tolerance at 30 DWPD, while a high end flash drive is 3-6 DWPD.
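
For a back-of-the-envelope answer to the lifetime question, here's a minimal Python sketch built on the figures in that quote; the sustained write rate is a made-up illustrative number, not a measurement:

    def dwpd(capacity_tb, daily_writes_tb):
        """Drive Writes Per Day: how many times the full capacity is written each day."""
        return daily_writes_tb / capacity_tb

    def thrash_lifetime_days(capacity_tb, rated_dwpd, rated_years, write_gb_per_s):
        """Rough lifetime if the drive is used as a constantly thrashing cache."""
        rated_total_tb = capacity_tb * rated_dwpd * rated_years * 365  # total rated writes
        actual_tb_per_day = write_gb_per_s * 86400 / 1000              # GB/s -> TB/day
        return rated_total_tb / actual_tb_per_day

    print(dwpd(1.0, 2.0))                                # 2.0 DWPD, the quote's example
    print(round(thrash_lifetime_days(1.0, 30, 5, 1.0)))  # ~634 days at a hypothetical 1 GB/s

Under those hypothetical numbers, and ignoring write amplification, the answer is on the order of a year or two of continuous thrashing rather than weeks.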




They also talk in the paper about keeping the thrashing parts of the cache in RAM. Facebook for example calculates the effective hit rate of a larger set of items and only caches those that won't be quickly purged out / overwritten in secondary storage.


It's only four more years before the patent on adaptive replacement caching expires. Then we can use it in memcached ...


Thankfully there are better algorithms that are not encumbered. ARC is fairly memory hungry and requires a large cache to be effective. It is not as scan-resistant as many believe, and it does not capture frequency very well. LIRS or TinyLFU based policies are what new implementations should be based on (a rough sketch of the TinyLFU admission idea follows the link below).

https://github.com/ben-manes/caffeine/wiki/Efficiency
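
For illustration, here's a rough Python sketch of the TinyLFU admission idea: frequencies are approximated with a tiny count-min sketch, and an eviction candidate is only admitted if it looks more frequent than the would-be victim. The names and sizes are made up and this is not Caffeine's actual implementation:

    import hashlib

    class CountMinSketch:
        """Small 4-row count-min sketch for approximate access frequencies."""
        def __init__(self, width=1024):
            self.width = width
            self.rows = [[0] * width for _ in range(4)]

        def _indexes(self, key):
            digest = hashlib.sha256(str(key).encode()).digest()
            for row in range(4):
                yield row, int.from_bytes(digest[row * 4:row * 4 + 4], "little") % self.width

        def increment(self, key):
            for row, idx in self._indexes(key):
                self.rows[row][idx] += 1

        def estimate(self, key):
            return min(self.rows[row][idx] for row, idx in self._indexes(key))

    def admit(sketch, candidate_key, victim_key):
        """TinyLFU admission: replace the victim only if the candidate is
        estimated to be accessed more frequently."""
        return sketch.estimate(candidate_key) > sketch.estimate(victim_key)

    sketch = CountMinSketch()
    for _ in range(5):
        sketch.increment("hot")
    sketch.increment("cold")
    print(admit(sketch, "hot", "cold"))   # True: the frequently seen key wins

Real TinyLFU-based policies (e.g. W-TinyLFU in Caffeine) also keep a small admission window and periodically halve the counters so that stale frequencies age out.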


That’s unfortunate. It’s a natural architecture you stumble into when you have a two-tier cache (I was doing this 10 years ago, when we had a memory-mapped cache and a secondary spinning-disk cache for a popular website).


There is CAR, which has almost the same performance and no patent. You can use that.


Why not just use LRU?


CLOCK and CAR can perform a bit better than LRU in certain situations.

Notably, CLOCK keeps items that are accessed at least once during a round, while LRU will kick out the least recently used item.

The benefit of using CLOCK is that you don't have to maintain a list, only a ring buffer. Removing an item from a CLOCK buffer can be almost free if you use a single bit to indicate presence. An LRU has to maintain some form of list, whether an array or a linked list. In practice, LRU is expensive to implement while CLOCK is simple. CAR offers LRU performance with less complexity.
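
A minimal sketch of that ring-plus-reference-bit idea (plain CLOCK, not CAR; the structure and names are illustrative, not from any particular implementation):

    class ClockCache:
        """Minimal CLOCK sketch: a ring of slots with one reference bit per slot."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.slots = [None] * capacity       # each slot holds (key, value) or None
            self.referenced = [False] * capacity
            self.index = {}                      # key -> slot position
            self.hand = 0

        def get(self, key):
            pos = self.index.get(key)
            if pos is None:
                return None
            self.referenced[pos] = True          # a hit only sets a bit, no list to reorder
            return self.slots[pos][1]

        def put(self, key, value):
            if key in self.index:
                pos = self.index[key]
                self.slots[pos] = (key, value)
                self.referenced[pos] = True
                return
            # Advance the hand, clearing bits, until an unreferenced slot is found:
            # anything touched since the last pass survives one more round.
            while self.slots[self.hand] is not None and self.referenced[self.hand]:
                self.referenced[self.hand] = False
                self.hand = (self.hand + 1) % self.capacity
            victim = self.slots[self.hand]
            if victim is not None:
                del self.index[victim[0]]
            self.slots[self.hand] = (key, value)
            self.referenced[self.hand] = False   # new entries must be touched to survive a pass
            self.index[key] = self.hand
            self.hand = (self.hand + 1) % self.capacity

    cache = ClockCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")                   # marks "a" referenced
    cache.put("c", 3)                # evicts unreferenced "b", keeps "a"
    print(cache.get("a"), cache.get("b"), cache.get("c"))   # 1 None 3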


Performance. The Wikipedia page on cache policies is pretty good.

https://en.wikipedia.org/wiki/Cache_replacement_policies#Clo...


Wiki says "substantially" better than LRU, but actual results seem to show performance converging to the same levels the larger the cache gets. (See page 5.)

https://dbs.uni-leipzig.de/file/ARC.pdf

[p.s. there is also the matter of the (patterns in the) various trace runs. Does anyone know where these traces can be obtained?]


All caches have equal hit rates in the limit when the size of the cache approaches infinity. For finite caches, ARC often wins. In practical experience I've found that a weighted ARC dramatically outperformed LRU for DNS RR caching, in terms of both hit rate and raw CPU time spent per access. This is because it was easy to code an ARC cache that had lock-free access to frequently referenced items; once an item had been promoted to T2 no locks were needed for most accesses. With LRU it's necessary to have exclusive access to the entire cache in order to evict something and add something else.

Of course there are more schemes than just LRU and ARC, and one can try to employ lock-free schemes more than I'm willing to do. This is just my experience.


ARC often wins against LRU, but there is a lot left on the table compared to other policies. That's because it does capture some frequency, but not very well imho.

You can mitigate the exclusive lock using a write-ahead-log-style approach [1] [2]: record events into ring buffers, replay them in batches, and guard the replay with an exclusive tryLock (see the sketch after the links). This works really well in practice and lets you do much more complex policy work with much less worry about concurrency.

[1] http://highscalability.com/blog/2016/1/25/design-of-a-modern...

[2] http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/pap...
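
Roughly, the batching idea from [1] [2] looks like this; a toy Python sketch with a stand-in recency policy and made-up names, without the striped, lossy ring buffers a real implementation like Caffeine uses:

    import threading
    from collections import deque

    class BufferedPolicy:
        """Sketch: record accesses into a buffer and replay them in batches
        under a tryLock, so readers never block on the policy's lock."""
        def __init__(self, drain_threshold=64):
            self.read_buffer = deque()       # stand-in for a bounded, lossy ring buffer
            self.drain_threshold = drain_threshold
            self.policy_lock = threading.Lock()
            self.recency = {}                # key -> logical timestamp (toy policy state)
            self.clock = 0

        def record_access(self, key):
            # Hot path: append the event and move on; no policy lock taken here.
            self.read_buffer.append(key)
            if len(self.read_buffer) >= self.drain_threshold:
                self.try_drain()

        def try_drain(self):
            # Only the thread that wins the tryLock replays the buffer; others skip.
            if not self.policy_lock.acquire(blocking=False):
                return
            try:
                while self.read_buffer:
                    key = self.read_buffer.popleft()
                    self.clock += 1
                    self.recency[key] = self.clock   # apply the event to the policy
            finally:
                self.policy_lock.release()

The point is that a hit never waits on the eviction policy's lock; policy maintenance happens in batches, which is what makes heavier policies affordable under concurrency.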


I don't believe the table in question approached "infinity". Check again.


I wrote a simulator and link to the traces. One unfortunate aspect is that they did not provide a real comparison with LIRS, except in a table that includes an incorrect number. It comes off a little biased, since LIRS outperforms ARC in most of their own traces.

https://github.com/ben-manes/caffeine/wiki/Simulator


Thank you, you are awesome!


Serious? Doesn't ZFS already use it?


Sun licenses the patent and has related patents on the same thing. Sun provides an implementation under the CDDL license. ZFS on Linux is distributed under the CDDL. Linux is distributed under the GPL for which no patent holders have granted permission to use the patented inventions. Many other implementations exist including one under the Apache license and one under the Mozilla license that I found in two seconds on github.

The whole thing is a mess.


I think only a part of ARC was patented, so if you ran the lists differently you could basically use the double-list idea, where you move entries back and forth between them, as long as the lists aren't of the same kind.


I basically used a memory-weighted variant of the ARC algorithm, sufficiently different so that the patent doesn't cover it.

I imagine there are quite a few variants of ARC you can use without violating the patent.


Intel has stated that the next batch of enterprise Optane SSDs will increase the endurance rating to 60 drive writes per day, which will finally put it beyond even the historical records for drives that used SLC NAND flash.


Are there current writes-per-day/durability numbers for traditional spinny disks? I can't seem to find anything other than SSD numbers.


Some hard drives come with workload ratings. For example, the WD Gold is rated for 550TB (reads+writes) per year (if you run it 24/7). But because the wearout mechanisms are so different between hard drives and SSDs, you can't make a very meaningful comparison between them.


Spinning rust doesn't really express durability in terms of number of writes. The two are orthogonal for that technology.


Hard drives don't fail that way, so "writes per day" is simply irrelevant to the hard drive market.

Hard drives fail because of vibration, broken motors, and things like that. MTBF is the typical metric for hard drives. There are also errors that pop up if data sits still too long (on the order of years), because the magnetization fades over time.


Most of the items that end up on the SSD are already long-lived. Every gigabyte of RAM runway before flushing items reduces the write load, since recent items are at the highest risk of being overwritten or deleted. Also, items with shorter TTLs won't be persisted (or can be persisted to bucketed pages to avoid being compacted).

TL;DR: there's a lot to it and I'll be going into it in future posts. The full extstore docs explain it in a lot of detail too.


The datasheet says something along the lines of 5 PB write endurance.



