Hacker News new | past | comments | ask | show | jobs | submit login

I can't agree with that. More sophisticated algorithms don't necessarily have to make things worse. In fact I think the opposite is true. Simple algorithms are the ones most likely to fail catastrophically because they are made for a certain range of access patterns.

You don't always know the workload beforehand and can pick the appropriate cache policy. Workloads can even change during runtime.

I haven't seen a simple cache algorithm that performs well in every workload and I believe any that wants to succeed needs to incorporate different sub-algorithms for different workloads and be adaptive.




Have you experienced workload changes that caused an algorithm to be less effective? I am happy to learn about it. From my experience and study of traces collected in the past, I haven't seen such cases. Of course, there is diurnal changes, and sometimes short abrupt changes (e.g., failover, updating deployment), but I have not seen a case that a workload change that needs a different algorithm, e.g., switch from LRU-friendly to LFU-friendly.

Having an ideal adaptive algorithm is certainly better, and I do observe different workloads favor different algorithms, but all adaptive algorithms have parameters that are not optimal for a portion of workloads and also hard to tune (including ARC).

If you give me an algorithm, I can always find a production workload for you that the algorithm does not work well.

"Simple algorithms are the ones most likely to fail catastrophically because they are made for a certain range of access patterns" I do not see how FIFO and LRU fail catastrophically except on scanning/streaming workloads. Happy to learn about it.


The classic example of a changing workload is a database which has a normal workload that maybe resembles a zipf or bell distribution but then someone runs exports which do full scans over large parts of the DB. These changes in workloads can be intermittent but they exist. A very similar change in workload that is well known is the pagecache for filesystems (e.g. backup doing full scan over disk and evicting everything from the cache) which is why admission policies were invented in the first place. So you have two different algorithms already. Just maybe not adaptive.

A friend of mine works on a online-shop / distribution system that during most of the day has a zipf like distribution but after normal working hours when retail shops close he gets completely different patterns as the shops push their data for the day into the system and then all kinds of BI tools run for a few hours.

Of course adaptive caches are never completely optimal because it takes time for them to learn which workload is actually currently happening but after a while they can get very close. A good adaptive cache should not need tuning to get good results because the whole reason for it to be adaptive is to tune itself. Of course there are quite a few adaptive caches, especially the first ones that pioneered this, that still needed tuning.

I don't understand the last question. You say you can't see how FIFO can fail catastrophically and at the same time state a case how it can. Scanning workloads are something that does show up in the real world all the time. NovaX seems to have run your algorithm through some traces where it had a near zero hitrate. It can be more important to avoid these extreme cases than having a slightly better hitrate in the normal case.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: