
I've sometimes heard Wikipedia described as a "large-scale static site plus a medium-scale social network". The caching is a bit more complex than for a naive static site because of the churn rate and freshness requirements, but fundamentally you are right: without frontend Varnish caching, Wikipedia would be very different in terms of hosting requirements and scaling complexity.



I'm also wondering whether the caching strategy they use is a naive one (i.e. the cache is valid for a fixed duration, like 5 minutes) or a more active one (like Stack Overflow's), with cache invalidations each time a page is modified or commented on.


There is a cache invalidation each time a page, or one of its dependencies, is modified (pages depend on lots of other pages, e.g. templates).
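To illustrate why dependencies matter here, a toy sketch (names and structure are mine, not MediaWiki's actual links tables): editing one template must purge every page that transcludes it, not just the template itself.

```python
from collections import defaultdict

class PurgeIndex:
    """Toy reverse index from a dependency (e.g. a template) to the
    pages that must be purged when that dependency changes."""

    def __init__(self):
        self.dependents = defaultdict(set)

    def record(self, page: str, dependency: str):
        """Note that `page` is rendered using `dependency`."""
        self.dependents[dependency].add(page)

    def pages_to_purge(self, edited: str) -> set:
        # The edited page itself, plus everything that depends on it.
        return {edited} | self.dependents[edited]

idx = PurgeIndex()
idx.record("Article_A", "Template:Infobox")
idx.record("Article_B", "Template:Infobox")

# Editing the template fans out to both articles that use it.
print(sorted(idx.pages_to_purge("Template:Infobox")))
# → ['Article_A', 'Article_B', 'Template:Infobox']
```

The real system has to maintain this index incrementally as pages are edited, which is where most of the complexity lives.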

Assuming things haven't changed, each Varnish server listens for purges via multicast UDP.
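The listener side of that is roughly this shape (a minimal sketch; the group address, port, and plain-text message format are hypothetical stand-ins, not Wikimedia's actual wire protocol):

```python
import socket
import struct

GROUP = "239.128.0.112"  # hypothetical multicast group
PORT = 4827              # hypothetical port

def make_listener(group: str, port: int) -> socket.socket:
    """UDP socket joined to a multicast group, as a purge daemon would use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the group on the default interface.
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def handle_purge(datagram: bytes, cache: dict) -> str:
    """Evict the URL named in a purge datagram from a local cache."""
    url = datagram.decode("utf-8").strip()
    cache.pop(url, None)
    return url
```

The catch the comment below alludes to: UDP is fire-and-forget, so a datagram that arrives while the daemon is down is simply gone.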


Purges have been migrated to Kafka as the means of transport, at long last. So now if a purging daemon crashes, purge requests are not lost.
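Why the Kafka move fixes the loss problem: purge requests sit in a durable log, and the daemon commits its read offset only after a purge succeeds, so a restart resumes from the last committed offset instead of missing everything sent while it was down. A stdlib-only simulation of that pattern (not Wikimedia's actual purged daemon):

```python
class PurgeLog:
    """Toy append-only log standing in for a Kafka topic partition."""

    def __init__(self):
        self.entries = []

    def append(self, url: str) -> int:
        self.entries.append(url)
        return len(self.entries) - 1

class PurgeDaemon:
    """Consumer that advances its offset only after a purge succeeds,
    so a crash-and-restart replays any unacknowledged purges."""

    def __init__(self, log: PurgeLog, committed_offset: int = 0):
        self.log = log
        self.offset = committed_offset
        self.purged = []

    def run_once(self):
        while self.offset < len(self.log.entries):
            url = self.log.entries[self.offset]
            self.purged.append(url)  # issue the PURGE to the cache here
            self.offset += 1         # "commit" only after success

log = PurgeLog()
log.append("/wiki/Foo")
log.append("/wiki/Bar")

d1 = PurgeDaemon(log)
d1.run_once()

# Simulate a crash: a fresh daemon resumes from the committed offset,
# and a purge appended in the meantime is still waiting in the log.
d2 = PurgeDaemon(log, committed_offset=d1.offset)
log.append("/wiki/Baz")
d2.run_once()
print(d2.purged)  # → ['/wiki/Baz']
```

With multicast UDP there is no log to resume from, which is exactly the failure mode the migration removed.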

You can see per-server stats on purges happening here:

https://grafana.wikimedia.org/d/RvscY1CZk/purged?orgId=1


