This should never be an issue. A page load should trigger page generation code (I'm not a Rails guy, but the view in traditional MVC), not some massive 10+ second operation.
What you should be doing is caching that big, time-consuming job (I guarantee you aren't timing out someone's browser with template code) at the model level, and then generating pages off of that cached result. Feel free to further cache the HTML too, but the bulk of the win is from decoupling long-running jobs from your display code.
If you want to ensure that the user sees the absolute latest, use an AJAX call to pull the more recent result after serving the previously cached result.
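In Rails terms, the model-level caching might look something like this (a minimal sketch; Report and heavy_aggregation are hypothetical names):

    class Report < ActiveRecord::Base
      # Cache the expensive computation at the model level. Rails.cache.fetch
      # returns the stored value if present; otherwise it runs the block and
      # caches the result.
      def self.summary
        Rails.cache.fetch("report/summary", :expires_in => 10.minutes) do
          heavy_aggregation # the 10+ second job lives here, not in the view
        end
      end
    end

A cron or background job can repopulate that key on a schedule, so the view layer only ever reads a warm value.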
We had a similar conundrum at LayerVault a few months ago. Rebuilding file histories quickly can be quite costly. We offloaded the tasks and denormalized the data a bit, and now our pages scream.
I feel the need to point out that regardless of how you're "pre-warming" the cache, you're missing the primary benefit of page caching.
By default, Rails' page caching persists to disk. An immense advantage of this solution is that you can use your web server to serve these pages directly, without ever hitting the Rails stack.
You're saving the page generation time, but you're still hitting the Rails stack + Redis for every single page request, both of which are entirely unnecessary.
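For reference, stock Rails page caching does exactly that. caches_page is the real Rails macro; the controller below is hypothetical:

    class ReportsController < ApplicationController
      # The first request renders the page and writes the HTML to disk
      # (e.g. public/reports.html for /reports); nginx or Apache then serves
      # that file directly on every subsequent request, never touching
      # Rails or Redis.
      caches_page :index

      def index
        @reports = Report.all
      end
    end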
Page caching is only useful if your page looks the same for everybody, e.g. it has no 'Welcome $LOGGED_IN_USER' bar. Which already rules out 99% of all web apps.
Not necessarily. A possible solution for that is to render the same thing for everyone and then bring in the user specific stuff after page load via ajax. Fairly common practice.
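One sketch of that pattern (the endpoint name is made up, and current_user is assumed to come from your auth plugin): serve the identical cached page to everyone, then have the page call back for the personal bits.

    class SessionsController < ApplicationController
      # A small, uncached endpoint the cached page hits via AJAX after load
      # to fill in the "Welcome ..." bar.
      def status
        if current_user
          render :partial => "shared/user_bar", :locals => { :user => current_user }
        else
          render :nothing => true
        end
      end
    end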
The last big web app I worked on cached the page components. For example, the "Latest images" section is cached, but the "Logged in user (11 new mails)" section isn't. (Or as another poster suggests, use AJAX for that stuff).
You can still get away with caching large parts of your pages though.
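With Rails fragment caching that looks roughly like this (the cache view helper is the real Rails API; the partial and method names are made up):

    <% cache "latest_images" do %>
      <%# Expensive, but identical for everyone, so it's cached. %>
      <%= render :partial => "images/latest" %>
    <% end %>

    <%# Per-user, so rendered fresh on every request (or fetched via AJAX). %>
    Logged in as <%= current_user.login %> (<%= current_user.new_mail_count %> new mails)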
"It should never be used in production or for user-facing or critical client purposes"

    script/console production < worker/cache_page.rb

?
Our slow endpoint was on a back-end administrative page only; faking the session data in curl would have been annoying. Also, it was exceeding the timeout limits of our production server.
I think you have a bigger problem here.
You're right, this is very hacky (it makes me itch), but I'm not sure I have a better solution.
Why not just use wget to load and cache the page (passing in a unique parameter that you use to expire the cache and skip the filters)?
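For example, from cron (the parameter name is made up; a before_filter would check it to expire the cache and force regeneration):

    # Warm the cache every 10 minutes; -q is quiet, -O /dev/null discards the body.
    */10 * * * * wget -q -O /dev/null "http://example.com/slow/page?rebuild_token=SECRET"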
As far as I know, Varnish will let a single request through to generate the cache, hold subsequent requests in a queue, and then serve everyone from the same cached data.
If your page takes longer than a second or two to load when uncached, you should really be moving towards a batch processing model.
I ran into a similar problem since some of my pages were taking upwards of 5 seconds to render. But rather than pre-warming the cache, I opted for static generation. Here is a small library that does this for Django:
It's nearly always better for nginx to serve up pre-gzipped content than for nginx to ask django/rails to ask memcached for the same content. It reduces your CPU load as well, so you need fewer servers to scale up.
On EC2, make absolutely sure you are storing the pre-rendered files in ephemeral instance storage (/mnt, not /var) - EBS-backed storage is slow.
We used to do this at Gilt. The problem we found was that eventually the warmer wouldn't be able to get through every page of the site before the cache expired, and users would start falling through to the Rails instances. Despite there being hundreds of Rails instances, this would kill the site in seconds.
We eventually moved to decoupled services, which were able to do much smarter caching.
Unless I'm misunderstanding something, script/runner could be used instead of script/console. The whole Rails environment is still loaded, but in a manner that's actually meant for prewritten scripts rather than an interactive session.
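i.e. something like this (assuming Rails 2.x's script/runner, which accepts a filename):

    script/runner -e production worker/cache_page.rb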