This should never be an issue. A page load should trigger page generation code (I'm not a Rails guy, but the view in traditional MVC), not some massive 10+ second operation.
What you should be doing is caching that big, time-consuming job (I guarantee you aren't timing out someone's browser with template code) at the model level, and then generating pages off of that cached result. Feel free to further cache the HTML too, but the bulk of the win is from decoupling long-running jobs from your display code.
If you want to ensure that the user sees the absolute latest, use an AJAX call to pull the more recent result after serving the previously cached result.
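In Rails terms, the model-level caching might look something like this (a minimal sketch; Report and heavy_aggregation are hypothetical names):

    class Report < ActiveRecord::Base
      # Cache the expensive computation at the model level. Rails.cache.fetch
      # returns the stored value if present; otherwise it runs the block and
      # caches the result.
      def self.summary
        Rails.cache.fetch("report/summary", :expires_in => 10.minutes) do
          heavy_aggregation # the 10+ second job lives here, not in the view
        end
      end
    end

A cron or background job can repopulate that key on a schedule, so the view layer only ever reads a warm value.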
We had a similar conundrum at LayerVault a few months ago. Rebuilding file histories quickly can be quite costly. We offloaded the tasks and denormalized the data a bit, and now our pages scream.
I feel the need to point out that regardless of how you're "pre-warming" the cache, you're missing the primary benefit of page caching.
By default, Rails' page caching persists to disk. An immense advantage of this solution is that you can use your web server to serve these pages directly, without ever hitting the Rails stack.
You're saving the page generation time, but you're still hitting the Rails stack + Redis for every single page request, both of which are entirely unnecessary.
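For reference, stock Rails page caching does exactly that. caches_page is the real Rails macro; the controller below is hypothetical:

    class ReportsController < ApplicationController
      # The first request renders the page and writes the HTML to disk
      # (e.g. public/reports.html for /reports); nginx or Apache then serves
      # that file directly on every subsequent request, never touching
      # Rails or Redis.
      caches_page :index

      def index
        @reports = Report.all
      end
    end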
Page caching is only useful if your page looks the same for everybody, e.g. it has no 'Welcome $LOGGED_IN_USER' bar. Which already rules out 99% of all web apps.
Not necessarily. A possible solution for that is to render the same thing for everyone and then bring in the user specific stuff after page load via ajax. Fairly common practice.
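One sketch of that pattern (the endpoint name is made up, and current_user is assumed to come from your auth plugin): serve the identical cached page to everyone, then have the page call back for the personal bits.

    class SessionsController < ApplicationController
      # A small, uncached endpoint the cached page hits via AJAX after load
      # to fill in the "Welcome ..." bar.
      def status
        if current_user
          render :partial => "shared/user_bar", :locals => { :user => current_user }
        else
          render :nothing => true
        end
      end
    end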
The last big web app I worked on cached the page components. For example, the "Latest images" section is cached, but the "Logged in user (11 new mails)" section isn't. (Or as another poster suggests, use AJAX for that stuff).
You can still get away with caching large parts of your pages though.
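With Rails fragment caching that looks roughly like this (the cache view helper is the real Rails API; the partial and method names are made up):

    <% cache "latest_images" do %>
      <%# Expensive, but identical for everyone, so it's cached. %>
      <%= render :partial => "images/latest" %>
    <% end %>

    <%# Per-user, so rendered fresh on every request (or fetched via AJAX). %>
    Logged in as <%= current_user.login %> (<%= current_user.new_mail_count %> new mails)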
"It should never be used in production or for user-facing or critical client purposes"

    script/console production < worker/cache_page.rb

?
Our slow endpoint was on a back-end administrative page only; faking the session data in curl would have been annoying. Also, it was exceeding the timeout limits of our production server.
I think you have a bigger problem here.
You're right, this is very hacky (it makes me itch), but I'm not sure I have a better solution.
Why not just use wget to load and cache the page (passing in a unique parameter that you use to expire the cache and skip the filters)?
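For example, from cron (the parameter name is made up; a before_filter would check it to expire the cache and force regeneration):

    # Warm the cache every 10 minutes; -q is quiet, -O /dev/null discards the body.
    */10 * * * * wget -q -O /dev/null "http://example.com/slow/page?rebuild_token=SECRET"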
As far as I know, Varnish will let a single request through to generate the cache, hold subsequent requests in a queue, and then serve everyone from the same cached data.
If your page takes longer than a second or two to load when uncached, you should really be moving towards a batch processing model.
I ran into a similar problem since some of my pages were taking upwards of 5 seconds to render. But rather than pre-warming the cache, I opted for static generation. Here is a small library that does this for Django:
It's nearly always better for nginx to serve up pre-gzipped content than for nginx to ask django/rails to ask memcached for the same content. It reduces your CPU load as well, so you need fewer servers to scale up.
On EC2, make absolutely sure you are storing the pre-rendered files in ephemeral instance storage (/mnt, not /var) - EBS-backed storage is slow.
We used to do this at Gilt. The problem we found was that eventually the warmer wouldn't be able to get through every page of the site before the cache expired, and users would start falling through to the Rails instances. Despite there being hundreds of Rails instances, this would kill the site in seconds.
We eventually moved to decoupled services, which were able to do much smarter caching.
Unless I'm misunderstanding something, script/runner could be used instead of script/console. The whole Rails environment is still loaded, but in a manner that's actually meant for prewritten scripts rather than an interactive session.
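i.e. something like this (assuming Rails 2.x's script/runner, which accepts a filename):

    script/runner -e production worker/cache_page.rb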