Hacker News
Why Reddit's been slow lately (dev group post) (groups.google.com)
94 points by rufo on Oct 13, 2010 | 38 comments



What if, instead of looking up each user in the thread to see if they're a friend, they just served the same page to all users? Then a bit of JavaScript loads your friend list from the server and uses dynamic CSS to change your friends' usernames to the friend style.

That way everyone can share the cached page, and the dynamic icing is kept separate.


Probably better to send the friend list down the wire as JSON with the page; otherwise you'll double the number of requests (not something you usually want to do when trying to scale).


You could use the client-side store (supported in a bunch of common browsers) as a cache.


Or cookies. 4 kB is a lot of friends. Invalidate them every time someone adds a friend.

Both of these, however, make cache invalidation (in the case of multiple browsers/computers) hard, nearly impossible even.


Cookies suck for that sort of thing because they are sent over the wire with every request.


Smart. Still, the db request SELECT name FROM friends WHERE source = %(source) should probably be pretty cheap given proper indexes, and could be done inline, whereas lookups for each person are likely much more expensive. You could still send the JSON in the same request by composing the cached page with the dynamic friendlist. Now all we need is for someone to make it happen.

*Pardon my broken SQL, I don't actually know the language.


What I was suggesting is do the SELECT name FROM friends query as part of the initial page load, not as a secondary AJAX request. Store the results as JSON that gets put into the page, then use JS to manipulate the comments client side. As you noted it should be cheap and as others noted, quite cacheable (friends don't change frequently).


Unfortunately if you are sending the friend list as part of the cached page then you won't be able to use that cached page for all users.

I believe the original comment was that each comments page should be cached globally and then each user can have their friend list & dynamic data fetched separately and added to the page using javascript.


You could if using SSI (say, with nginx or varnish).

<html> <cached comments here> <!--# include virtual="/load-json" --> </html>


Ah, my bad then. I don't think the comments pages cache well though; the votes are always going up and down. My understanding is that in addition to pulling all the comments for each request, they were also pulling the current user's friend relationships with all the participants.


Who cares if the votes are say 1 minute out of date?

Even caching a 2000 comment thread for a minute would significantly speed up that page.
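A minimal sketch of that kind of short-TTL cache (the function names and the 60-second TTL here are illustrative, not Reddit's actual code):

```python
import time

_cache = {}  # thread_id -> (rendered_html, expiry_timestamp)

def cached_thread(thread_id, render, ttl=60):
    """Serve a rendered thread from cache, re-rendering at most once per ttl seconds."""
    now = time.time()
    hit = _cache.get(thread_id)
    if hit is not None and hit[1] > now:
        return hit[0]
    html = render(thread_id)  # the expensive 2000-comment render
    _cache[thread_id] = (html, now + ttl)
    return html
```

Everyone hitting the thread within the TTL window gets the same cached render, so the expensive work happens once a minute instead of once a request.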


I think it would be confusing to post a comment and have it disappear after a refresh. Same for votes.


The page would be regenerated on each new comment, of course.

Votes should be done the same way friends' lists are: on the client side, via javascript. The cached page with its comments would include Javascript to grab the most current scores for all its posts (one request) and modify the DOM appropriately.


The comments page would be cached globally; the json friends list would just be inserted as the cached page is served to the user.
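That serve-time splice could be as simple as the sketch below (the function name and the FRIENDS marker are invented for illustration):

```python
import json

def serve_page(cached_html, friend_names):
    # Globally cached HTML plus a per-user JSON blob; client-side JS
    # would read FRIENDS and restyle the matching usernames.
    blob = "<script>var FRIENDS = %s;</script>" % json.dumps(sorted(friend_names))
    return cached_html.replace("</body>", blob + "</body>")
```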


Why would you use JavaScript to request the friends list JSON and then use the DOM to insert a stylesheet? Just CSS would be fine if you're changing the color of friends' usernames and not doing anything that needs JavaScript.


The idea is to remove the friend lookup from the page load. By delaying the friends JS to run only after the page is loaded, the page loads faster and can be served from cache; the cached page doesn't know your friends. The JS then loads the friend list as a separate request. This shifts that work from the initial page load to after the page has loaded, so the page appears faster, at the cost of briefly missing the friend highlighting.


Keep a counter of friends for each user. Most users will have zero friends, no need to look them up.

Also, if that's such a bottleneck, create a separate webserver in C++ and keep friend relationships in RAM in some sort of an efficient data structure, something like Map [UID]->[set of friend UIDs]. Queries should be lightning fast (even if you pass a thousand UIDs in one query).
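The same idea sketched in Python terms, with a dict of sets standing in for the proposed in-RAM C++ structure:

```python
# In-memory friendship store: uid -> set of friend uids.
friends = {}

def add_friend(uid, friend_uid):
    friends.setdefault(uid, set()).add(friend_uid)

def friends_among(uid, author_uids):
    """Given the author uids appearing in a thread, return those who are friends of uid."""
    return friends.get(uid, set()).intersection(author_uids)
```

One set intersection answers "which of these thousand commenters are my friends?" without any per-author lookups.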


That is essentially what memcached should be doing now.


Plus, the friend list isn't that dynamic, so even it can be cached. (but on the client side)


They could add that feature to Reddit Gold, both boosting their subscriber base and reducing their load. (But who knows; maybe there would be tremendous backlash. I'm just suggesting this because I don't personally use the feature.)


Apparently the Reddit folks don't like Redis too much (private email exchange), but I'm practically sure that Redis could help them a lot here...

There are two strategies to mitigate Reddit's problems using Redis, IMHO: one is simple to plug in, one is advanced.

Strategy #1: Use Redis as a cache that does not need to be recomputed, instead of memcached.

To do this, what they should do is, for all the recent "hot" news items, keep everything inside Redis and update the Redis side whenever they write to the database.

For instance, they could use a Redis hash for each news item to store all of its comments, indexed by comment id for easy updating, so every time the comment page has to be rendered a single Redis HGETALL call fetches everything, like in a cache, but still with the ability to update single items easily (including vote counters if needed, using HINCRBY).

The same for friendship relations and so forth. Every piece can be reimplemented as an updatable cache, starting from the slowest parts.
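A sketch of that comment-hash pattern, using a tiny dict-backed stand-in for the Redis calls (HSET, HGETALL, HINCRBY) so the shape is visible without a running server; the key names are invented:

```python
import json

store = {}  # key -> {field: value}, standing in for Redis hashes

def hset(key, field, value):
    store.setdefault(key, {})[field] = value

def hgetall(key):
    return dict(store.get(key, {}))

def hincrby(key, field, n):
    store.setdefault(key, {})[field] = int(store.get(key, {}).get(field, 0)) + n

# One hash per news item: comment id -> serialized comment.
def save_comment(news_id, comment_id, comment):
    hset("comments:%d" % news_id, str(comment_id), json.dumps(comment))

def upvote(news_id, comment_id):
    # Counters live in their own hash and update atomically in Redis.
    hincrby("votes:%d" % news_id, str(comment_id), 1)

def render_comments(news_id):
    # One HGETALL fetches the whole thread, like a cache hit,
    # but individual comments stay updatable in place.
    return {cid: json.loads(raw)
            for cid, raw in hgetall("comments:%d" % news_id).items()}
```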

Strategy #2: Use Redis directly as the data store, killing the need of a cache.

This needs a major redesign, but it can probably be done incrementally starting from #1: when using Redis as a smart cache you already write the code to both read and update the cache, so eventually dropping the code that updates the "real" database will make Redis the only store. Alternatively, you can keep the code updating the old data store just to have another copy of the whole dataset where it's easy to run complex queries for data mining and so forth, which is something an SQL database does well and Redis does not.

I think that David King evaluated Redis against Cassandra, and he did not like the lack of a cluster solution with failover, resharding and so forth (what we are trying to do with Redis Cluster), but I think he missed part of the point: Redis can be used in many different ways, more as a flexible tool than a pre-cooked solution, and in their case the "smart cache" is probably the best approach.

If the Reddit folks reconsider and give Redis a chance, I'm here to help.


Here's your sign: "7s of that was waiting on memcached."

If memcache is slower than your DB, you're doing something wrong.


I don't think you've told them anything new. The problem is figuring out what that 'something' is.


There usually is a point where you're pushing so much down memcached's throat that the cycles spent pickling and unpickling data in Python become significant.
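That overhead is easy to see in isolation; a rough, illustrative measurement (the payload shape is made up, not Reddit's actual cache format):

```python
import pickle
import timeit

# A memcached value shaped vaguely like a rendered comment listing.
payload = [{"id": i, "author": "user%d" % i, "score": i % 100,
            "body": "x" * 200} for i in range(1000)]

blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Time 100 unpickles of the blob; on a large, hot key this
# per-request deserialization cost adds up.
secs = timeit.timeit(lambda: pickle.loads(blob), number=100)
```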


"A request that I just made on my staging instance took 13s (!) to render the front page. That's on its own cache so it should be slower than the live site, but that's still pretty ridiculous."

I'm actually seeing that kind of speed on the live site front page.


I'm curious as to how they're structuring the data for their comment trees. Does each comment have only one parent, that parent being either another comment or the parent article? Are they using nested sets?

My preferred alternative is to give every comment two separate parent fields: one that always points to the parent article, and one that points to the parent comment if there is one, null otherwise.

Structuring the data this way means you can fetch all the comments for a particular article very quickly and, if you wish, simply hand that raw data over to the client to be structured using JavaScript, which helps offload some of the work your server would otherwise be doing...
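A sketch of assembling the tree from that flat, two-parent-field representation (field names are illustrative; the same logic could just as well run client-side in JS):

```python
def build_tree(rows):
    """rows: flat list of dicts with 'id' and 'parent_comment_id'
    (None for top-level comments; all rows share one article id)."""
    children = {}
    for row in rows:
        children.setdefault(row["parent_comment_id"], []).append(row)

    def attach(parent_id):
        # Recursively attach each comment's replies.
        return [dict(r, replies=attach(r["id"]))
                for r in children.get(parent_id, [])]

    return attach(None)
```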

</armchair_development>


So I decided to get out of my armchair and have a look at the code. It's a lot to take in, and I've never done any work in Python, but:

It looks like they're already doing part of what I proposed above. Each comment is associated directly with a 'link', and after retrieval the tree is sorted on the server-side.

Personally, I don't see any reason why the tree couldn't be sorted client-side. Sorting definitely seems to be one of their time sinks, especially given that each tree has to be sorted a number of different ways (by controversy, heat, age, score, etc.) and that the trees tend to change often (with each vote and each new comment).


> I don't see any reason why the tree couldn't be sorted client-side

The sorting isn't the expensive bit, tmk


I'd be interested to know what percentage of time it takes up for rendering a comment thread in the live system though.


  But when we render the Comment back to you in that same request we need
  the ID that the comment will have, but we don't know the ID until we write it out.
Wouldn't something like Snowflake help for this particular case?

  Snowflake is a network service for generating unique ID numbers
  at high scale with some simple guarantees.
http://github.com/twitter/snowflake
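For illustration, a Snowflake-style ID packs a millisecond timestamp, a worker id, and a per-millisecond sequence number into one 64-bit integer (the 41/10/12 bit split below matches the scheme Twitter describes), which lets each app server mint IDs locally before the write hits the database:

```python
def make_id(timestamp_ms, worker_id, sequence):
    # 41 bits of milliseconds since a custom epoch, 10 bits of worker id,
    # 12 bits of sequence; IDs from one worker are strictly increasing.
    assert worker_id < 1024 and sequence < 4096
    return (timestamp_ms << 22) | (worker_id << 12) | sequence
```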

Kellan (from Flickr) has a neat post about Ticket servers: http://laughingmeme.org/2010/02/08/ticket-servers-distribute...


The author of the post also says that the EC2 network is "slow". Does anyone have numbers about the performance of EC2 network in general, and why this is so?


A few months ago I read: "The Impact of Virtualization on Network Performance of Amazon EC2 Data Center" (http://www.cs.rice.edu/~eugeneng/papers/INFOCOM10-ec2.pdf) which has some performance numbers (latency and throughput) for small and medium instances.

Unfortunately, I can't recall enough of the paper right now to give you the nickel-and-dime overview, but it has graphs you can look at!


As a bit of a side note: as a regular user of reddit, it felt a bit odd to see this post as coming from "David King".

It says something about the level of interaction between the reddit admins and its users that I recognize him primarily as "ketralnis".


Indeed, I had no idea who David King was, but ketralnis is familiar.


Reddit is always going through slow phases. I bet those coincide with growth phases; they run rather lean (or so I've heard). I'd be more worried if the site suddenly got very fast: that might mean their user base is shrinking and they have too much infrastructure.


Start with graceful degradation when things get tough? (Fewer comments, top friends only, ...)


My initial worry about that would be datastore support: can you really do that efficiently with Postgres/memcached?


In memcached, caching smaller things should allow more to be cached. On the Postgres side, when the disks are hard at work any access is expensive, but reading less should still help; depending on the query you can try to get it to read less. In other situations, maybe just turn stuff off. I don't know the specifics, but simple things like not displaying the exact number of comments may help (counting things can be frustratingly expensive sometimes).



