Hacker News | jrosoff's comments

Oh man! You had a tape drive for your VIC-20? When I bought games they came as a book and I had to type the assembly instructions in for 3 hours if I wanted to play!


The look and feel of your site is great! Nice job! I agree with other commenters about the need for richer data. For example, just picking on the "top performing" site in your list (skysheet.com), here's the Yottaa report on that site:

http://www.yottaa.com/url/skysheet-com-4d2ca049038ade0c05000...

Yottaa notes a reachability time similar to Stella's "Reachability > Washington" metric, but response time performance is significantly worse at other locations.

Also, it's important to track not just the server's response time but actual browser performance. If you look at the Page Load Time metric, you'll see that even when the server responds pretty quickly, the actual browser experience is significantly worse than a simple HTTP client would suggest.

Overall, I like the site a lot. If you could get some more accurate metrics, it'd be great. Keep going!

BTW, here at Yottaa we're beta-ing an API that would let you run tests and access all of our browser and low-level data. Would you be interested in getting some better data from us? Contact me: jrosoff AT yottaa DOT com


Great writeup! A couple of questions:

- I'm curious why querying before a write makes such a big difference. I would have guessed that updating a document that's not in RAM would first load it into RAM, then perform the update. Does the write get applied to disk without loading the page into RAM first? If you do an update to a document that is not in RAM, is it in RAM following the update?

- Can you elaborate on the corruption that occurred to both the master & the slave during a DAS failure? We have seen something similar in our deployment (high write volume leading to corruption in both the master & the slave; a repair was required to recover, and we ran on a partially functioning slave during the repair), but were unable to identify the root cause.


Querying before the writes solved a lot of problems: it gets the object into the working RAM set. When doing an update, the database gets LOCKED as soon as the statement hits the server--that means if your document is not in memory, you have to wait while it gets paged in. This was an easy, easy win for us.
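The query-before-update pattern is simple enough to sketch. Here's a minimal illustration in Python (pymongo-style method names; the function and the example collection/field names are hypothetical, not from our actual codebase):

```python
def touch_then_update(collection, query, update):
    """Pre-warm the working set: read the document first so the page is
    already in RAM before the server takes the write lock for the update."""
    collection.find_one(query)            # cheap read; any page fault happens here, lock-free
    collection.update_one(query, update)  # document is now hot, so the locked section is short

# Hypothetical usage against a metrics collection:
# touch_then_update(db.metrics,
#                   {"url": "http://example.com"},
#                   {"$inc": {"samples": 1}})
```

The point is that the (potentially slow) disk read is paid outside the lock, so other writers aren't stalled behind a page fault.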

Regarding the corruption, I got an "invalid BSON object" error or something like it on repair, which tells me some object was only partially flushed to disk when the DAS went down. The slave actually worked fine for simple lookups by ID, but there was some issue with the index and I was unable to run filters against it. Luckily the huge collections are only accessed via unique identifier, so this wasn't a huge issue.


This seems like the sort of optimization that should happen in MongoDB itself: instead of acquiring the lock, loading the record into memory (if it isn't already), making the change, and releasing the lock, acquire the lock only after the record has been loaded into memory (if it isn't already).

Have you spoken with any of the MongoDB developers about why it's currently the way it is, vs. a more efficient update path?


I think there are some possible timing issues with making that a general behavior in the server. 10gen did make it the default behavior on slaves, where the inserts are controlled by the oplog (http://jira.mongodb.org/browse/SERVER-1646).

For us, our DB abstraction layer made this behavior so simple to add that we didn't make much fuss about it.


Cool! There are a bunch of external monitoring services that are worth checking out in addition to this:

- http://www.yottaa.com (shameless plug for my own company)
- http://www.webpagetest.org
- http://www.zoompf.com
- http://www.showslow.com


Hello, just a question about yottaa.com: in my Firebug YSlow (v2) I get an overall performance score of 91 (and in Page Speed I get 95). How come on Yottaa I get a YSlow score of 78?


Looks like this was a result of stale data. The score of 78 was measured on October 2nd. I hit the "Click here to re-check now" button and the score is now listed as 88. We're looking at the sub-scores now to see exactly what's different.


What site are you measuring? I'll go check it out. Feel free to follow up via jrosoff AT yottaa DOT com


Be sure to post the findings/reason here.


Yes, there are definitely some issues with measuring page load time this way, but it's useful data to get started with. The Web Timing API will make this data much better, though it's still early (it's in IE9 and some early developer builds of Chrome and Firefox). Once the API is more widely deployed, it will be the right way to measure performance, since it takes into account navigation time as well as rendering time.

Google Webmaster Tools does indeed give you a good perspective on performance. I believe this data comes from the Google Toolbar (can someone confirm this?). My problem with the data reported by Webmaster Tools is that it's an average: it doesn't tell me the _worst_ page load time or the _best_ page load time my users are experiencing.
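To see why the average hides what slow users experience, here's a toy calculation with made-up load times (a nearest-rank percentile sketch; the sample numbers are entirely hypothetical):

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of page load times (seconds)."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100.0 * len(ordered))) - 1)
    return ordered[k]

# Hypothetical user load times: mostly fast, with a few slow outliers.
loads = [1.2, 1.3, 1.1, 1.4, 1.2, 9.8, 1.3, 1.2, 8.5, 1.1]

average = sum(loads) / len(loads)  # ~2.8 s -- looks merely mediocre
worst = percentile(loads, 95)      # 9.8 s -- what the slowest users actually see
```

The average of ~2.8 s makes the site look tolerable, while the 95th percentile shows some users are waiting nearly 10 seconds, which is exactly the distinction an averaged metric can't express.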

As a shameless plug, you can also check out our own tools at Yottaa (http://www.yottaa.com), which give you a bunch of detailed information about your website's performance. I think tools like these complement the approach detailed in the post.


Thanks for the feedback!

You're correct about the issues with "page load" time. The approach in the post really measures the amount of time the browser spends processing the page. However, for many modern web apps, most of the time is spent in these internal aspects of the page: loading CSS, running JavaScript, fetching images, etc.

With the Web Timing API (there's another post about that here: http://blog.yottaa.com/2010/10/using-web-timing-api-to-measu...), we can get more detailed timers that count not just the browser time but the full amount of time between typing the URL into the browser (or clicking a link) and the page finishing loading.
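For a sense of what those timers give you, here's a toy calculation over the kind of millisecond timestamps the Navigation Timing spec exposes (the field names are from the spec; the values are made up):

```python
# Hypothetical snapshot of window.performance.timing (epoch milliseconds).
timing = {
    "navigationStart": 1288000000000,  # user clicked the link / typed the URL
    "responseStart":   1288000000450,  # first byte arrived from the server
    "loadEventEnd":    1288000003200,  # page fully loaded and rendered
}

# Server responsiveness vs. the user's full wait:
time_to_first_byte = timing["responseStart"] - timing["navigationStart"]  # 450 ms
full_page_load = timing["loadEventEnd"] - timing["navigationStart"]       # 3200 ms
```

The gap between those two numbers (450 ms vs. 3200 ms here) is the browser-side work that server-only monitoring never sees.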

The linear trend line was created automatically by Excel, so I can't vouch for its accuracy beyond my implicit trust in that feature.


Hummingbird is awesome, both as a tool and as a case study. We learned a lot from Hummingbird that we incorporated into the design of our system.


We have hit thousands of updates per second on our current system during some high-load periods and did not see any problems. Our steady state is hundreds per second, but it bursts to thousands for extended durations about once a week, if not more often.

10 reports per second is actually not much load and has almost no impact on writers. We have an alerting system that runs while data is being input to the system: it effectively loads a report for each metric in the input and decides whether or not to send an alert. That system queries about 50 reports per second on an ongoing basis and does not impact the writers. Our read volume in steady state is about 2x our write volume.

We have not seen any queueing problems on writes, and the lock ratio in MongoDB is typically in the 0.005-0.01 range.

We have found that we can break this by running lots of map-reduce jobs simultaneously while processing a high write volume, but that's a whole other ball of wax.

Our data access patterns very easily accommodate sharding. Both reads and writes are pretty evenly distributed across the set of URLs we track. By activating sharding with URL as the shard key, we feel we can scale several orders of magnitude beyond where we are now with nothing more than additional hardware (or virtual machines).
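The "evenly distributed across URLs" claim is easy to sanity-check. Here's an illustrative sketch that hashes URLs onto shards and counts the balance (this is a simplification for intuition only; MongoDB's own range- or hash-based chunking works differently in detail, and the URLs here are synthetic):

```python
import hashlib
from collections import Counter

def shard_for(url, n_shards):
    """Pick a shard by hashing the URL (illustrative only)."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards

# Spread 10,000 synthetic URLs across 4 shards and check the balance.
counts = Counter(shard_for("http://site-%d.example.com/" % i, 4)
                 for i in range(10000))
# Each shard ends up with roughly 2,500 URLs, so both the read and the
# write load per shard stays roughly even as the tracked set grows.
```

Because tracked sites are independent of each other, there's no hot key: adding shards really does just divide the load.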

I'd love to hear how your system scaling goes. Feel free to hit me up via email if you want to discuss (jrosoff AT yottaa.com)


Yeah, slide 11 depicted what we thought would be a great solution before we started investigating MongoDB. MongoDB effectively replaced all those other systems for us and was _significantly_ easier to set up and develop against.

A few people have mentioned HBase as an alternative. We did not consider HBase at the time we were making our architecture choices; if we were starting today, we'd probably have looked at it too. My first impression of HBase is that it lacks the level of documentation & community support behind MongoDB. I'm definitely going to dig in some more to see how it compares. That said, we're totally happy with our choice of MongoDB and would recommend it to anybody considering HBase.


We do not take CDNs with custom CNAMEs into account at the moment. I responded to a comment above that addresses this.

