
>"Instrumental in all this work was a system that we set up very early in the project that tracked interaction between users and the system in a fine grained manner using a large number of counters."

I know you might still have some degree of an NDA pinch preventing you from giving too many details, but if possible, can you give some more info on how you went about setting up the tracking instrumentation?

As always, a fun read. Thanks!




That's tricky to answer without making this identifiable, but let me try to transpose it a bit and hope it still makes sense.

If you're running a store, then at any point in time the store contains the number of people that have ever entered minus the number of people that have left. So by just adding two counters (person entering, person leaving) you can validate the current state of the store by subtracting the second from the first and comparing the result with a quick count of the aisles. If you have more (or fewer) people in the store than you think you should have, you either have another door somewhere that you're not aware of, people are being born or dying on the premises (that might work for a hospital ;) or they're climbing out through the roof.

If the counters match, that is no guarantee that none of this is happening, but it certainly helps to gain confidence that you know where your entrances and exits are and that people aren't keeling over while shopping in your store.
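To make the store analogy concrete, here is a minimal sketch of the two-counter check. The names (StoreCounters, check_occupancy) are made up for illustration and are not from the actual system; the only idea taken from the above is "entered minus left should equal what you count in the aisles".

```python
from dataclasses import dataclass


@dataclass
class StoreCounters:
    """Two counters that only ever go up (hypothetical names)."""
    entered: int = 0
    left: int = 0

    def person_entered(self) -> None:
        self.entered += 1

    def person_left(self) -> None:
        self.left += 1

    def expected_occupancy(self) -> int:
        # People who should be inside right now: entered minus left.
        return self.entered - self.left


def check_occupancy(counters: StoreCounters, observed: int) -> int:
    """Return observed minus expected; 0 means the aisle count agrees with the counters."""
    return observed - counters.expected_occupancy()


counters = StoreCounters()
for _ in range(5):
    counters.person_entered()
for _ in range(2):
    counters.person_left()

# A non-zero drift hints at an unknown door (or the roof).
drift = check_occupancy(counters, observed=3)
```

A drift of zero doesn't prove anything, as said above, but a non-zero drift is a cheap and immediate signal that one of your assumptions is off.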

Adding a large number of checks like that will eventually give you a very quick way to test your assumptions about how things should work and to determine the impact of a change on the system. We logged all those counters on a minute-to-minute basis (1440 records per day is peanuts) and established a number of baselines indicating what 'normal' behavior is and what 'perfect' behavior should be. This in turn (over time) gives you a goal to shoot for.
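A rough sketch of what one-record-per-minute logging could look like. The CSV layout, the counter names and the write_snapshot helper are assumptions for the sake of the example, not a description of what we actually ran; the point is only that 24 * 60 = 1440 small rows per day is very little data.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

MINUTES_PER_DAY = 24 * 60  # 1440 snapshots per day at one per minute


def write_snapshot(writer, timestamp, counters):
    """Append one row: the timestamp, then the counter values in sorted name order."""
    row = [timestamp.isoformat()] + [counters[name] for name in sorted(counters)]
    writer.writerow(row)


# Simulate one day of logging into an in-memory buffer (a file would work the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary example date
counters = {"entered": 0, "left": 0}  # hypothetical counter names

for minute in range(MINUTES_PER_DAY):
    counters["entered"] += 2  # made-up traffic, purely illustrative
    counters["left"] += 1
    write_snapshot(writer, start + timedelta(minutes=minute), counters)

rows = buffer.getvalue().splitlines()
```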

If after a change you're below normal, you've probably messed something up and should roll back. If after a change you're doing better than before, then good: leave things as they are, establish a new 'normal' in a couple of days' time and strive for 'perfect'.
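The roll-back-or-keep decision can be sketched like this. The 5% tolerance, the use of a plain average and the function name classify_change are all assumptions picked for the example; any metric derived from the counters where higher is better would do.

```python
from statistics import mean


def classify_change(samples, normal, tolerance=0.05):
    """Compare the average of post-change samples with the 'normal' baseline.

    samples:   per-minute values of some metric where higher is better
    normal:    the established baseline for that metric
    tolerance: relative band around normal treated as noise (assumed 5%)
    """
    current = mean(samples)
    if current < normal * (1 - tolerance):
        return "roll back"
    if current > normal * (1 + tolerance):
        return "candidate for new normal"
    return "no significant change"


worse = classify_change([0.80, 0.82, 0.81], normal=0.90)
better = classify_change([0.97, 0.98], normal=0.90)
same = classify_change([0.90, 0.91], normal=0.90)
```

In practice you'd want a couple of days of samples before promoting a better result to the new 'normal', as described above.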

This trick has made it fairly easy to steer the project in the right direction and saved us from making stupid mistakes a number of times. Most notably: at some point we realized the sessions weren't cleaned up at all, but cleaning them up too fanatically caused some of the relationships between the counters to indicate that we had a problem. It didn't take too long before we realized that the session cleanup routine was the culprit. Without that system in place this would have taken much longer and would have done a lot more damage.



