> and he/she came into the thread to kindly ask us to stop messing up their statistics.

Shouldn't someone with a job in statistics know how to account for outliers?




And also, wouldn't they be interested in getting those queries so they could either fix their performance or block them?


The query was already published on Quora. I assume that Google did patch the issue. They were simply requesting that it not be posted in a public forum, where it created the potential for denial-of-service attacks. It wouldn't have been a statistician making a fuss about their pretty graphs being ruined. The problem was the very real performance impact the queries were having on the service, as thousands of visitors copy/pasted from Quora to see for themselves.

This is why companies like Google have bounties for such things. "Please submit bugs and performance issues privately so we can patch them before you disclose the details publicly and hurt our services - we'll even pay you for your discretion!"


This particular issue was posted on Quora, where anyone could pick it up and participate in what is essentially a denial-of-service attack (whether or not intentional). It wasn't submitted as a private bug report to Google so they could fix the issue; it was spread in a public forum. I think it's fair for Google to politely ask: "running a few of your own tests to validate an issue you plan to submit as a bug report is fine, but please don't disclose it publicly until we patch it."

When you operate at the scale of Google, everything is expected to be airtight; outliers should not be possible. It wouldn't surprise me if their monitoring systems are built without the ability to "massage" (i.e., manipulate) statistics, as that is a terrible practice. I don't think a statistician who relies on ignoring outliers would last long at Google. They're not doing their job if the only thing they care about is silencing warnings to make pretty graphs that falsely show everything running smoothly. Their job is to work with the truth - not to manufacture little white lies to appease management.


Boss: Median latency is 100ms and 99.9th percentile latency is 1 second.

Nobody ever asks about that 0.1%...


When that 0.1% - or even 0.001% - are 5-60 second requests, you have a bomb waiting to go off. There really is a massive difference when you operate at the scale of Google. If the median is 100ms, the maximum acceptable time - the 100th percentile - is likely below 200ms. A three-nines percentile that is 10x the median isn't a good thing at large scale; consistent latency matters more than flattering summary statistics. A small-scale service deployed on my-little-unused-tool.com that receives a few requests per minute is an entirely different ballgame.
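
To put numbers on that, here's a minimal sketch (hypothetical figures, NumPy assumed; none of this is from the thread) of how a 0.1% tail of 5-60 second requests behaves. The median stays flat, and a percentile sitting right at the tail boundary can still look healthy, yet that tiny tail ends up eating roughly a quarter of all the time users spend waiting:

    import numpy as np

    rng = np.random.default_rng(0)

    n = 1_000_000                                # hypothetical request volume
    latencies = rng.normal(100, 10, n)           # typical requests: ~100 ms
    slow = rng.choice(n, size=n // 1000, replace=False)       # the 0.1% tail
    latencies[slow] = rng.uniform(5_000, 60_000, slow.size)   # 5-60 s outliers

    print(f"median: {np.median(latencies):9.0f} ms")
    print(f"p99.9 : {np.percentile(latencies, 99.9):9.0f} ms")   # still ~150 ms
    print(f"p99.99: {np.percentile(latencies, 99.99):9.0f} ms")  # seconds
    print(f"max   : {latencies.max():9.0f} ms")

    # Wall-clock time spent inside the 0.1% tail vs. everything else.
    tail_share = latencies[slow].sum() / latencies.sum()
    print(f"tail share of total latency: {tail_share:.0%}")

The punchline: median and even p99.9 can look fine while a fraction of a percent of requests quietly dominates total latency - which is exactly why "nobody ever asks about that 0.1%" is dangerous.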


When it's tried often enough by enough readers of the question, it's not an outlier any more.



