Statistics for Engineers: Applying statistical techniques to operations (2016) (acm.org)
201 points by yarapavan on Oct 3, 2019 | 10 comments



Author here. Glad to see this on HN!

This article is quite dated. I have run the Statistics for Engineers class at various conferences over the years and kept updating the material. I literally just did a session at SRECon EMEA today [1]!

The course material is here: https://github.com/HeinrichHartmann/Statistics-for-Engineers...

Today's version includes new material on:

- How averaging percentiles breaks down

- How sub-sampling affects percentile calculations

- Comparison of "mergeable aggregation methods" like HDR Histograms, t-digest, etc.
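To make the first and third points concrete, here is a minimal sketch with made-up latencies (the hosts, numbers, and helper names are illustrative assumptions, not taken from the course material). It compares the average of two per-host p99 values with the p99 of the combined data, then shows that per-value counts can be merged by simple addition without losing the percentile. The exact-value `Counter` is only a stand-in for a real bucketed histogram such as an HDR Histogram.

```python
from collections import Counter

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # integer ceil(p * n / 100)
    return ordered[rank - 1]

def percentile_from_counts(counts, p):
    """Same nearest-rank percentile, computed from a value -> count mapping."""
    total = sum(counts.values())
    rank = max(1, -(-p * total // 100))
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value

# Hypothetical latencies in ms: a busy healthy host and a quiet degraded one.
host_a = [10] * 1000
host_b = [10] * 50 + [1000] * 50

p99_a = percentile(host_a, 99)
p99_b = percentile(host_b, 99)
avg_of_p99 = (p99_a + p99_b) / 2            # what "averaging percentiles" reports
true_p99 = percentile(host_a + host_b, 99)  # p99 over all requests

# Counts merge by addition, so the merged percentile matches the true one.
merged_counts = Counter(host_a) + Counter(host_b)
merged_p99 = percentile_from_counts(merged_counts, 99)

print(avg_of_p99, true_p99, merged_p99)
```

In this toy setup the averaged p99 lands well below the p99 of the combined requests, while the merged counts reproduce it.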

If you liked the article, make sure to check out the GitHub course material. It's much broader and more up to date.

[1] https://www.usenix.org/conference/srecon19emea/presentation/...


PS: At Circonus we have built technology that explicitly addresses many of the statistical pitfalls explained in the post.

If you are looking for a monitoring vendor who deeply cares about getting the statistics right (especially around aggregating and analysing latency data), have a look at https://circonus.com / https://lps.circonus.com/statistics-for-engineers/ and/or reach out to me.


Thanks for this! Any chance you'll be running this class in London in the near future? (I could be convinced to fly somewhere else in Europe for a couple of days, though.)


Most certainly yes. I am planning on proposing it again for next year's conference season. In particular, SRECon EMEA 2020 in Amsterdam comes to mind.

PM me your email, and I'll put you on the mailing list.


Looks like a good overview, but I'm surprised it didn't cover process control charts, since they are easy to make and are designed specifically for detecting situations where the world has changed in a meaningful way (e.g., a machine is malfunctioning). The book "Understanding Variation: The key to managing chaos" by Donald Wheeler has a cult following and exalts control charts in the business and manufacturing world. Parts of it are a little goofy, but I do think it does a good job of making a simple tool useful for practitioners.

I knew someone who loved that book and taught corporate workers to throw literally all data into control charts. For instance, instead of doing a t-test, just string out the data in order of the classes and see if the points go outside the lines. I thought it was lazy, but if you're going to have one tool then I guess you could do worse than the control chart.


Interesting pointer. I had looked at the control theory literature. We used the CUSUM "control chart" as an internal component of an anomaly detection method we built a while ago. This had limited success: usually the data is just too noisy. Industrial sensors tend to be much better behaved than the stuff that you get from the Linux kernel.
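For readers unfamiliar with CUSUM, here is a minimal sketch of a one-sided (upper) tabular CUSUM. It is not the Circonus implementation; the function name, the parameters `target`, `slack` and `threshold`, and the sample data are all illustrative assumptions.

```python
def cusum_upper(samples, target, slack, threshold):
    """Return indices where the upper cumulative sum exceeds the threshold.

    Each step adds the deviation above (target + slack) and clamps at zero,
    so small fluctuations drain away while a sustained upward shift accumulates.
    """
    s = 0.0
    alarms = []
    for i, x in enumerate(samples):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # restart accumulation after raising an alarm
    return alarms

# Hypothetical metric that shifts from 10 to 12 halfway through.
samples = [10, 10, 10, 10, 12, 12, 12, 12]
alarms = cusum_upper(samples, target=10, slack=0.5, threshold=4)
print(alarms)
```

With noisy input, `slack` and `threshold` have to be set large enough to avoid constant alarms, which in turn delays detection of real shifts.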

Are you aware of people in the IT-Ops domain who use control charts?


Yes, in cases where statistical outliers can actually represent machinery failure, it's important that the 'roll ups' mentioned in the article don't hide real underlying problems.

There are also quite a few other charting techniques that financiers have been using for decades, such as OHLC/bar/candlestick, point & figure, or market profile, which all have their place in data visualisation. Combining those with financial charting models (moving averages, stochastics, etc.) can go a long way in determining when things are going great or going pear-shaped.

But otherwise a decent article.


Do you have an example of what you mean by a process control chart? Is it meant in the sense of process control theory?


This is a decent description:

https://asq.org/quality-resources/control-chart

This is not process control in the sense of feedback control theory. It refers to using statistics to monitor industrial processes. When people like Deming started promoting industrial quality control, it was their intention to provide very simple tools that any factory worker could use to monitor and improve processes on their own or in small groups. So they favored graphical analysis over sophisticated statistical tests. A basic control chart was easy for anybody to produce.
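As a rough illustration of how little is involved, here is a minimal sketch of an individuals (XmR) control chart. The limits are the baseline mean plus or minus 2.66 times the average moving range, which is the usual XmR constant. The data and function names are made up for the example.

```python
def xmr_limits(baseline):
    """Compute lower and upper natural process limits from baseline observations."""
    mean = sum(baseline) / len(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * avg_mr, mean + 2.66 * avg_mr

def out_of_control(points, lower, upper):
    """Return indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lower or x > upper]

# Hypothetical measurements: a stable baseline, then new points to check.
baseline = [10, 12, 11, 13, 10, 12, 11, 13]
new_points = [12, 11, 18, 10]

lower, upper = xmr_limits(baseline)
flagged = out_of_control(new_points, lower, upper)
print(lower, upper, flagged)
```

On paper this is just a run chart with three horizontal lines drawn on it, which is what made it practical on a factory floor.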


+1 to this. This couldn't have come at a better time, as I'm transitioning into more SRE-style work and have been tasked not only with creating some dashboards, but with creating relevant ones.



