Statistical Process Control: A Practitioner's Guide (2022)

roenxi · 2024-03-08T01:03:31 1709859811

A few related observations:

Software development is not a stable process. Either a team is always building new things in which case there isn't a consistent process to measure, or there is a see-saw as people release new features, deal with the bugs in the features, then go back to building - that isn't a controlled process, it is going to oscillate in statistically weird ways.

If SPC is applied to bugs, it will be monitoring the relevant manager's habits. That is to say, if you show me a nice in-control timeseries of bug resolution, all that says is when a bug blows out horribly the manager splits it into 2x tickets or something similar. It isn't necessarily a bad outcome (small tickets are happy tickets and gently stressing managers is a good idea) - but don't expect the devs to behave differently.

It is good to have a grounding in SPC, just don't try to apply it to every timeseries that you see. Bugs are a timeseries, but they aren't expected to be a controlled process so SPC's assumptions break down and the logic doesn't work. If it does work, it is probably measuring something other than the software development aspect of the process.

kqr · 2024-03-08T06:50:14 1709880614

Things in software development that have been stable in my experience:

- Weekly deploy count

- Weekly growth in lines of code

- Fraction of candidates hired

- Weekly number of bug fixes

- Weekly number of problems with a third party collaborator

- Many internal metrics generated by the software, reflecting usage etc.

- Weekly number of consultant hours required

- Monthly growth of feature flag count

- Length of standup

- Time required to complete "small" tasks (i.e. those that don't involve novelty)

- Length of successful build in CI

- Proportion of CI builds that fail at least one test

- Growth of number of tasks in backlog

I could go on. The point is that while much of the value of product development comes from novelty and variation, there are many parts of the process that remain the same from week to week.

Figuring out what these parts are and getting variation out of them allows the developer to

(a) focus creativity on systematically solving process problems instead of doing it ad hoc, and

(b) let the process recede into the background and focus creativity on creating end-user value.

roenxi · 2024-03-08T07:38:17 1709883497

Your heartbeat and breaths/week are also quite consistent. The issue here is that you have to have a reasonable theory of why variance in the metrics you're tracking will destroy value. And that means actual value, not whinging that standup goes too long. I like a short standup as much as anyone, but if that is a material driver of value destruction then your organisation is not ready for statistical quality control. Plus it probably goes over time consistently.

And once you start looking at the value, the lesson of software is that high variance activity is often the value add. It is the week(s) where someone implements pivot tables in Excel that creates probably billions of dollars in value over all of humanity. If that turns up as a statistical anomaly in the metrics because their line manager didn't bug them to fix bugs that week, that is a problem with the metrics not the programmer.

This isn't a Toyota production line (if you're interested in the history of this, that is no random example) where value is uniformly created with each car and optimising the daily process down to degree n creates value. This is software. The value isn't created in the same way and these tools are not powerful in driving value add decisions. Variance is untidy but by no means an enemy. It must be managed case by case in context.

mjfisher · 2024-03-08T08:44:22 1709887462

There's a difference between input and output metrics to be considered too. Attempting to manage the output metrics directly rather than addressing the underlying causes is almost always the wrong thing to do.

kqr · 2024-03-08T08:13:17 1709885597

> The issue here is that you have to have a reasonable theory of why variance in the metrics you're tracking will destroy value.

Establishing this theory requires a stable process! Without a stable process, you cannot make deliberate, systematic changes and observe how it affects outcomes. That sort of observation is key to theory-building.

I agree with most of what you say about the value in product development coming from innovation which is literally unaccounted-for variance. I just don't think that innovation happens in the length of standup meetings, which I think is better controlled statistically.

roenxi · 2024-03-08T09:02:53 1709888573

> Establishing this theory requires a stable process! Without a stable process, you cannot make deliberate, systematic changes and observe how it affects outcomes.

shrug Welcome to software engineering. Enjoy your stay.

If anyone has figured out how to make deliberate systemic changes that add value, they really need to publicise it because I'm not aware of many approaches that aren't absurdly basic (things like get release and get fast feedback).

There are lots of examples of companies like Google, Microsoft, AWS, etc that generate huge amounts of value from early on in their life cycle them then coast on that while experiments with processes never really move the needle all that much. Google has been search and ads for 20 years and none of their experiments with software quality since then have been all that impressive. It isn't even all that clear from a customer perspective that the quality and quantity of software is improving. If anything, they need to slow the programmers down, write less code and dedicate more resources to maintaining platforms and pushing them to succeed. SPC won't help with that though, because measuring "nothing happening" isn't what SPC is targeted at.

In fact, open source projects where there is usually no QAQC in sight are vast wealth and productivity fountains, we have people like Hipp over at SQLite who just tests everything to within an inch of its life. Good luck replicating that level of productivity with SPC. No systemic process has come close to matching the productivity one madman with a love of databases and stability. We have no idea how to make that repeatable; because the fundamental process that creates value isn't stable.

lazyasciiart · 2024-03-08T11:39:07 1709897947

> I'm not aware of many approaches that aren't absurdly basic (things like get release and get fast feedback).

A bit much to call that “absurdly basic” when release cadence still varies massively even among successful companies. Patch Tuesday? Why not Patch Every Day!?

jacques_chester · 2024-03-08T01:10:03 1709860203

> Software development is not a stable process.

This causes confusion in my experience. "Stable" doesn't mean what it does in casual usage. It means that you have a mean and variance that are not varying over time (and there are ways to alert if they are). A stable process can lurch around like a drunken sailor and still be considered stable for SPC purposes.

I think bugs should be stable if you normalize it by a size metric. So bugs/SLOC or bugs/story points.

a1369209993 · 2024-03-08T09:38:31 1709890711

> "Stable" [...] means that you have a mean and variance

And software development, as a process, does not have a (finite) mean or variance.

> I think bugs should be stable if you normalize it by a size metric.

Bugs approximately follow a heavy-tailed distribution, such that as the number of samples increases, the empirical average work to fix (and possibly other measures of severity) does not converge to any finite value, but instead increases without bound. (I think roughly logarithmicly, but don't quote me on that.)

In particular, it doesn't satisfy the requirements of the central limit theorem, so a lot of statistical techniques work poorly or not at all on software projects that are large enough, individually, to do statistics on (as opposed to doing statistics on populations of software projects, which seems to mostly work, usually).

kqr · 2024-03-08T10:27:37 1709893657

> Bugs approximately follow a heavy-tailed distribution

What's your data on this? In my experience, bug fixing time has been surprisingly thin-tailed. (High variation, yes, but not subexponential.)

My experience here is also mirrored by Avery Pennarun, who observes that "every bug is the same size": https://apenwarr.ca/log/20171213

mharig · 2024-03-09T11:08:36 1709982516

> And software development, as a process, does not have a (finite) mean or variance.

Ever heared of something called linear regression? E.g. LOCs/week might be an interesting enough metric for a growing company.

Edit: better example

roenxi · 2024-03-08T01:18:13 1709860693

They aren't stable. Bugs, viewed as a timeseries, don't have a steady mean and standard deviation. There is a feedback loop where those measures change depending on the maintenance and development strategy of the software, and priorities are going to shift suddenly. If there is a fix-small-bugs month for example, or a fix-this-client's-bugs push, or a fix-the-bugs-in-the-new-feature campaign. The statistical properties of all these are not identical.

And once you've normalised for story points, what you're measuring is stability of the story point allocations. Which is to say, different managers will get different means and standard deviations even if the devs change nothing about their work habits. It isn't measuring the devs, software quality or even real number of bugs; it is measuring the ticket creation process.

As long as you know it isn't measuring the software, all is fine. But people do get confused.

jacques_chester · 2024-03-08T02:18:44 1709864324

> If there is a fix-small-bugs month for example, or a fix-this-client's-bugs push, or a fix-the-bugs-in-the-new-feature campaign. The statistical properties of all these are not identical.

For sure -- but these are surely special or assignable causes of variation. That's one of the key insights in SPC: that some variations are ordinary and some out of the ordinary. A "fix the bugs" campaign is out of the ordinary.

> Which is to say, different managers will get different means and standard deviations even if the devs change nothing about their work habits.

If they have a bad measurement rubric, certainly. But that in turn goes to questions of gauge reliability, a topic about which SPC has a lot to say.

I should say by way of agreement that I'm not convinced that SPC can be safely and effectively applied to software development processes. But I'm also not convinced it can't. I don't have a firm, final position on the matter. I would need to apply it for a while and see for myself.

roenxi · 2024-03-08T02:48:25 1709866105

> For sure -- but these are surely special or assignable causes of variation.

Well, for starters they are process changes. The team is working in a different way to normal and, as expected, getting a different result. So it is special variance, but because management has decided that the team should adopt different processes with different statistical characteristics. These aren't one-offs, this is routine for how a well managed software team should perform. Priorities ought to change as the situation develops. And regardless, regular variations are what indicates that a process that isn't under control. If a process is under control, variances should be remarkably rare.

Every month in software engineering is out of the ordinary. That is why SPC doesn't work; there isn't a repeatable process to model. There are lots of different processes chained together as vague specifications are continuously translated into logically formal specifications in an environment of continuous renegotiation. It might be possible to make a software team work in a controlled way, but it is pretty stupid - either the team won't fix some bugs to make the metrics look good, or they have to fake bugs from time to time to make the metrics look reasonable. Both are inferior to non-statistical controls of prioritising bugs with reference to how challenging the fix seems likely to be.

Say we're going to implement an application. The bugs aren't going to appear as a stable process, there'll be some sort of big wave up front, then successive waves as new features are identified. Eventually the application will be more or less finished and the engineering team will clean up the long tail of bugs. The mean and standard deviation of the work aren't predictable in advance and aren't in control as far as SPC is concerned, because those tools assume steady state. Smoothing that curve from a hill-shape to a flat line is not value-additive; it is bad management in its own right. Resources should be reallocated to fix bugs after a big release and then moved back to dev work.

Now there are going to be some situations where SPC isn't crazy, but it is still a bad management tool for software because the SPC teams aren't going to be agile in the face of change, because all their tools will scream blue murder at them for no reason.

jacques_chester · 2024-03-08T14:36:29 1709908589

I have enjoyed a different experience of software development: the team breaks down stories into units of roughly equal complexity. Bugs get fixed first and do not contribute to velocity.

Over a medium time frame I've seen this approach produce remarkable stability in velocity measurement.

Note that this works by allowing product priorities to fluctuate, but not allowing quality to be a dial that can be turned. A distressingly rare configuration of power.

lifeisstillgood · 2024-03-08T07:08:43 1709881723

For interest: https://en.m.wikipedia.org/wiki/Stable_process

And

“Until we can define what a stable process is, we are doomed to argue forever all use of any statistical metric. For the love of a all science, please help!”

https://www.isixsigma.com/variation/what-stable-process/

jacques_chester · 2024-03-08T01:10:45 1709860245

This is one of my favorite introductions to this topic area, especially since it gets away from the dominance of manufacturing applications for SPC.

3abiton · 2024-03-08T10:52:24 1709895144

I tried to rub shoulders with this topics, mainly for application in software observability, yet I still fail to see it's relevance compared to advanced methods.

jacques_chester · 2024-03-08T14:26:14 1709907974

Could you elaborate? I would love to learn more.

shadowsun7 · 2024-03-08T09:09:14 1709888954

A couple of quick notes, from someone who has actually put this to practice — and in a non-manufacturing context, to boot!

(From a brief reading of this thread, it seems like kqr, jacques_chester, and I are the only ones who have put this to practice in non-manufacturing contexts — though correct me if I'm wrong.)

The bulk of the debate in this HN thread seems to be centred around what is or isn't a 'stable process'. I think this is partially a terminology issue, which Donald Wheeler called out in the appendix of Understanding Variation. He recommends not using words like 'stable' or 'in-control', or even 'special cause variation', as the words are confusing ... and in his experience lead people to unfruitful discussions.

Instead, he suggests:

- Instead of calling this 'Statistical Process Control', call this 'Methods of Continual Improvement'

- Use the term 'routine variation' and 'exceptional variation' whenever possible. In practice, I tend to use 'special variation' in discussion, not 'exceptional variation', simply because it's easier to say.

- Use the term 'process behaviour chart' instead of 'process control chart' — we use these charts to characterising the behaviour of a process, not merely to 'control' it.

- Use 'predictable process' and 'unpredictable process' (instead of 'stable'/'in-control' vs 'unstable'/'out-of-control' processes) because these are more reflective of the process behaviours. (e.g. a predictable process should reliably show us data between two limit lines).

Using this terminology, the right question to ask is: are there processes in software development that display routine variation? And the answer is yes, absolutely. kqr has given a list in this comment: https://news.ycombinator.com/item?id=39638491

In my experience, people who haven't actually tried to apply SPC techniques outside of manufacturing do not typically have a good sense for what kinds of processes display routine variation. I would urge you to see for yourself: collect data, and then plot it on an XmR chart. It usually takes you only a couple of seconds to see if it does or does not apply — at which point you may discard the chart if you do not find it useful. But you should discover that a surprisingly large chunk of processes do display some form of routine variation. (Source: I've taught this to a handful of folk by now — in various marketing/sales and software engineering roles —and they typically find some way to use XmR charts relatively quickly within their work domains).

[Note: this 'XmR charts are surprisingly useful' is actually one of the major themes in Wheeler's Making Sense of Data — which was written specifically for usage in non-manufacturing contexts; the subtitle of the book is 'SPC for the Service Sector'. You should buy that book if you are serious about application!]

I realise that a bigger challenge with getting SPC adopted is as follows: why should I even use these techniques? What benefits might there be for me? If you don't think SPC is a powerful toolkit, you won't be bothered to look past the janky terminology or the weird statistics.

So here's my pitch: every Wednesday morning, Amazon's leaders get together to go through 400-500 metrics within one hour. This is the Amazon-style Weekly Business Review, or WBR. The WBR draws directly from SPC (early Amazon exec Colin Bryar told me that the WBR is but a 'process control tool' ... and the truth is that it stems from the same style of thinking that gives you the process behaviour chart). What is it good for? Well, the WBR helps Amazon's leaders build a shared causal model of their business, at which point they may loop on that model to turn the screws on their competition and to drive them out of business.

But in order to understand and implement the WBR, you must first understand some of the ideas of SPC.

If that whets your interest, here is a 9000 word essay I wrote to do exactly that, which stems from 1.5 years of personal research, and then practice, and then bad attempts at teaching it to other startup operator friends: https://commoncog.com/becoming-data-driven-first-principles/

I don't get into it too much, but the essay calls out various other applications of these ideas, amongst them the Toyota Production System (which was bootstrapped off a combination of ideas taught by W Edwards Deming — including the SPC theory of variation), Koch Industries's rise to powerful conglomerate, Iams pet foods, etc etc.

roenxi · 2024-03-08T09:51:30 1709891490

> (From a brief reading of this thread, it seems like kqr, jacques_chester, and I are the only ones who have put this to practice in non-manufacturing contexts — though correct me if I'm wrong.)

And roenxi.

> So here's my pitch: every Wednesday morning, Amazon's leaders get together to go through 400-500 metrics within one hour.

Amazon's core value proposition is they maintain a large and very physical fleet of machines that they rent out. With serious standards for up-time that they can take real pride in.

They don't sell themselves as a software house. I'm sure they have tentacles everywhere and they aren't bad at it (if anything I'd expect them to be pretty good on a given project), but they've greatly benefited from using other people's software - they don't have their own DB for example, they reuse others and have a couple of PostgreSQL forks for more at-scale use cases.

I'm sure they get huge value from SPC (anything physical generally benefits from it), and I'm sure they use SPC for software out of reflex; but it doesn't follow that it is driving productive behaviour in the software branch of the business. A fleet of ~infinite servers benefits from controlling 400 metrics. Software development does not.

shadowsun7 · 2024-03-08T10:06:08 1709892368

What would you say if I told you Bryar has lots of stories of this style of thinking applied in early Amazon? This is pre-AWS Amazon, mind you — where they were trying to figure out how to build e-commerce web software at scale, from scratch. Granted, the bulk of their process control was directed at customer-facing controllable input metrics, but the software engineers were as much a part of it as the operational folks.

I don't have his permission to tell some of these stories, but Eugene Wei has some hints of it here: https://www.eugenewei.com/blog/2017/11/13/remove-the-legend

(To be fair to you, you are adamant that SPC does not apply to software development — which I take to mean measuring the productivity or act of building software. And I think we are all in agreement there! (That said, like kqr and jacques_chester, I want to believe that this has not been sufficiently explored) But it's not true that SPC has no place in software development — one way I've used this is that because XmR charts detect changes in variation, you can use it in a customer-facing software context to see if a feature change has resulted in user behaviour change without running an A/B test. Naturally, it makes sense to have the software engineer be responsible for observing this behaviour change themselves, since XmR charts are easy enough for the layman to use, and it gives them a sense of ownership for the feature or change. Some detail (on usage vs A/B tests) here: https://commoncog.com/two-types-of-data-analysis/)

BizOpsZen · 2024-03-09T13:21:18 1709990478

Saw this on twitter...I actually think SPC can apply to Software Development in that the concept of normal variation, and being able to understand and measure the range, can be pretty useful. More detailed comment here if interested...

https://news.ycombinator.com/item?id=39651368

hef19898 · 2024-03-08T10:00:32 1709892032

OP was talking about the Retail and Logistics side of Amazon, from what I can tell...

nhinck3 · 2024-03-08T13:03:58 1709903038

I've applied it to adverse events in health care. Works quite well as a trigger for investigation.

hef19898 · 2024-03-08T09:51:27 1709891487

Very interesting to get the perspective of someone who did thisbin a non-manufacturing evironment. One interesting bit, for someone like me who knows SPC from manufacturing related processes, are the discussions around what a stable process is. Because I cannot remember a single of those discussions ever in manfucturing related fields. Intriguiging, especially since on HN sometimes discussion miss the point by turning into disputes about the exact definition of a term, something that sounds very similar to the "misunderstandings" about stuff like special-casue variation you described.

Edit: Fully agree on the Amazon style WBR, what you said is exactly what is happening at Amazon. Daily during Q4 peak for a large enough subset of metrics.

fjkdlsjflkds · 2024-03-08T06:35:46 1709879746

I stopped reading at this point:

> Roughly half of your measurements will be above average, and the other half below it.

This is simply not true for most definitions of "average" (i.e., for all definitions of "average" except the median).

Example:

> (data <- c(1:10,100))

[1] 1 2 3 4 5 6 7 8 9 10 100

> sum(as.numeric(data > mean(data)))/length(data)

[1] 0.09090909

A 90/10 split is hardly "roughly half of your measurements".

kqr · 2024-03-08T06:43:03 1709880183

Would it help you get past that point if it had said "of a stable process"?

fjkdlsjflkds · 2024-03-08T07:08:07 1709881687

Well... what do you mean by "a stable process" in this context?

Let's try repeating the same example, but now drawing samples from a fixed distribution (in this case, a log-normal distribution):

> data <- exp(rnorm(100))

> sum(as.numeric(data > mean(data)))/length(data)

[1] 0.32

So, again, quite far from a 50/50 split, even though I am assuming a stable/fixed data generation process.

In general, it would help if statistical subjects are not presented in a careless way (i.e., containing things which are obviously not true). I would suggest at least adding an "assuming a symmetrical distribution" (so that at least your claim is approximately correct under the arithmetic average and for bounded variance distributions).

EDIT: If by "a stable process" you mean "a process following a stable distribution"... then, no, it doesn't help.

Here's an example with samples drawn from a Lévy distribution (which is a stable distribution):

> data <- rmutil::rlevy(100)

> sum(as.numeric(data > mean(data)))/length(data)

[1] 0.07

qznc · 2024-03-08T10:17:19 1709893039

Draw an control chat with your data. Here is one for your initial example: http://beza1e1.tuxen.de/spc?data=1,2,3,4,5,6,7,8,9,10,100

It is immediately obvious that it is not a stable process.

fjkdlsjflkds · 2024-03-08T10:57:15 1709895435

Ok. So, let's try again...

First, let's draw some data and show that the sample mean is not roughly close to the sample median:

> data <- asinh(exp(rnorm(30)))

> mean(data)

[1] 1.227973

> median(data)

[1] 1.059046

> sum(as.numeric(data > mean(data)))/length(data)

[1] 0.4

Now you can take the dataset and make a control chart with it:

> cat(paste0(data,collapse=","))

0.397291781860484,1.01791591678607,2.29127398581896,0.317548192016798,0.825779972770721,0.978034869623426,1.45689922574378,0.722379000865545,2.68231132467641,0.786713029297768,1.20492120955161,1.7082762081373,0.632259821911453,0.346590855307735,2.48238023470879,0.0989934260605276,1.61233320755675,0.906918026775941,1.73743152912329,1.21715325934946,2.78776306537914,0.296838101056961,2.29303061152949,2.65277514999252,0.88486942904647,0.0860402641329708,1.255123685342,0.526097043743278,1.53307173756969,1.10017654868633

https://beza1e1.tuxen.de/spc?data=0.397291781860484,1.017915...

Looks like the data is "stable" (according to that website) and, yet, you get a 40/60 split, rather than a 50/50 split.

qznc · 2024-03-08T11:28:54 1709897334

I guess the next step in our discussion is if 40/60 is „roughly half“?

fjkdlsjflkds · 2024-03-08T13:23:08 1709904188

Oh. I didn't understand that "roughly" was supposed to be doing all the work, in this context.

I guess a 30/70 split is also "roughly" a 50/50 split, right?

Example:

> cat(paste0(data,collapse=","))

0.194377163769996,0.0102070939265764,0.309119108211189,0.0120786780598317,1.45982220742052,0.00158028772404075,0.035004295275816,0.329022291919098,-0.00635736453948977,0.0158683345454085,1.19240981895862,-0.0127659220845804,-0.00650696353310367,0.00716707476017206,1.85868411217008,0.374960693228966,0.114533107998102,0.591872380192402,0.469305862127421,0.60161713700353,0.000421158731442352,0.265325485949535,-0.00279113302976559,0.0168217608051942,-0.00654584643918818,0.701343388607726,1.84387017506994,-0.00461644730360566,0.0781831777299275,1.05989990859088

https://beza1e1.tuxen.de/spc?data=0.194377163769996,0.010207...

> sum(as.numeric(data > mean(data)))/length(data)

[1] 0.3

> mean(data)

[1] 0.3834637

> median(data)

[1] 0.09635814

Would you say that the mean and the median are "roughly" the same, in this context? I'm curious...

kqr · 2024-03-08T15:44:46 1709912686

Now this is getting interesting. I did not think there existed a data set with these properties!

I don't have a computer at hand, but if you bootstrap from that population, in how many cases are the XmR limits violated? If it's more than, say, 15 %, I would not consider that distribution stable in the SPC sense, and thus not really a counter-example.

Edit: I found a computer with R:

> mean(replicate(5000, signal(sample(xx, length(xx), replace=T))))

[1] 0.514

This implies a false positive rate of 98 %, so I'd reconsider using XmR charts with this distribution.

fjkdlsjflkds · 2024-03-08T20:50:41 1709931041

> I don't have a computer at hand, but if you bootstrap from that population, in how many cases are the XmR limits violated? If it's more than, say, 15 %, I would not consider that distribution stable in the SPC sense, and thus not really a counter-example.

Still sounds a bit like goalpost moving to me... now I need to perform bootstraps (and change the order of the samples arbitrarily) to even define if a distribution in "stable" (i.e., stationary) or not?

Either way, I think my original point still stands: different "averages" have different properties, and the claim that arbitrary "averages" will be good estimates of the population median (without invoking anything regarding distributional symmetry) seems rather unfounded.

Of course, if you start adding terms like "roughly", and then extend its meaning so that "30 is roughly 70" (even though 30 is less than half of 70), then I guess any "average" (since, by definition, it exists between max(data) and min(data)) will be some sort of some "rough" estimate of the median, at least to some orders of magnitude (since it also, by definition, exists between max(data) and min(data) as well), sure.

I'm still not reading the rest of the article posted, though. I remind you that what was written did not mention "stability" in any way. It simply said:

> Again, you and I know better. A statistic known as “average” is intentionally designed to fall in the middle of the range. Roughly half of your measurements will be above average, and the other half below it.

This, as it is written, is sloppy. And I rather not ready something sloppy.

Have a nice day, though.

kqr · 2024-03-09T06:38:03 1709966283

Okay, that makes sense. As the author I intentionally write beginner material with some slop to convey the intuition rather than exact lemmas. This is not what you're looking for and that's fine.

I still will keep your criticism at the back of my head and be more wary about sweeping generalisations going forward. Thanks.

It would be nice if someone thought of all edge cases and wrote a formally correct treatment, though! (The statistician's version rather than the practitioner's version, I suppose.)

fjkdlsjflkds · 2024-03-09T11:52:28 1709985148

I'll just leave a final comment: if you restrict yourself to the arithmetic mean, then you can use Cantelli's inequality to make some claims about the distance between the expectation and the median of a random variable in a way that only depends on the variance/st.dev.

See: https://en.wikipedia.org/wiki/Chebyshev%27s_inequality#Cante...

On the other hand, you do not actually know the (population) expectation or (population) variance: you can only estimate them, given some samples (and, quite often, they can be undefined/unbounded).

Also, as I was trying to demonstrate in my previous comment, most "averages" are poor estimators for the expectation of a random variable (compared to the arithmetic sample mean), the same way that min(data) or max(data) are poor estimators for the expectation of a random variable, so it seems a bit "dangerous" to make such a general broad claim (again, in my humble opinion).

fjkdlsjflkds · 2024-03-09T09:37:47 1709977067

I was not aware you were the author. I apologize if anything in my delivery came across as harsh.

I would just suggest considering whether the "any (sample) average is a rough approximation of the (population) median" is a necessary claim in your exposition (particularly as it is stated).

Given this is supposed to be "beginner material", it would seem important not to say something that can mislead beginners and give them an incorrect intuition about "averages" (in my humble opinion). Note that adding the "but only for 'stable' distributions" caveat doesn't really solve things, since that term is not clearly defined and begginers would certainly not know what it means a priori.

I know this may came across as pedantic or nitpicky, but I would really like you to understand why such a general statement, technically, cannot possible be true (unless you really extend the meaning of "roughly"). When I read what is written, I see two claims, in fact (marked between curly braces):

> A statistic known as “average” is intentionally {designed to fall in the middle of the range}. {Roughly half of your measurements will be above average, and the other half below it}.

The first claim suggests that any average approximates the "midrange" (i.e., 0.5*(max(data)+min(data)), a point that minimizes the L_inf norm w.r.t. your data points). The second claim suggests that any average approximates the "median" (i.e., a point that minimizes the L_1 norm w.r.t. your data points).

The main problem here, as I see it, is that there is an infinite number of different possible means, densely convering the space between min(data) and max(data). Thus, unless you are ok claiming that both min(data) and max(data) are reasonable rough estimates of the median and the midrange, you should avoid such strong and general claim (in my humble opinion).

Note: you can choose a "generalized mean" that is arbitrarily close to min(data) or arbitrarily close to max(data); for example, see https://en.wikipedia.org/wiki/Generalized_mean

Either way... I lied... I did read some of the rest, and some of it was interesting (particularly the part about the magic constant), but the lack of formal correctness in a few claims did put me off from reading through all of it.

Once again, have a nice day, and please don't be discouraged by the harshness of my comments.

kqr · 2024-03-13T06:18:03 1710310683

I really do appreciate the criticism. You're factually correct, of course!

I also see now that statement about means comes off as more definitive than I meant it to be. When I find the time to I will try to soften the wording and make it clear that it's not strictly true.

kqr · 2024-03-08T10:34:53 1709894093

I think GPs point is that there are software processes that are fundamentally stable but still generate values like that. I'm in the process of writing an article on that topic, because it annoys me I don't have a good answer.

kqr · 2024-03-08T08:19:42 1709885982

In this context "stable" means the thing it means in statistical process control, i.e. the operational definition of no measurements outside of 2.66 times the mean consecutive difference between observations.

It is a problem -- particularly for software -- that SPC tools do not work with subexponential distributions, but it's separate from the observation that when SPC determines that a process is stable, rougly half of measurements will lie above the average.

shadowsun7 · 2024-03-08T09:44:30 1709891070

To be fair to OP, Wheeler never claims that for stable/in-control/predictable processes roughly half of the measurements will lie above the average. The only claim he makes is that 97% of all data points for a stable process (assuming the process draws from a J-curve or single-mound distribution) will fall between the limit lines.

He can't make this claim (about ~half falling above/below the average line), because one of the core arguments he makes is that XmR charts are usable even when you're not dealing with normal distributions. He argues that the intuition behind how they work is that they detect the presence of more than one probability distribution in the variation of a time series.

Some links below:

Arguments for non-normality:

https://spcpress.com/pdf/DJW220.pdf

https://www.spcpress.com/pdf/DJW354.Sep.19.The%20Normality-M...

Claim of homogeneity detection:

https://www.spcpress.com/pdf/DJW204.pdf

kqr · 2024-03-08T10:44:50 1709894690

I don't have the stats-fu to back it up but I would be very surprised if someone could point to a process where XmR charts are useful, but where the mean is not within 10–20 percentiles of the median.

fjkdlsjflkds · 2024-03-08T10:45:56 1709894756

> the operational definition of no measurements outside of 2.66 times the mean consecutive difference between observations

Not even a simple Gaussian distribution can hold up to this standard of "stability" (unless I understood incorrectly what you mean here):

> data <- rnorm(1000) # i.i.d. normal data

> mcd <- 2.66*mean(abs(diff(data))) # mean consecutive difference * 2.66

> sum(as.numeric(abs(data) > mcd))/length(data) # fraction of bad points

[1] 0.002

Unless you are willing to add additional conditions (e.g., symmetry), I still don't see how criteria that pertain to variance and kurtosis (e.g., "the operational definition of no measurements outside of 2.66 times the mean consecutive difference between observations") can imply any strong relationship between the (sample) arithmetic mean (or any other mean) and the (population) median.

In fact, even distributions for which the "arithmetic mean is approximately equal to the median" claim is roughly correct will almost certainly not display the same property when you use some other mean (e.g., geometric or harmonic mean).

Either way, if you have some reference that supports the stated claim, I will be very happy to take a look at it (and educate myself in the process).

BizOpsZen · 2024-03-09T13:05:44 1709989544

Came across this on twitter. Here's why I think SPC is related to Software Development (and Agile concepts, particular the burndown charts) more generally:

To be clear, they are related but not the same use case. IMO, both Agile and SPC leverage the same insight: variation is inevitable: what matters is not that it exists, but how you deal with it.

With SPC, you are establishing a normal variation so that you can identify abnormal activity that is warrants further investigation.

With Agile you're not really looking for outliers per se, it's more that you want to get to a place where your "normal" variation is a much smaller range. Because a smaller range leads to better quality and more output:

Variation in the software dev context is the difference between your estimates and the actual work required to deliver a feature, etc. High variation means you're constantly in a rush, need to cut corners, need to cut scope, etc.

This has a lot of downstream impacts in terms of quality but also in the actual scope of what you can deliver. In short, you need to spend more time fixing bigger problems.

Less variation means smaller problems and less time spent fixing --> more time is allocated to new feature development.

(and separate topic, but variation in Software dev has a special property where it only accrues in the "takes longer" side vs. the "take less time" direction. You never "make up time" because something is quicker than your estimate. see note below.

So the burndown chart is less about enabling you to see outliers, more providing visibility to the variation so that you can work towards making it smaller. If you're constantly loading work in the end of the sprint, you have a problem with the scoping process.

How does that track back to Agile?

One the key elements of Agile process is breaking work down into smaller batches --> and Breaking things down into smaller batches is* they key mechanism to reducing variability.

NOTE: *Software pretty much only takes longer than expected because there is high visibility into the fastest something can be done, but very little visibility into the unexpected things that can add scope to the project. So it's extremely rare for something to happen that make it take less effort than your estimates, but very common for things to add scope.

It's similar to estimating how long it will take to drive somewhere: you can get a pretty accurate sense of the fastest it will take based on distance and speed. But the things that extend the duration of the trip, like a car accident or unexpected road work, are just much more unpredictable. So if you were to plot that variation on a chart, you only see it move in one direction.

ilayn · 2024-03-08T11:28:09 1709897289

Another post to drive a control theory person up the wall.