Hacker News
How to better calculate churn rates (catchjs.com)
166 points by cmogni1 on Oct 19, 2020 | 40 comments



I recently modeled churn for my subscription box calmbox (https://thecalmbox.com) by using an actuarial table (http://www.lifeexpectancy.org/lifetable.shtml, formulas at the bottom) which I found to be a lot more informative than the standard churn formula.

Subscriptions still active at month x are represented as `l(x)` and subscriptions that "die" (cancel/expire) in a given month are represented as `d(x)`.

This gives you a "life expectancy" and a "mortality rate" (so, churn) for any given number of months that a customer has been subscribed. So I can project how long someone will stay subscribed when they're brand new (at month 0) and how long they likely still have when they're at month 8 (longer than at month 0, funnily enough).

For subscriptions where month-specific churn largely decreases the longer someone stays subscribed (once past the initial high-churn months), this allows measuring and projecting churn at a much more granular level.
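
A life table like this is simple to compute by hand. Here's a minimal sketch with made-up cohort counts (not calmbox's numbers); it derives the month-specific churn q(x) and remaining life expectancy e(x) from the active counts l(x):

```python
# Minimal life-table sketch. l(x) = subscriptions still active at month x;
# d(x) = subscriptions that "die" (cancel/expire) during month x.

def life_table(l):
    """Given active counts l(0), l(1), ..., return per-month churn q(x)
    and remaining life expectancy e(x) in months."""
    n = len(l)
    d = [l[x] - l[x + 1] for x in range(n - 1)]      # cancellations in month x
    q = [d[x] / l[x] for x in range(n - 1)]          # month-specific churn
    # Person-months lived from month x onward (assume mid-month cancellation)
    L = [(l[x] + l[x + 1]) / 2 for x in range(n - 1)]
    T = [sum(L[x:]) for x in range(n - 1)]
    e = [T[x] / l[x] for x in range(n - 1)]          # expected months remaining
    return q, e

l = [1000, 700, 560, 490, 450, 430]   # hypothetical cohort counts by month
q, e = life_table(l)
print([round(v, 3) for v in q])   # churn falls with tenure
print([round(v, 2) for v in e])   # e(1) > e(0), matching the observation above
```

With these made-up numbers, churn drops from 30% in month 0 to under 5% by month 4, and life expectancy at month 1 is slightly higher than at month 0, for exactly the reason the parent describes.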


Segmenting this data by signup date is useful as well.

If I'm not mistaken, you're aligning all time periods to the same imaginary start date. This gives you the grand perspective but ignores very real changes in your application, team, marketplace, advertising, and product-fit over time.


Yes, you should absolutely segment into cohorts. Ideally you can tag users such that you can slice them not just by time, but by what funnel they used or what marketing they've been exposed to (emails, etc.)

Just measuring the grand perspective, as you say, can be a very poor indicator for what is working well, and what isn't.


It honestly delights me to see someone correctly point out that churn is a classic example of a survival analysis problem.

It's shocking how many data scientists I've known who have no idea how to correctly model churn, typically trying to build some predictive model that is just a more sophisticated extension of the first formula in the article.

It's insane how many business problems are really survival analysis problems, where you have some hazard function and censored data. In my experience, having data scientists who can correctly identify this is very rare (and a great sign of how broken that entire field is right now).
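
For anyone unfamiliar: the core tool here is the Kaplan-Meier estimator, which handles censored observations, i.e. customers who haven't cancelled yet. A hand-rolled sketch with hypothetical data (a real analysis would use something like the lifelines library):

```python
# Kaplan-Meier survival estimate from scratch. durations = months subscribed;
# observed = 1 if the customer cancelled, 0 if still active (right-censored).

def kaplan_meier(durations, observed):
    """Return (time, S(t)) pairs for the Kaplan-Meier survival estimate."""
    event_times = sorted(set(t for t, o in zip(durations, observed) if o))
    s, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        died = sum(1 for d, o in zip(durations, observed) if d == t and o)
        s *= 1 - died / at_risk           # survival drops only at event times
        curve.append((t, s))
    return curve

durations = [2, 3, 3, 5, 6, 6, 8, 10, 10, 12]
observed  = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]   # 0 = still subscribed (censored)
for t, s in kaplan_meier(durations, observed):
    print(t, round(s, 3))
```

The key point is that the censored customers still contribute to the risk sets; the naive churn formulas in the article effectively throw that information away or mishandle it.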


This is the comment I was hoping to find here. I can only recommend reading McElreath’s Statistical Rethinking Book/Course in that context. https://xcelab.net/rm/statistical-rethinking/


I think about this everytime Backblaze publishes their new stats. It's a great resource, but some standard survival analysis would go a looooong way, and would let them incorporate usage stats amount other things. I even emailed once to offer to help, but they were (very understandably) not interested.


Oof, this is me right now.

What makes it worse is that I've also done survival analysis, but never quite made the jump that churn is just a survival problem, though it's exceedingly obvious once you see it.


Not going to argue with you that the field isn't broken -- it very clearly is -- but the problem the article explains is a rather simple one.

Your setting could be non-contractual. You might not be able to have reliable data to frame the problem as survival analysis with censored data. You might have different groups of customers with different behaviors.

Assuming a constant failure rate is the same as assuming the underlying process is Poisson, which is not an entirely stupid working hypothesis in many cases.
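
That equivalence is easy to sanity-check in simulation: if lifetimes are exponentially distributed, the per-month churn rate is the same at every tenure, because the process is memoryless. A quick sketch with a made-up rate of 0.1/month:

```python
# Exponential lifetimes => constant per-period churn (memorylessness).
import random

random.seed(0)
lifetimes = [random.expovariate(0.1) for _ in range(200_000)]  # mean 10 months

def churn_at(month):
    """Fraction of customers alive at `month` who cancel before month+1."""
    alive = [t for t in lifetimes if t >= month]
    died = sum(1 for t in alive if t < month + 1)
    return died / len(alive)

# All three are ~= 1 - exp(-0.1) ~= 0.095, regardless of tenure
print(round(churn_at(0), 3), round(churn_at(5), 3), round(churn_at(10), 3))
```

So the "constant churn" formula isn't crazy as a first pass; the article's point is that real subscription data usually violates exactly this assumption.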

The moral of the story is "for the love of god, look at your data before making dumb models".


Different population behaviours could be modeled via multi-level models.


They can, but you start running into problems with the mathematical formulation of your model. For example, the Weibull characteristic function doesn't have a closed form, which means it's hard to compute convolutions, even with itself, so finding the distribution of e.g. a sum of Weibull-distributed r.v.s is hard, even if they are iid.

Again, modeling done right is super hard.


> It's shocking how many data scientists I've known who have no idea how to correctly model churn,

This is because most 'data scientists' have never actually studied much statistics.

And for the rest, it's easy to forget all the cool tools we learnt about at university when everyone in the corporate world just says 'give me the average revenue per customer', 'give me the average tenure', etc.


What have data scientists generally studied, if not statistics?


In my experience, 'data scientist' has become a very loose term, especially in the not terribly sophisticated corporate world. Typically, 'data science' isn't even very scientific, but a useful set of heuristics for getting good model outcomes on data. Rarely is actual 'statistics' being done to get scientifically rigorous results.

People employed as 'data scientists' often come from comp sci/physics/eng/econ etc. They might be great at fitting models that also have great out of sample prediction performance, but they may have never heard of maximum likelihood, or know when to use a Wilcoxon rank sum test.


Not very sophisticated corporate world? I'd argue it's very sophisticated to be loose with the meaning of 'scientist'. We have a global debate happening about pseudoscience, due in part to a widespread willingness to call anyone with a degree and some quantitative skills a scientist.


A lot, and I mean a lot of Data Scientists are just what we called Business Analysts, Analysts, etc.

There's no cohesive group of Data Scientists, and they come from all walks of life. Economics, Business Administration, Life Sciences, Computer Science, Engineering, etc.

There absolutely are professional Data Scientists out there who haven't touched more than Stats and Probability 101. (But FWIW, a lot of DS, even if they lack the formal education, pick these things up along the way.)


Sounds pretty cagey to me. HR hands out the label 'scientist' to someone who cherry-picks data that supports the shareholders, thereby ensuring accountability loopholes for management. Nice work, techies!


Yeah, agreed.

Spark even has functions for survival analysis (Kaplan-Meier I think), which means that one can actually run this kind of analysis in a consumer-tech space.

I remember spending weeks trying to get the right sampling/compute so that I could actually use survival analysis, so I would argue that it's not that people weren't aware of this, rather that (some) didn't have tools to allow them to use the methods effectively.


Agree. Interestingly, are there any other examples where a problem can be solved with Y instead of X?


Lifetime value is a time series problem mostly brutalised into a regression format.


broken what field? for how long?


The key here is cohort analysis. If you are on 1 year contracts, then the relevant denominator is revenue 12 months ago, and the numerator is revenue from those customers.

Just like NPS there can be a lot of intellectual dishonesty.


I quite liked the more technical parts of the article and the practical modelling tips for customer churn rates. But... I would be more interested in a focused, more terse article just examining how the modelling techniques you use can impact your understanding of a given data set. The bullshitting for the premise just sort of kept me distracted... Unless a16z updated their page recently enough to be different, they very plainly say there are multiple types of churn and then provide two clear examples. Here's their first sentence about churn:

> There’s all kinds of churn — dollar churn, customer churn, net dollar churn — and there are varying definitions for how churn is measured.

I also find it a bit of a disservice that they don't discuss other churn types while dismissing the notion completely in the introduction. Statistics is a really "it depends" kind of subject, and if you don't put forth a good effort to explain the assumptions, it can be really hard to follow and even harder to correctly apply.

Next, I find it a bit off-putting that they mostly modeled their way out of the problem instead of addressing the bigger meta question: "how do I find out if customer churn is a meaningful metric for MY business?" This is a big assumption made by the article (that customer churn, modeled correctly, is useful) but not supported by it.

Finally, I find this in the conclusion to be quite simply HI-LARIOUS for how brazen it is; while, IMHO, kind of missing their own point:

> The numbers we've just computed are perhaps the most central of all for a subscription business, yet it seems like most people have never been thought how to properly compute them.


It stands out that it's using customer count and not revenue churn. In a well-run business you might see shrinking customer count but net-negative churn, because you're growing revenue on the accounts that stay with you. This is really very common in any SaaS that gets big.

Calculating customer churn is valuable for LTV computation, but it doesn't seem as useful a KPI for a growing SaaS business as net revenue churn rate.


Exactly. It's kind of well-known at this point that negative net dollar churn is what people should aim for.

The minutiae of risk modelling for something this inherently volatile seem beside the point. Maybe if you're a stock analyst?


For small sample sizes (enterprise customers) it’s tricky to get too crazy on stats. Simple models are better.


Practically, if you are dealing with large enterprise customers, it's better to use qualitative data from your account managers to figure out how much risk there is with each customer.


My experience is “they love us” isn’t as strong a retention predictor as cold hard usage metrics.


"They love us" seems to never be that useful. I actually meant more in terms of whether they're worried about foo, or finding out about the dumb workaround they're using. That stuff seems to be an early warning sign that something isn't going well.


Great article but even this article contains some minor nitpicky inaccuracies in modeling churn.

The Lomax distribution (sometimes described as the gamma-exponential) would only be appropriate if we're modeling a continuous time process.

For a contractual product with discrete time intervals (e.g. a monthly contract), it would be more accurate to use a geometric rather than an exponential model, such as the Beta-Geometric.
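
For reference, here is a sketch of the shifted-beta-geometric (sBG) retention curve in the style of Fader & Hardie's "How to Project Customer Retention", with made-up alpha/beta parameters (in practice you'd fit them to your cohort data by maximum likelihood):

```python
# Shifted-beta-geometric (sBG) model for discrete-time (contractual) churn.
# alpha, beta here are illustrative, not fitted values.

def sbg_retention(alpha, beta, periods):
    """Period-by-period retention r(t) = P(renew at t | alive at t-1)."""
    return [(beta + t - 1) / (alpha + beta + t - 1) for t in range(1, periods + 1)]

def sbg_survival(alpha, beta, periods):
    """Fraction of the original cohort still alive after each period."""
    s, out = 1.0, []
    for r in sbg_retention(alpha, beta, periods):
        s *= r
        out.append(s)
    return out

rates = sbg_retention(alpha=1.0, beta=4.0, periods=6)
print([round(r, 3) for r in rates])                  # retention rises with tenure
print([round(s, 3) for s in sbg_survival(1.0, 4.0, 6)])
```

Note the sBG's nice property: even though each individual customer's churn probability is constant, the cohort-level retention rate rises with tenure, because the high-churn customers leave first.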


Bingo! The classic work on this is from Fader and Hardie:

https://faculty.wharton.upenn.edu/wp-content/uploads/2012/04...

A friend and I made an implementation in Python a while back:

https://github.com/chrisclark/retentionizer


Super glad to find this! Have you also seen the lifetimes library? https://github.com/CamDavidsonPilon/lifetimes


Calculating churn is one of those things where there's a complete dichotomy in the information available on the web: you usually find either very simplistic approaches to a quite complex problem, or amazing research that is not easily accessible.

The article is going in the right direction, but as other commenters mentioned, churn is an example of a survival analysis problem.

A typical model used for modeling churn is Pareto/NBD. Wikipedia is a great starting point: https://en.wikipedia.org/wiki/Buy_Till_you_Die


Absolutely not an expert here, but I do obsess about our churn numbers.

It seems this post is trying to make a model to predict churn or guess what future churn might be. But the main case for understanding churn to me is historical.

Last month we had 10 customers paying us $10 each. Two of them cancelled. I have gross churn of $20. That's not a model, just a historical fact. What's "wrong" with that calculation that these models solve?

I will definitely get different churn numbers if I look at groups of customers in cohorts based on when they first signed up, or enterprise-vs-SMB, or whatever. So I have reports to create all of those cohorts and show me the churn for each. I look at that data - with all the historical context - and make a decision about what to do next in the business. "Selling to SMB's was a good idea, but wow they churn a lot more. Let's focus our marketing on enterprises this month."

To me the churn calculation at the top of the article is plenty useful. I'm not sure what value I'd get from these advanced models. If they just exist to predict future churn or LTV, that doesn't seem particularly useful.


Also, if the problem is that "churn as a single number" is wrong because it misleads people (the example being that churn is up because the company is growing), surely the answer is to show a cohort analysis or historical data, not a forward-looking or "best fit" model?


Predicting LTV is quite useful for marketing and advertising.


Something I've never gotten a good answer to: Say I have two groups of 1000 customers (e.g. US customers vs UK customers) and I want to compare the churn rates between them. BUT these are users of varying ages, they didn't all sign up at the same month. It could be that one group is slightly younger or older than the other.

How do you compare churn rate of groups that aren't cohorted by age?


You can try the proportional hazards model: https://en.wikipedia.org/wiki/Proportional_hazards_model

In this setting, things like age, gender, or other features of interest are the covariates/predictors. You can then also use indicator features like UK or US for geography. If the geography-based features are significant, then the difference in churn between the regions is perhaps not simply due to age etc.
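
A bare-bones illustration of the idea, with entirely hypothetical data: fit the log hazard ratio for a single binary geography covariate by maximizing the Cox partial likelihood over a grid. (A real analysis would use something like lifelines' CoxPHFitter, which also handles multiple covariates, ties, and standard errors.)

```python
import math

def neg_log_partial_lik(beta, durations, observed, x):
    """Negative Cox partial log-likelihood, single covariate, no tied event times."""
    nll = 0.0
    for i, (t, o) in enumerate(zip(durations, observed)):
        if not o:
            continue  # censored: still subscribed, contributes only to risk sets
        risk = sum(math.exp(beta * x[j]) for j in range(len(x)) if durations[j] >= t)
        nll -= beta * x[i] - math.log(risk)
    return nll

# Hypothetical data: x = 1 for US, 0 for UK; observed = 1 means cancelled.
# Tenures vary, so no cohorting by signup month is needed -- the risk sets handle it.
durations = [2, 4, 5, 7, 8, 3, 6, 9, 10, 12]
observed  = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0]
x         = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

grid = [b / 100 for b in range(-300, 301)]
beta_hat = min(grid, key=lambda b: neg_log_partial_lik(b, durations, observed, x))
print("estimated log hazard ratio (US vs UK):", beta_hat)  # > 0: US churns faster here
```

This is exactly why the model answers the parent's question: the partial likelihood compares each cancellation only against the customers still at risk at that tenure, so customers of different "ages" are compared fairly without forcing them into the same cohort.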


When you let marketing people do the math, they come up with the "constant" churn equation.


Both Pete Fader and Bruce Hardie are "marketing people":

https://marketing.wharton.upenn.edu/profile/faderp/

http://brucehardie.com


Seems like something Profitwell or Chartmogul should add to their offerings. We rely heavily on Chartmogul for our weekly stats/scorecard and it looks like we've been wrongly celebrating our "super low churn rate" for a while now!



