I always find it concerning when authors dichotomize a variable without an extremely solid basis for doing so. It reduces statistical power, and it hints at over-mining the data to dredge out an effect.
This group of authors is well respected and known for doing studies like this. However, if their chief interest is observing the effect of physician age on patient outcome, physician age should clearly be treated as a continuous or truncated variable here.
Edit: In the supplement [1. Table B], they do perform the calculation with physician age as a continuous variable, and the effect stands. Good on them for doing the math in this way.
Physician age was modeled both as a continuous linear variable and as a categorical variable (in categories of <40, 40-49, 50-59, and ≥60) to allow for a potential non-linear relation with patient outcomes.
Can you explain what that means? Specifically, how does making it categorical allow for a non-linear relation, and how does keeping it continuous prevent it?
Probably because, when treating it as continuous, they only fit a linear regression. A straight line can only capture an outcome that rises or falls steadily with age, so a U-shaped relation, for example, would be missed.
When categorizing in buckets, they probably do an ANOVA. This technique lets the average differ per category exactly as measured, and asks the question: if I tell you the category, how much is the variance of your data reduced? If the variance falls a lot (relative to what it was), it means there's a statistically significant association between the category and the outcome.
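To make the difference concrete, here is a minimal sketch on made-up data, not the paper's data or the authors' actual model. It assumes a hypothetical U-shaped outcome, reuses the four age buckets quoted above, and compares the fraction of variance explained by a straight line with the fraction explained by per-category means. All function and variable names are my own.

```python
from statistics import mean

def r_squared(ys, fitted):
    """Fraction of the variance in ys that the fitted values account for."""
    y_bar = mean(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1 - ss_resid / ss_total

def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns fitted values."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return [y_bar + slope * (x - x_bar) for x in xs]

def bucket(age):
    """The four age categories quoted from the paper."""
    if age < 40:
        return "<40"
    if age < 50:
        return "40-49"
    if age < 60:
        return "50-59"
    return ">=60"

def category_fit(xs, ys):
    """One free mean per category, as in a one-way ANOVA; returns fitted values."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(bucket(x), []).append(y)
    means = {name: mean(values) for name, values in groups.items()}
    return [means[bucket(x)] for x in xs]

# Hypothetical data: one doctor per age from 30 to 69, with a U-shaped outcome.
ages = list(range(30, 70))
outcome = [(age - 49.5) ** 2 for age in ages]

r2_linear = r_squared(outcome, linear_fit(ages, outcome))
r2_category = r_squared(outcome, category_fit(ages, outcome))
print(f"linear R^2:   {r2_linear:.2f}")
print(f"category R^2: {r2_category:.2f}")
```

On this symmetric toy data the straight line explains essentially none of the variance, while the four category means explain most of it. That is the sense in which buckets "allow for" a non-linear relation.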
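And, in their defense, they can't really go fishing for different continuous relationships once they have the data. That's a multiple-comparisons problem: either the false-positive rate goes up or, after correcting for it, their statistical power goes down.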
Of interest too is the number of adjustable parameters of the model:
If instead of four age categories you use, say, four hundred, each doctor ends up in their own category. The in-sample fit of that model is as good as it can possibly be, but you have achieved no insight at all.
The same goes for taking age as continuous: if instead of a straight line you fit a curve with four hundred free parameters, you overfit it to the point of destroying any insight.
So in that sense it's "unfair" that they used four age categories, vs two free parameters of a linear regression. And there would need to be some explanation as to the age ranges they used for each category.
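The four-hundred-category extreme can be sketched in a few lines. This is a toy illustration under stated assumptions: the data are pure noise with no age effect at all, there is one hypothetical doctor per category, and a second noisy measurement of the same doctors stands in for new data. None of this comes from the paper.

```python
import random
from statistics import mean

def r_squared(ys, predicted):
    """Fraction of the variance in ys accounted for by the predictions."""
    y_bar = mean(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return 1 - ss_resid / ss_total

rng = random.Random(0)  # fixed seed so the run is repeatable
n_doctors = 400

# Hypothetical outcomes: pure noise, so there is nothing real to discover.
first_sample = [rng.gauss(0, 1) for _ in range(n_doctors)]
second_sample = [rng.gauss(0, 1) for _ in range(n_doctors)]

# One category per doctor: each category "mean" is just that doctor's
# own observed outcome in the first sample.
per_doctor_model = list(first_sample)

r2_in_sample = r_squared(first_sample, per_doctor_model)
r2_out_of_sample = r_squared(second_sample, per_doctor_model)
print(f"in-sample R^2:     {r2_in_sample:.2f}")
print(f"out-of-sample R^2: {r2_out_of_sample:.2f}")
```

The model reproduces the first sample perfectly, yet on the second sample it does worse than simply predicting the overall mean (a negative R^2). A perfect fit with as many parameters as data points tells you nothing about age.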
1 = http://www.bmj.com/content/bmj/suppl/2017/05/15/bmj.j1797.DC...