This is a paper on the work that the article cited: https://www.melbourneinstitu...

xapata · on July 18, 2016

I skimmed the paper and couldn't find this "critical mistake". The nonlinear model you describe is common and appropriate to situations like this.

    y ~ x + x^2

A mistake would be

    y ~ y^2

And even then, it would be appropriate if we're modeling an autoregressive process.

    y_{t} ~ y_{t-1}

DaniFong · on July 19, 2016

I'll try to state in this language what I think is the conceptual flaw.

The stated conclusion is "there is an optimum number of work hours, and it is less than 40."

However, their method of analysis is this:

"We fit a non-linear model, namely a quadratic model of cognitive ability, y ~ ax + b x^2 for x work hours / week, and we found a statistical optimum around 25 hours"

The problem with this is that you're trying to find the best fit of a parabola to the data. And you have tons of samples where the work hours are very few / none (unemployed). Because there is a fairly strong correlation with unemployment and cognitive indicators, the parabola is already being "forced down" near hours worked = 0.

Now in this parabola model of estimated cognitive indicators vs work hours, either you are going to get a minimum -- and it goes to infinity at working hours -> infinity (of course in real life it cannot really do this, because we only have so many hours in the week, but the statistical model will suggest it) -- or, you are going to get a maximum, which is what actually happens.
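To make the two cases concrete, here is the standard turning-point calculation for a fitted quadratic. The intercept c is an assumption added for illustration; the model as quoted above lists only the a and b terms.

```latex
\hat{y}(x) = c + a x + b x^2, \qquad
\frac{d\hat{y}}{dx} = a + 2 b x = 0
\;\Longrightarrow\;
x^{*} = -\frac{a}{2b}
```

The turning point x* is a maximum when b < 0 and a minimum when b > 0. If b is exactly 0 the fit is a straight line and there is no turning point at all.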

It could well be that the data show a roughly flat, or even increasing, response of cognitive indicators to working hours once the number of working hours is beyond some nominal value, but that the unemployed population has somewhat lower indicators.

In this case the model will automatically become a downward-curving parabola with a maximum, suggesting decline with increasing work hours -- even though this is not what the data directly suggests.

This maximum, the fact that there even is a "work hour optimum" that is a smooth, quadratic curve, is a mirage -- the model is not the data.

A remaining question is why the optimum is less than 40 hours. It is relatively easy to construct a statistical case in which it is a curve fitting artifact, despite there being no direct data even at the suggested optimum.

One could in principle check to see if this is the case. The data may be available.

For now, there are a few graphs on page 20. It really doesn't seem to me that there is a significant distinction between the part time and full time groups -- in fact, the biggest difference is that more women who have a high reading score are not unemployed. Men who have a higher symbols score are slightly more likely to be full-time employed instead of part-time -- but the converse is true for men with higher reading scores. The difference is not very distinct.

https://www.melbourneinstitute.com/downloads/working_paper_s...

xapata · on July 19, 2016

> Because there is a fairly strong correlation with unemployment and cognitive indicators

You're arguing that there's an endogeneity effect -- that poor cognition causes less working? That's a common problem. The authors discuss their use of an instrumental variable technique to avoid this issue.

> parabola is already being "forced down" near hours worked = 0

Not sure what you mean. Typically a model like this includes a constant to allow for a non-zero dependent variable when all the explanatory variables are zero. To do otherwise in this case would be absurd. The idea that the average non-working person has zero cognitive function...

> smooth, quadratic curve, is a mirage

Ever heard of a Taylor polynomial?

DaniFong · on July 19, 2016

You'll learn a lot more if you ask, instead of "what could be wrong about what this person is saying", you ask "what could be right about it?"

xapata · on July 20, 2016

That's exactly what I'm asking for: a clearer explanation.

I don't believe the paper's conclusion, but I don't understand your criticism of it. If you're saying the estimated curve is inappropriate, a better argument would be that they should include more terms of the work-hours Taylor expansion to get a better fit. Or perhaps there are confounding variables left out of the model.

psychometry · on July 19, 2016

Um, what? The model is right there on page 5 and there's nothing wrong with it. The predictors include working hours, (working hours)^2, and others. The outcome is score on a cognitive assessment.

What's your agenda in trying to discredit this study with FUD, I wonder? Hopefully this can serve as yet another example of how to ignore commenters that sound like they know what they're talking about but actually don't.

DaniFong · on July 19, 2016

Look, I am 100% for the conclusion that we should not as a default case have people working too much! Even a 40 hour work week, for work that is statistically usual today, is in my opinion inhumane.

That philosophical belief however does not win out in the context of this particular analysis.

My agenda is to support sensible discourse on the question of how our brains adapt and how we live, using rigorous thought. Maybe I'm not seeing the problem with my reasoning but it seems pretty clear to me.

The problem is that you automatically get an upward or downward parabola, plotting cognitive scores versus working hours, if you have a non-zero coefficient in working hours^2

It's very unlikely that you'll get an upward parabola, especially given the strong anticorrelation between unemployment and cognitive scores that they use.

In science, you have a judgment call for which predictors you plug into the statistical analysis. Choosing any particular function, be it working hours^2, sin(working hours/100), or even, say, ballmerpeak(alcohol content) will affect the statistical results in factor analysis or anything else.

Since we chose only linear and quadratic functions to get coefficients for in our statistical analysis, the functions are going to be a parabola -- either up or down.

xapata · on July 19, 2016

The coefficient for the squared component could easily be zero. This is called "not statistically significant". There is no guarantee for a parabolic shape.

DaniFong · on July 19, 2016

The measure of possible data sets where, by standard statistical analysis, the coefficient for the squared component is zero, is tiny. There's no guarantee for a parabola, just a very high degree of certainty.

Your other point, that the parabola could be "not statistically significant," is true.

But given a strong degree of significant correlation between unemployment and the cognitive indicators, even if the dependence is totally flat for the cognitive indicators between 5 hours worked and 100 hours worked, you will still get a parabola by this method of statistical analysis.

Do not forget, this is model fitting.
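Both sides of this exchange can be checked on toy numbers. The sketch below is illustrative only: it uses plain OLS rather than the paper's two-stage procedure, the data are hypothetical (flat scores for everyone who works, a lower score at zero hours), and `quadratic_t_stat` is a made-up helper name. It fits y ~ c + a*x + b*x^2 and reports the t-statistic of b, first on 7 points and then on the same pattern repeated 10 times.

```python
import numpy as np

def quadratic_t_stat(x, y):
    """OLS fit of y ~ c + a*x + b*x^2; returns (b, t-statistic of b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]              # observations minus parameters
    sigma2 = resid @ resid / dof           # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance
    return beta[2], beta[2] / np.sqrt(cov[2, 2])

# Hypothetical toy pattern, not the study's data.
hours = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
score = np.array([90, 100, 100, 100, 100, 100, 100], dtype=float)

b1, t1 = quadratic_t_stat(hours, score)
b10, t10 = quadratic_t_stat(np.tile(hours, 10), np.tile(score, 10))

print(f"7 points:  b = {b1:.5f}, t = {t1:.2f}")
print(f"70 points: b = {b10:.5f}, t = {t10:.2f}")
```

The estimated b is negative and identical in both runs; only the t-statistic grows with the sample size. So on data shaped like this, a large sample can make the squared term look clearly significant even though the response is flat for every nonzero number of hours.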

xapata · on July 19, 2016

> is zero

Sigh. No coefficient is ever exactly zero, just very close [0]. I didn't think I needed to explain that when writing, "could be zero."

> you will still get a parabola

If the squared parameter is not statistically significant, the author will likely drop it from the model. In that case, we would not see a parabolic model and the paper wouldn't exist. The authors would have moved on to a different topic, or found a different dataset.

If the coefficient is so small that it is indistinguishable from zero (not significant), then we ignore the associated variable entirely. To do otherwise would require us to discuss an infinity of possible variables as if they mattered to the model.

> correlation between unemployment and cognitive indicators

If you're arguing that the author should have dropped all observations of unemployed persons from the dataset, that's completely separate and has nothing to do with parabolas.

[0] "ever" loosely defined.

psychometry · on July 19, 2016

"The functions are going to be a parabola." What functions? A plot of which two variables? I have no idea what you're talking about and I'm a grad student in biostats.

DaniFong · on July 19, 2016

I'm sorry, I am trying to explain something that is very clear in my head, and I'm pretty sure that I'm right, but I haven't had bio-stats training specifically so I do not know the language precisely. My background is in physics, computer science, and epistemology.

The functions I am referring to are estimators of cognitive indicators (like backwards digit span, say, that they use in the paper), as a function of working hours.

Take a look at page 21 for some plots. For each of the cognitive indicators, the estimator is a downwards parabola as a function of working hours. What I am saying is that this is an artifact of the analysis. The shape could be far different -- in fact it could be a bad case of curve fitting. Additionally -- why not just directly plot the data as a scatter plot or a binned average of cognitive indicators for bins between, say, 20 - 25 hours, 25 - 30 hours, etc? Then at least we could see if the parabolas are close to the data...
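The binned-average check suggested above can be sketched in a few lines. This is only a descriptive look at raw scores per hours bin, with hypothetical numbers and a hypothetical helper name `binned_means`; it does not account for the control variables or the instrumenting used in the paper.

```python
import numpy as np

def binned_means(hours, scores, edges):
    """Mean score per working-hours bin [edges[i], edges[i+1])."""
    hours = np.asarray(hours, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # np.digitize returns i such that edges[i-1] <= x < edges[i]
    idx = np.digitize(hours, edges) - 1
    means = []
    for i in range(len(edges) - 1):
        in_bin = idx == i
        # Empty bins are reported as NaN rather than silently dropped.
        means.append(scores[in_bin].mean() if in_bin.any() else np.nan)
    return np.array(means)

# Hypothetical observations, for illustration only.
hours = [0, 0, 22, 24, 27, 29, 41, 44]
scores = [88, 92, 100, 102, 99, 101, 100, 100]
edges = [0, 20, 25, 30, 45]

means = binned_means(hours, scores, edges)
print(means)
```

Plotting these bin means next to the fitted parabola would show at a glance whether the curve tracks the data or only the gap between the zero-hours group and everyone else.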

https://www.melbourneinstitute.com/downloads/working_paper_s...

psychometry · on July 19, 2016

There's nothing wrong with the inclusion of a quadratic term in a linear model if the variable is significant, which it clearly is according to Table IV.

You can't just plot a single predictor against the sample outcome and expect the plot to be particularly revealing in multiple regression. Plus, this isn't even multiple regression; this is a two-stage least squares multiple regression. The working hour (WH) variables are instruments, not predictors. See page 6.

Instrumental variables exist specifically to deal with the case of a possible bidirectional causal association between predictor and response.

DaniFong · on July 19, 2016

I'm afraid I don't think you are understanding my point, but apparently it is a difficult one to make.

I'm unfortunately too busy to make it clearer, so I will just leave you with a koan.

Why not include a third order term in the regression? What about an n-th order term? What assumptions do we "bake into" the results of a statistical regression as an effect of including, or not, any function on the original data?

The statistical significance of the quadratic term is actually dependent upon the presence of any more complex or higher order terms in the regression, just as the coefficients and the statistics of the linear term will depend on the presence of the second order term in the analysis.
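A small sketch of that dependence, on hypothetical toy data (flat scores except at zero hours) and with plain least squares rather than the paper's method: the x^2 coefficient is estimated once in a quadratic model and once in a cubic model. Only the coefficient estimates are compared here, not their standard errors.

```python
import numpy as np

# Hypothetical toy data, not the study's: flat scores except at zero hours.
hours = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
score = np.array([90, 100, 100, 100, 100, 100, 100], dtype=float)

# np.polyfit returns coefficients from the highest power down.
quad_fit = np.polyfit(hours, score, deg=2)   # [x^2, x, 1]
cubic_fit = np.polyfit(hours, score, deg=3)  # [x^3, x^2, x, 1]

x2_in_quadratic = quad_fit[0]
x2_in_cubic = cubic_fit[1]

print(f"x^2 coefficient, quadratic model: {x2_in_quadratic:.5f}")
print(f"x^2 coefficient, cubic model:     {x2_in_cubic:.5f}")
```

The same data give a different x^2 coefficient depending on whether the x^3 term is in the model, which is the sense in which each term's estimate depends on which other terms were included.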

I'm not saying you should never include a quadratic term in a regression, I'm saying we should understand what the regression is doing when it is fitting a model.

psychometry · on July 20, 2016

Why not include insignificant terms in the regression? Maybe because they're insignificant?

I understand what the regression is doing. The authors understand. You do not, though.

You've already admitted to not having a background in stats, yet you keep throwing around words like "significance" and "model fitting" without having the faintest clue what they mean mathematically. I'm sorry, but I can't fit several semesters of undergrad-level stats in these comment boxes.

DaniFong · on July 20, 2016

I have still not given up hope that this conversation is useful for somebody.

Here is my redoubled effort.

Part of what gives me hope is that the CEO of a prominent data analysis company, who does have a background in statistics and data analysis, and has a PhD in computational mathematics from Stanford, said that my original comment was "amazing" and that "the complacency of selecting variables is lost on the hive mind."

And, so, while I have diminished hope that I'll be able to get through to you this time, at the moment, since things seem to have regressed to statements asserting that I don't understand the "faintest clue" of things like model fitting and significance (this is absolutely false, actually, I'm quite deeply aware of the meaning), in fact this conversation does have at least some merit, even if outside of this, surprisingly argumentative, Hacker News context.

Including a specific number of terms in a Taylor series expansion (as per the suggestion of xapata), or in any expansion, be it a Fourier expansion, a Lagrange expansion, or whatever, is a somewhat perilous choice that can distort the meaning. Any form of model fitting has this problem. But one cannot dismiss all other models that could be fit to the data as insignificant in this case!

In particular, choice of a quadratic function for fitting, using constant, linear, and quadratic terms, automatically distorts this data, because of the nature of the data, where there is an anti-correlation between unemployment and cognitive indicators.

This is demonstrated by the following example, which took me about 5 minutes to construct -- and a bit more to explain and write about here.

Suppose the data show a completely flat response for IQ versus working hours, except for the unemployed population, which has a lower set of cognitive indicators.

The data and curve are linked here.

https://mycurvefit.com/share/0530c696-2eb0-4f9f-8af1-3277c5b...

In this example, the data shows no optimum number of working hours, and IQ doesn't diminish for more hours worked. But the quadratic fit does suggest this: a peak for IQ near 25 hours worked.
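For readers who cannot open the link, a rough stand-in for this kind of example is sketched below. The numbers are assumptions made up for illustration, not the points in the linked fit and not the study's data, so the fitted peak lands at a different number of hours; where it lands depends on how the sample points are spread.

```python
import numpy as np

# Hypothetical toy data: flat scores for everyone who works,
# and a lower score only at zero hours.
hours = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
score = np.array([90, 100, 100, 100, 100, 100, 100], dtype=float)

# Least-squares fit of score ~ c + a*hours + b*hours^2.
# np.polyfit returns coefficients from the highest power down.
b, a, c = np.polyfit(hours, score, deg=2)

# Turning point of the fitted parabola; it is a maximum when b < 0.
peak_hours = -a / (2 * b)

print(f"quadratic coefficient b = {b:.5f}")
print(f"fitted 'optimum' at {peak_hours:.1f} hours")
```

The fit opens downward and reports an interior "optimum", even though every nonzero-hours point in the input has exactly the same score.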

Obviously, this example is not the data the study worked on. The study doesn't directly share the data. But, from the graphs on page 20, the example I constructed is quite like the data. The part-time and full-time work probability density curves are practically identical to one another -- they are right on top of each other. The only really significant difference is between the working and not working populations.

Yet, the authors do not hedge their findings.

"Our findings show that there is a non-linearity in the effect of working hours on cognitive functioning. For working hours up to around 25 hours a week, an increase in working hours has a positive impact on cognitive functioning. However, when working hours exceed 25 hours per week, an increase in working hours has a negative impact on cognition."

and the study concludes "Our study highlights that too much work can have adverse effects on cognitive functioning."

In my judgment, this analysis does not demonstrate this, even though it would be convenient for me for this to be true.

PS:

Because they did two stage least squares, and instead of directly using working hours they used fitted values, there is a slight adjustment that needs to be done to the example above, in order to be relevant.

It is not entirely obvious exactly how well the anti-correlation for cognitive indicators will carry through after "working hours" are estimated by regression with the variables:

Vacancy rate, Inner regional, Outer regional, Remote, Very remote, Number of dependent children, Parent is still alive, Other public benefits, Australian citizen, Work experience, Ownhouse.

I mean, literally the strongest connection there is in "other public benefits", which is a variable with an effect measured in dozens of hours of work per week. Everything else is a far smaller effect. So, effectively, what the second stage of least squares is really doing is a regression of cognitive indicators on the variables above; and really mostly, on whether or not they receive public benefits.

A large fraction of those people who have public benefits will have their "estimated work hours" estimated below 0, then will be reassigned to 0 for the purposes of the final regression. Hence, if there is an anti-correlation for "receiving other public benefits" (their terminology) with cognitive indicators -- and there is -- it will appear that there is a significantly lower set of cognitive indicators for the instrument WH* that they estimate.

After that, the rest of my toy example is still quite apt -- there can be no effect in IQ as a function of WH* or WH (as measured) outside of the unemployed population, even though the quadratic analysis will suggest an optimum.

DaniFong · on July 19, 2016

"You can't just plot a single predictor against the sample outcome and expect the plot to be particularly revealing in multiple regression"

Isn't this, effectively, what they are doing when they say they have found an optimum number of "working hours" for one to work to maximize cognitive indicators?

I'm not even quibbling whether they're demonstrating causality versus correlation. The problem is that their result and the number that they found are likely artifacts of the method of the analysis, and the choice to include only linear and quadratic terms.

DaniFong · on July 20, 2016

I should add, for the benefit of anyone that reads this in hindsight, that I actually very much endorse the question that this work seems to be asking:

How shall we live?

I do think the problem has much more to do with the nature of work, and not quite so much to do with its amount.