> For this study, researchers conducted technical interviews of 48 computer science undergraduates and graduate students. Half of the study participants were given a conventional technical interview, with an interviewer looking on. The other half of the participants were asked to solve their problem on a whiteboard in a private room. The private interviews did not require study participants to explain their solutions aloud, and had no interviewers looking over their shoulders.
> Researchers measured each study participant’s interview performance by assessing the accuracy and efficiency of each solution.
[...]
> “People who took the traditional interview performed half as well as people that were able to interview in private,”
I appreciate their effort to use science to support the narrative that tech interviews suck, because I do think they suck and need to change, but isn't N=48 a little small? Also, if it was only undergraduates or graduates, who's to say their performance is indicative of the rest of the tech workforce?
To acquire a larger N would mean recruiting more participants, which would require either compensation/extra credit or relying on people volunteering ~1 hour of their time. Likewise, collecting that data would require time and money. It is a time/resource issue with human research: you can settle for a small N and get some results, or insist on a large N and not have enough funding to support it.
Also note, what is a "large N"? There is no set-in-stone amount; I've even seen reviewers say a sample size of 10,000 is too small.
Of course I understand why the research has N=48, my question is whether it is adequate to fully support the claim in the title. I really don't know. I'm not a scientist or sociologist. My intuition tells me that it seems small, and using a population of college students intuitively feels like it wouldn't adequately capture the diversity of professional software developers, but maybe it does in this case.
It is. However, my understanding is that this is why we are seeing a rise in meta-analysis papers. It is difficult to acquire a sufficient N with a single study. As a result, you see authors consolidating all of the papers within a topic and discussing the trends seen across them.
However, research also has issues with reproducibility. No one will publish work that is simply a repeat of someone else's work; researchers have to twist or add something. It is a much-discussed issue, but no one in a position of power has made a serious attempt to rectify it.
Depends on the size of the effect you wish to measure and the underlying variance. They got a massive effect, which, to me, seems improbable, and I half suspect the randomization is at play.
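To make that concrete, here's a rough Monte Carlo sketch (stdlib Python; the group means, SD, and significance cutoff are my own illustrative assumptions, not numbers from the study). With ~24 people per arm, an effect where one group scores twice as high gets flagged essentially every run, while a small effect almost never does:

```python
# Rough Monte Carlo sketch (not from the paper): how often does a two-group
# comparison with n=24 per arm detect a "huge" effect vs. a small one?
# The group means, the SD, and the cutoff are all illustrative assumptions.
import random
import statistics

def simulate_power(mean_a, mean_b, sd, n_per_group, trials=5000):
    """Estimate power via simulation: draw two normal samples, compute a
    crude two-sample t statistic, and count how often it clears ~1.96."""
    detections = 0
    for _ in range(trials):
        a = [random.gauss(mean_a, sd) for _ in range(n_per_group)]
        b = [random.gauss(mean_b, sd) for _ in range(n_per_group)]
        va, vb = statistics.variance(a), statistics.variance(b)
        se = (va / n_per_group + vb / n_per_group) ** 0.5
        t = abs(statistics.mean(a) - statistics.mean(b)) / se
        # ~1.96 is a rough large-sample cutoff for alpha = 0.05 (two-sided)
        if t > 1.96:
            detections += 1
    return detections / trials

# "Huge" effect (group B scores twice as high) vs. a small one, 24 per arm:
print(simulate_power(mean_a=50, mean_b=100, sd=30, n_per_group=24))  # near 1.0
print(simulate_power(mean_a=50, mean_b=55,  sd=30, n_per_group=24))  # far lower
```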
I think it'd be pretty simple and cheap to replicate the things we care about in this study. All you really need is a whiteboard, some markers, a small room, a few interviewers and a candidate pool. The eye-tracking getup really only gets in the way of that and seems more like a solution in search of a problem.
I'm curious as to what makes a good size of N. To me, N=10 would be too small, N=48 seems a little under what it should be, and N=100 seems sufficient, but I don't have a real basis for this (in fact, it's probably because of the original count that had me settle on a min and max of 10 and 100). Maybe a better question is: what factors do studies consider, besides the logistics you rightly pointed out, in determining a sufficient N?
That depends on the size of the population being sampled from, the margin of error, and the confidence level.
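For completeness, the textbook survey-sampling version of that calculation looks like the sketch below (the function name and the example numbers are mine, just to show how margin of error, confidence level, and population size trade off):

```python
# Textbook sample-size-for-a-proportion sketch (my numbers, not the study's):
# n0 = z^2 * p * (1 - p) / e^2, then a finite-population correction.
def sample_size(confidence_z=1.96, margin_of_error=0.05, p=0.5, population=None):
    """95% confidence ~ z of 1.96; p=0.5 is the worst-case (widest) variance."""
    n0 = (confidence_z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return round(n0)

print(sample_size())                      # ~384 for +/-5% at 95% confidence
print(sample_size(population=2000))       # ~322 once the population is small
print(sample_size(margin_of_error=0.10))  # ~96 if you tolerate +/-10%
```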
For a huge effect like the one shown in the study, where one side performed 2x as well as the other, a sample size of 48 is more than large enough to say that the result is statistically significant. If there were a small effect, that wouldn't be the case.
Put it this way. You want to find whether people from California prefer Taco Bell or Pizza Hut, so you randomly sample 100 people. If all 100 people say Taco Bell, then you can be reasonably confident that more people from California prefer Taco Bell, because if at least 51% of your population preferred Pizza Hut, the odds of not getting one of those people in your sample would be minuscule (the odds of an all-Taco-Bell sample when only 49% of the population prefers Taco Bell are 0.49^100).
If, on the other hand, 51 of your sampled people prefer Taco Bell and 49 prefer Pizza Hut, your confidence level is too low to be useful--you need a larger sample size.
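If anyone wants to sanity-check those numbers, here's a tiny sketch using the same made-up Taco Bell figures (nothing here comes from the actual study):

```python
# Back-of-the-envelope check of the Taco Bell example (same made-up numbers).
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# If only 49% of Californians preferred Taco Bell, a sample of 100 that is
# *all* Taco Bell fans is essentially impossible:
print(0.49 ** 100)                 # ~1e-31

# But a 51/49 split in the sample is completely unremarkable even if the
# population is exactly 50/50 -- a split that lopsided or worse is common:
print(binom_tail(100, 51, 0.5))    # ~0.46
```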
I answered this in another response, but I think this is why we're seeing a rise in meta-analysis papers: take a bunch of small N's, consolidate them, and analyze the trend. The analysis can also be strengthened by evaluating the effect size of the phenomenon [1]. However, I would say using effect sizes in meta-analysis is a complex approach that limits the set of researchers who could conduct the analysis appropriately.
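As a rough illustration of what that consolidation step actually computes, here's a minimal fixed-effect (inverse-variance) pooling sketch; the per-study effect sizes and standard errors are invented, and real meta-analyses add random-effects models, heterogeneity checks, etc.:

```python
# Sketch of the simplest (fixed-effect, inverse-variance) pooling step in a
# meta-analysis -- the study effects and standard errors below are made up.
def pool_fixed_effect(effects, std_errors):
    """Weight each study's effect size by 1/SE^2 and return the pooled
    estimate plus its standard error."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Three hypothetical small-N studies of the same phenomenon:
effects = [0.9, 0.6, 1.1]        # e.g. Cohen's d per study
std_errors = [0.35, 0.30, 0.40]  # larger SE = smaller / noisier study
print(pool_fixed_effect(effects, std_errors))  # pooled d ~0.82, SE ~0.20
```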
The point of a whiteboard interview for me isn't about accuracy and efficiency. It's about seeing whether the person can code and can talk to me about coding.
Yes, of course. I rely on compilers, linters, and editor syntax highlighting when writing code, so it'd be pretty hypocritical for me to penalize people who can't access those things. People can sketch out a pretty messy pseudo-code implementation if they want, and then we'll just spend the rest of the interview cleaning it up, making it correct, extending it, talking about quirks of their implementation.
I was an undergraduate at NC State. I wonder how the study could be replicated at UNC and Duke. I'd wager they'd have a significantly higher success rate.