I don't think you're being too hard, but your instrument may be too blunt. If you are basing a large part of your hiring decision on this one test, then you may be eliminating people unnecessarily and occasionally including someone who doesn't belong. For example, I'm pretty rusty as a programmer, having graduated to management many years ago (and having been a Linux admin before that), and I could pass this test, but you wouldn't want me as a senior programmer.
The point I'm getting at is that, like any science experiment, you need to control for the variables. Just as an economics experiment might control for wealth or education, you have to control for people who get nervous during interviews, people who have a blind spot for this particular problem, etc.
An easy addition to your methodology would be to have three different problems and let them choose. I would see how that affects their performance.
At my company I give potential hires a test to complete at home, before even interviewing them. The average time to complete the test is 6 hours, but we've never had anyone outright refuse to attempt it, and it gives us a giant steaming pile of data. So if my experience is any guide (and it might not be), I would think you could make your test more intensive and potentially end up with more and better candidates.