
It doesn't have to be a black-and-white question - I think there would be benefits to having fewer standardized tests, and to de-emphasizing them when making policy (especially funding decisions) in favor of other metrics.



What kind of other metrics?

Keeping in mind that if it's not standardized, the numbers are statistical garbage...


Is your compensation determined by a standardized test? Why not? How can the world possibly work if we don't use standardized metrics for everything?!


My compensation is not determined by a hard metric because my employer has no way of measuring my direct contribution to the company's profits. If they did, you can be sure they'd pay me accordingly; this is why people in sales are often paid on commission, and why people in "billable hour" fields are scolded or fired if they don't hit their billing targets. In performance reviews, or (in my current line of work) consulting, it's not uncommon at all to bring up any and all evidence you can think of to support a claim that you have contributed (or will contribute) positively to the company's bottom line.

Granted, we lack such a clear measure of success in education, and current standardized tests are a ham-fisted approach to creating one. Don't get me wrong - most of the standardized tests we use today follow the exact same approach to testing as they did 50 years ago, and whatever it is that we would like to measure when we say "educational progress", they no more accurately capture it now than they did then.

But I'm not convinced that the general idea of measuring education through testing is wrong. I think what most anti-testing people balk at is the particular (and extremely limited) set of knowledge that most standardized tests measure - they tend to be more comfortable with the idea of measuring mastery of a particular subject (either through in-class tests, or even standardized subject tests, which draw far less criticism than the general tests). I'd be all for trying to figure out a way to measure general achievement in terms of subject-specific but still standardized tests (not every student would need to take every one), but there are a lot of difficulties there, too.


I would personally want to see it work in industry first. Once someone comes up with a way to directly tie programmers' pay/promotions to code metrics in a fully automated, standardized way, and shows that it works, then I'll believe there's a possible way to make that approach work for education, too.

(If anything, the code-metrics problem should be easier, because you get a large sample of data over an extended period of time that represents their actual work output, not an artificially staged test that takes a few hours.)


The problem is that programming is a far more heterogeneous task than teaching, and the goals are often not known a priori. How do you compare code output while writing an MRI reconstruction algorithm to code output while writing HFT software?

In contrast, I taught calculus many times. The goals and methods were exactly the same each time (ignoring small differences between the Rutgers and NYU syllabi): students should know how to differentiate and integrate, understand linear approximation, etc.

Measuring performance in reality is usually not that hard. It's just a few special professions (typically creative ones) where you run into difficulties.


> The problem is that programming is a far more heterogeneous task than teaching.

I read this part of your comment and immediately assumed that you were not a teacher. Then you said you were, and furthermore that your "methods were exactly the same", which surprised me for two reasons. First, the task itself is extremely heterogeneous even when you re-teach the same subject several times; second, no two classes are really the same - there are classes I've taught seven or eight times now, and the particular dynamic of each classroom required adaptation: different examples, different activities, and in some cases different evaluations. Not to mention all the different levels and subjects of classes that I've needed to teach.

At least, that's what good teachers do. (I don't mean that as an attack on the parent poster's teaching - I find it at least possible that they did all these things but didn't realise how much variability there was in the task.) There are, of course, mediocre teachers, who get through their task by learning a pattern and doing it the same way each time. But that kind of gets back to the OP: it may be that the middle is easy to measure, but the high end, not so much.


How is the task heterogeneous? Yeah, class dynamics differ a little bit depending on who the students are, and occasionally you get a few extra geniuses or dunces. But the goal of calc 1 is always to get students to integrate, differentiate, and do linear approximations.

In contrast, my current programming job is to build a visual search engine; my last was to trade stocks. Before that it was research in MRI, and before that, quantum mechanics simulations. Who knows what my next will be?

It makes sense to compare # of students who can integrate in 2012 to # of students who can integrate in 2011. It's a noisy measurement, but it works. How do you compare search engines to quantum mechanics simulations?
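
To put a number on "noisy": here's a quick back-of-the-envelope sketch (my own illustration, with made-up pass counts) of a standard two-proportion z-test you could use to check whether a year-over-year change in the pass rate is bigger than the sampling noise:

    # Hypothetical numbers: did the integration pass rate really change
    # between 2011 and 2012, or is the difference just noise?
    import math

    def two_proportion_z(passed_a, total_a, passed_b, total_b):
        # How many standard errors apart are the two pass rates?
        p_a = passed_a / total_a
        p_b = passed_b / total_b
        # Pooled rate under the null hypothesis of "no real change"
        pooled = (passed_a + passed_b) / (total_a + total_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
        return (p_b - p_a) / se

    # Say 140/200 students could integrate in 2011 and 152/200 in 2012.
    # z is about 1.35, under the usual 1.96 cutoff, so a six-point jump
    # could easily be noise at this sample size.
    print(two_proportion_z(140, 200, 152, 200))

With a couple hundred students per year, even a six-point swing is within the noise - which is exactly why you'd want several years of data, or bigger samples, before crediting or blaming a particular teacher.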

As for my teaching, the data I have suggests I was average. But there was not enough data to get a good picture - I only taught a couple of classes with standardized finals.


> How is the task heterogeneous?

Within a single course, the heterogeneity comes from how to actually do things, how to understand what you did, how to understand what other people are doing when they do the same things, how to think about changes to those patterns when the task changes a little, how to write about what you did, how to talk about what you did, how to read/listen to other people talking about what they did, and many other aspects of understanding the material. None of these are actually the same thing; the unifying thread in a given content area is that you might have to memorise the same jargon and diagram conventions for each of those different tasks.

A given course may also be somewhat heterogeneous in its content; for instance, a typical AI course might include algorithms and strategies based on discrete probability, others based on highly architected structural representations of knowledge, and still others based on self-rewriting code. At a lower level (high school), you might consider a biology teacher, who gets to cover cellular biology, taxonomy and cladistics, genetics, anatomy and physiology, and possibly some other things I'm forgetting right now.

Between courses, teaching is heterogeneous because you're teaching a lot of different things. Even if it's all "math", for instance, there's a fair amount of variation between algebra, geometry, logarithms and function analysis, differential calculus, vector calculus, trigonometry, and probability and statistics, and that's just among the courses often taught at the high school level.

And all of that is only talking about the things taught, which speaks directly to the question of evaluation. But on the subject of heterogeneity, classroom dynamics can vary significantly just between semesters at the same school, not to mention between different schools, and within a single classroom you have students with assorted learning disabilities (documented or not) and simply different learning styles - some are more verbal, some need to see it done, some need to do it on their own first, some benefit more from working together, some really need to try it first and crash and then hear the way they were supposed to do it and try again. It's a lot more than just "a few extra geniuses or dunces".

Teaching is pretty damn heterogeneous. The best teachers I know are, and have to be, among the most mentally-agile people I know.


You are completely missing the point, so let me repeat:

"It makes sense to compare # of students who can integrate in 2012 to # of students who can integrate in 2011. It's a noisy measurement, but it works. How do you compare search engines to quantum mechanics simulations?"


It's on point because "number of students who can integrate in 2012 vs 2011" is rather more like "percent of regression tests passed in 2012 projects vs 2011 projects" than it is like "quality of (?) search engine code vs quantum simulation code", or better, "number of successfully completed customer tasks using the search engine product vs the quantum simulation product". That is, it measures something that is not irrelevant, but awfully specific, and possibly less indicative of the larger whole than you might think. Because teaching is pretty heterogeneous. And the measurement is extremely indirect, i.e., how good another person is at something after interacting with the programmer's/teacher's product.


I love standardized tests; they are easy. Really easy. Anyone who wants to ace them can and will, many without studying. Are they helpful? Who knows - most likely not at all.


> Anyone who wants to ace them, can and will

This is an amazingly provincial comment and it does not reflect well upon you.

Rather than assuming malice in your comment, however, I would gently suggest that it would be rather difficult for someone passed from grade to grade without basic functional literacy to ace such a test. Or, for a sneakier, harder to quantify example, there are "standardized" tests in circulation (and I was exposed to some of them in middle and high school) where certain cultural information was implicitly demanded. Probably not a problem for middle-class suburban kids, but unlikely to be appropriate for underprivileged very-rural kids (of which the testing area had many).

A narrow worldview isn't something to be ashamed of, but it is something to recognize and compensate for when making statements like yours.




