NCLB tried to fix those "ineffectual, corrupt money pits" by giving enormous, overriding importance to a small set of easily gamed statistics.
Creating a single performance metric that people Must Meet Or Else is an exceedingly naive approach to performance problems. It creates perverse incentives that reward cheating and short-term hacks over sustainable long-term solutions, and it does nothing to actually solve the underlying problem.
I'm not, to be clear, saying that trying to use statistics as a tool to solve problems in education is a bad idea, but laws like NCLB are comparable to non-technical managers trying to measure software dev performance with metrics like "lines of code produced" or "number of bug tickets closed". (And I don't think I have to explain to anyone here why that's a bad idea.)
A corrupt, incompetent, cronyistic mess is going to be a mess whether you use a small set of gamed statistics or nothing at all.
The reporter doesn't get it when they critically remark that the reform model "ignored less quantifiable signs of intellectual development" - no, if it hadn't, the people willing to make up test scores en masse would also have made up all those 'less quantifiable signs', except you'd have no way of knowing about their bullshit.
The NCLB gaming of standardized tests, on the other hand, forces the incompetence out into the open in the form of unmistakable, indefensible, outright fraud.
So, which is better? To have all your metrics gamed and not know it, or to have them gamed with a chance of detecting the fraud?
> The NCLB gaming of standardized tests, on the other hand, forces the incompetence out into the open in the form of unmistakable, indefensible, outright fraud.
The outright fraud isn't the only thing it forces (and, really, it's sort of the extreme fringe); the more routine thing it forces is a reluctance of schools to promote students based on academic ability, since holding advanced students back in grade level improves metrics like the share of students functioning at or above grade level.
(Of course, it's an unsustainable optimization, which also increases the rate at which advanced students with involved parents defect from the public school system entirely, but that's an effect that takes longer to materialize than the short-term effect on metrics.)
I'll be concrete and define "gaming standardized tests" as "exploiting aspects which influence the test results other than proficiency in the subject matter."
On the teacher side, one way to game them is via test prep. Quoting from http://curmudgucation.blogspot.com/2014/04/what-test-prep-is... : "learning how to perform the specific cockamamie tasks favored by the designers of the various state-level assessments"
That includes:
> We have covered "How To Spot the Fake Answers Put There To Fool You." We've discussed "Questions About Context Clues Mean You Must Ignore What You Think You Know." We've discussed how open-ended questions require counting skills (the answer to any question that includes "Give three reasons that..." just requires a full three reasons of anything at all, but give three). For lower-function students, we covered such basics as "Read All Four Answers Before You Pick One."
> We have pushed aside old literary forms like "short stories" and "novels" in favor of "reading selections"-- one-page-sized chunks of boring contextless pablum which nobody reads in real life, but everybody reads on standardized tests. We have taught them to always use big words like "plethora" on their essay answers, and to always fill up the whole essay page, no matter what repetitive gibberish is required. We have taught them to always rewrite the prompt as their topic sentence. In PA, we have taught them what sort of crazy possible meaning the test-writers might have assigned to the words "tone" and "mood."
This is gaming the system because had the test givers used a completely different approach, say, of using open-ended questions instead of multiple-choice answers, or "give persuasive reasons that..." instead of the more easily tested "three", then this test prep would not work. Not that that alternate testing form will happen in standardized tests, because it's a lot more expensive to grade free-form tests. But teachers themselves can ask these sorts of questions, to help gauge proficiency.
Another way to game the test is on the test giver's side - they get to define what the pass/fail thresholds are. For example, "a test question was considered "hard" or "easy" not because it required a particular skill; its difficulty was determined based on how many students got it correct." Quote from http://jerseyjazzman.blogspot.com/2013/08/scoring-ny-tests-w... .
Since the thresholds can be determined after the fact, this means it's not really judging proficiency but instead is making the test give a pre-defined pass/fail curve.
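To make the mechanics concrete, here's a toy sketch (hypothetical numbers, not any state's actual procedure) of how choosing the cut score after seeing the results pins the pass rate in advance, regardless of what the scores say about proficiency:

```python
def cut_score_for_pass_rate(raw_scores, target_pass_rate):
    """Pick the cut score AFTER seeing the results, so that roughly
    `target_pass_rate` of students pass -- regardless of what the raw
    scores say about actual proficiency."""
    ranked = sorted(raw_scores, reverse=True)
    n_passing = int(len(ranked) * target_pass_rate)
    # Everyone scoring at or above this threshold "passes".
    return ranked[n_passing - 1]

# Two cohorts with very different proficiency...
strong_cohort = [92, 88, 85, 81, 77, 74, 70, 66, 60, 55]
weak_cohort = [61, 58, 54, 50, 47, 44, 40, 36, 30, 25]

# ...yet both report the same 70% pass rate, because the threshold moves.
print(cut_score_for_pass_rate(strong_cohort, 0.7))  # 70
print(cut_score_for_pass_rate(weak_cohort, 0.7))    # 40
</antml>```

The announced pass rate is identical for both cohorts; only the moving threshold makes them look alike.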
See also the “Cluss Test”: a gibberish test which it is nevertheless possible to get 100% right using nothing but “tricks”. For those wanting to try their skill, I have made an interactive version here: http://www.recompile.se/~teddy/cluss-test
I worked for eight years in standardized testing. I ran systems that facilitated operation of a number of different standardized tests for government and commercial clients. One of the tests was a certification used by states for a particular cosmetic procedure. The test was written in Spanish and English. A significant number of people who took the test did not speak either of these languages, usually speaking Vietnamese instead. This wasn't a problem because the test key had obvious patterns in it to facilitate passing the test. You simply told another person what the pattern was, and anybody could pass it without being able to read a word of English or Spanish. It was obviously set up this way on purpose, although no other test I worked on was like this.
I learned a few things about standardized tests after working on them for so long: 1. almost none of the criticisms of standardized tests apply to a well-designed and maintained test, in particular "susceptible to gaming" or "culturally biased against minorities"; 2. many tests are badly designed; 3. many tests function only as professional barriers to entry, and governments/certification bodies don't care all that much how unfair the tests are as long as the final numbers come out right.
I worked on two tests for the federal government. They both were done with great care to be as fair as possible, although I learned how they manipulated the process to get the demographic mixture they desired.
"How to spot the fake answers put there to fool you" == "how to see when an answer isn't even in the ballpark". That's a useful skill. That "context clues" thing suggests teaching students how to solve the problem in front of them, not the easy problem their mind wants to substitute for it [1].
From what this guy describes, "test prep" sounds like "educating students".
I agree that grading of essays is a disaster. It's not specific to standardized tests, however - that's how all my essays were graded from grade 1 all the way to college.
> Another way to game the test is on the test giver's side - they get to define what the pass/fail thresholds are.
This is why tests are standardized, not left up to the schools or teachers.
[1] A hard problem: "What is the optimal incarceration time to dissuade people from pedophilia." An easy problem: "How angry do pedophiles make me feel?" When most people hear the first question, which is hard, their mind substitutes the second much easier question for it. See Kahneman's book "Thinking Fast and Slow" for more on this. http://www.amazon.com/gp/product/0374533555/ref=as_li_tl?ie=...
I defined what it meant to game a test, and gave definitions which fit the example. You have rejected my definition, without giving an alternative.
"That's a useful skill" is not a useful educational criterion. Knowing how to change a tire on a car is a useful skill, but it's not appropriate for a math course.
"Educating students" is also a useless criterion. Education is a never-ending process. I'm still learning things now. Schools by necessity must restrict themselves to certain topics. A Spanish teacher cannot simply use "I'm educating students" as an excuse to spend four weeks on Canadian politics in the 1970s.
If test prep is so important, why isn't it its own course, where the teachers are trained for it, and where there are specific curriculum goals?
"This is why tests are standardized, not left up to the schools or teachers" - are you willfully ignoring the point? Someone defines the standards. The page I linked to suggests that the standards for this New York test were defined by people who want the public school systems to fail, as part of the general effort to privatize public school.
> The bottom line is that there are tremendous financial interests driving the agenda about our schools — from test makers, to publishers, to data management corporations — all making tremendous profits from the chaotic change. When the scores drop, they prosper. When the tests change, they prosper. When schools scramble to buy materials to raise scores, they prosper. There are curriculum developers earning millions to create scripted lessons to turn teachers into deliverers of modules in alignment with the Common Core (or to replace teachers with computer software carefully designed for such alignment). This is all to be enforced by their principals, who must attend “calibration events” run by “network teams.”
You even used the passive "that's why tests are standardized" - who standardized the tests, what political and financial goals influence them, and how transparent is the standardization process?
When the schools and teachers define the tests, then these issues are much clearer, and any failures are limited to just the school or teacher, and not systemic to the entire state.
Hence it's possible, and likely, that some test standardizers have gamed the test results, under my concrete definition of "gaming."
I agree that if you want to define education in the course material as "gaming", the tests can indeed be gamed. Knowing how to reject an obviously false answer (e.g., "23.6 x 10.9 = ? a) 1,000,000 b) 0.000000001 c) 257.24 d) 527.24") is part of learning math.
As for who defines the standards, the answer is our politicians or whoever they delegate to. And if you define gaming as "defining a standard", then yes, the test creators also game the system.
So far, you haven't actually pointed to any part of the standard that you object to. Nor have you pointed out any sort of gaming other than "teaching the material on the tests, including how to ballpark answers".
I defined gaming as "aspects which influence the test results other than proficiency in the subject matter."
I did not define it as "education in the course material". Please don't make that assumption.
It's impossible to evaluate your example without defining the pedagogical goal. Your example test question cannot distinguish between proficiency in multiplying two three-digit numbers, and proficiency in selecting from one of four possible answers, where two are obviously incorrect.
That said, this question is biased in favor of students who have been taught estimation techniques, in this case, round, compute 20 * 10, and look for the closest answer. They will be able to answer more of these types of questions than students who can actually multiply the numbers, but haven't learned the approximation methods.
Had the answers been "1) 257.24, 2) 256.24, 3) 247.34, 4) 248.34" then the other class of students would fare better. Then again, those who learned casting-out-nines would be able to reject two of these quickly.
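The casting-out-nines trick mentioned above is easy to sketch: the digit root of a correct product must equal the digit root of the product of the factors' digit roots, so mismatching candidates can be rejected without doing the full multiplication. A minimal illustration (the decimal point doesn't affect the digits involved):

```python
def digit_root(x):
    """Digital root of the digits of x, ignoring any decimal point."""
    total = sum(int(c) for c in str(x) if c.isdigit())
    return 1 + (total - 1) % 9 if total else 0

def consistent(a, b, candidate):
    """Casting out nines: a necessary (but not sufficient) check
    that candidate could be the product a * b."""
    return digit_root(candidate) == digit_root(digit_root(a) * digit_root(b))

# 23.6 -> digit root 2, 10.9 -> digit root 1, so the product's root must be 2.
for answer in ["257.24", "256.24", "247.34", "248.34"]:
    print(answer, consistent("23.6", "10.9", answer))
# Rejects "256.24" and "248.34" -- two of the four, as claimed above.
</antml>```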
It's clear that sometimes ballpark answers are better than exact ones. In bookkeeping, it's clear that exact answers are better. It's possible to teach students both ... by taking time away from other skills which are also part of mathematical proficiency. A standardized testing system encourages monoculture teaching, so that all students are primarily taught the method most likely to be on the test, on the assumption that the test defines proficiency.
It appears that you have defined "proficiency in the subject matter" as "ability to pass a standardized test." If so, then by definition it's impossible to game the system, making this discussion pointless. Is that your definition of proficiency?
P.S. Here's another way to game the test system - expel your worst students before the state tests. In that way, your school gets the money (for the student) but doesn't have to be responsible for the poor grades, or even make an effort to educate them. See http://www.researchonreforms.org/html/documents/DumpingKidsO... for examples.
Ironically, standardised testing is a prime example of a system choosing to solve an easy problem instead of the problem it has.
Hard question: are our schools meeting the educational needs of their pupils and society?
Easy question: did enough pupils fill in enough of the right circles on this test for us all to not get fired next year?
If you exchanged test scores for production figures and teachers for farmers, this article could be a story directly out of Soviet agriculture in the collectivisation era. I'm finding it highly entertaining to watch you, of all people, defending this system so vehemently.
If you don't have standardized testing, how do you know if the students are being educated? More specifically, how do you identify schools that need help and focus resources effectively?
The problem is that standardized testing doesn't tell you if students are being educated, it tells you if students are passing standardized tests, which is a different problem that, at best, only partially overlaps with the actual question. If you over-optimize for standardized testing, you end up under-optimizing for actual education.
I had to take the WASL as part of my graduation from high school, back in 2003. Forcing this requirement on schools did not improve our education, it degraded it. Instead of actually reading two or three good books in our English class, we studied Cliff's Notes summaries of 15 or so works, so we could write shallow summaries of "key themes" for the test, for whichever books actually appeared on it. Instead of going forward with trigonometry in math, we went over estimation and hammered on very basic geometry problems. My government class stopped covering its subject matter entirely, and we studied analogies and reading comprehension, because government wasn't on the WASL (at the time; it might be now). The net effect was that as a class we did really well on that test, but our actual education suffered. That's what standardized testing gets you.
I program for a living, but I've not taken any programming tests since I was in school. How do people know that I'm doing a good job without my taking a standardized test?
The answer is one we've had for a long time - talk with the teachers. Teachers are professionals, paid to educate children and evaluate what needs to be improved. We developed normal schools to train people how to be teachers. These became known as teachers' colleges, and then became education programs in a university.
In addition to continuing education programs and peer development, we also have oversight programs in place, including the department head, principal, and local school board. Among other things, these are supposed to help identify teaching problems and remedy them.
Unfortunately, management provides both support and punishment, which can lead to a power imbalance where a school board member says "My nephew must be on the football team or else you won't get a raise next year!" One way to limit this power imbalance is to set up a teachers' union or tenure system. Another is additional community oversight, which may include parent-teacher organizations like the PTA.
Therefore, your question sounds like you trust the authors of standardized tests (who are often in for-profit companies that sell the tests, sell standards, and sell text books which match the standards) more than you trust teachers or the professional education system.
There are a couple of other branches in this thread waiting for your followup for the last few hours, and you pick this one? I still want to know if you think that "proficiency in the subject matter" is defined as "ability to pass a standardized test."
You think neither I nor the entire education system over the last 150 years have ever considered the effect of the principal-agent problem? I even said "we also have oversight programs in place, including the department head, principal, and local school board." I elsewhere also pointed out to you how test manufacturers stand to make a profit if they can convince people to buy their tests, curriculum, and text books. They most certainly have a bias.
Your comparison to unit tests is telling, in ways you didn't mean it to be. Every project I've worked on has a very different set of unit tests, with essentially nothing shared between the different test cases outside some common test infrastructure.
Even multiple people on the same project end up writing different sorts of unit tests for the same code base. I do more functional and coverage driven tests, a co-worker is a red-green-refactor TDD developer. This diversity of tests is probably better for the overall code base than if we all did the same thing.
You do realize that teachers almost certainly have studied assessment design as part of their coursework, while most developers have almost no formal training in test engineering or experience in, say, coverage analysis?
If the goal is to test the students, then the teacher can - like the developer with good test engineering skills - develop the appropriate tests for the given set of students and expected knowledge. Except the teacher's tests must also be engaging and authentic, while the computer doesn't care what it runs.
And yet you think that one single set of unit tests for, say, all 8th grade English teachers can be useful enough to judge a specific student's progress, or a specific teacher's skills? Where does that optimism of yours come from?
Honestly, I have no idea if you're doing a good job. At least (as far as I know) you're not asking for federal funding for your job.
Well, I don't have to trust the authors since I can see and evaluate the test for myself. I can't really evaluate every teacher and analyze the pressures on them.
Education is primarily state funded, not federal. I believe federal funding is only 10% of the local school budget, and includes meal assistance and other things which aren't directly tied to a teaching position.
In any case, your original question is also valid for private schools - how do the parents of private school students know if their children are being educated or if the schools need help? How does the bishop overseeing several Catholic schools do the same?
Therefore, why is "federal funding" relevant to the topic?
As I understand it, you don't have access to the questions and answers for the high stakes tests, so you can't evaluate them. I can be proven wrong. Can you show me the complete set of questions for a state test from last spring? I looked for Florida, and only found FCAT tests from 2005/2006 at http://fcat.fldoe.org/fcatrelease.asp . I could not find FCAT 2 questions from 2013 or 2014, though I did find the scores from http://fcat.fldoe.org/fcat2/ .
This is what I expected, because some of the questions are potential questions for future tests, and exist to calibrate the tests. If the questions and answers are published, then they can't be used that way.
Which suggests that you don't know what you're talking about, as regards high stakes testing, or that there are some states where all of the tests are published, so that people like you can review them. Which tests are you thinking of?
If I understand you correctly, you are satisfied if you can "see and evaluate the test." Wouldn't you be similarly satisfied if you could "see and evaluate" all of the tests from each teacher at every school? Since that seems a lot cheaper and easier to do than set up high-stakes testing across the country.
You're right about most of the funding but NCLB has provisions to redistribute federal money to specific schools. https://en.wikipedia.org/wiki/No_Child_Left_Behind_Act#Fundi... And while you don't get specific questions, you can see example tests to see what subjects are covered, how much is multiple choice vs essay, etc.
Yes, the feds contribute some of the money in exchange for a lot of the rules. That doesn't change anything of what I said - your original question is independent of federal involvement and could equally apply to privately owned Catholic schools.
Have you changed your viewpoint? You previously said "I don't have to trust the authors since I can see and evaluate the test for myself." Now you're okay with seeing only a synopsis of what's in the tests? Why do you still trust the authors if you can't see the actual test?
If you could get the same synopsis of the questions that the teachers ask, then wouldn't you also be satisfied? Why not?
I'm actually kind of disappointed that there's not more information available. It still seems better than what we got before, which was even less informative. And it would be nice to have that synopsis, but it's much more useful to have a standard so we can compare across schools.
> your question sounds like you trust the authors of standardized tests (who are often in for-profit companies that sell the tests, sell standards, and sell text books which match the standards) more than you trust teachers or the professional education system.
> Why is that, do you think?
You answered that it's because you could see the test questions, and evaluate them for yourself. Then you said it's because you could see samples of the questions. Now you say it's because you can compare scores?
Curriculum standards have been around since the 1800s, so is "have a standard" short for "have standardized tests"? Actually, we've had those for decades - my birth state of Florida started them in the 1970s, so I assume you mean "have high stakes standardized tests"? Actually, Florida also introduced the nation's first required high school graduation test in 1977, so you must mean "frequent high stakes standardized tests", yes?
How is this more useful than earlier assessment tests, as well as GPA, SAT scores, ACT scores, graduation percentages, number of students going on to the Ivy League/Big 10/whatever, number of National Merit (semi)finalists, number of available AP/IB courses, average AP score results for a given field, lists of extracurricular activities, football team scores, and a lot of other cross-school comparison metrics?
Again I ask, what is the basis of your trust of the authors of a standardized test over the teachers and the professional education system?
Nobody said anything about not having standardised testing. The problem here is high-stakes standardised testing.
You can know stuff about a system that you can't measure completely (i.e. any system) by sampling it - you'll get a lot of noise and even some systematic biases, but as long as you maintain an awareness of that you will know something.
The problem comes when you want to control things. If you create a feedback loop by attaching strong incentives to the measures you are using to acquire knowledge, then you end up with neither control nor knowledge. You're no longer taking a representative sample, just measuring the gain of your feedback loop.
And if you apply the incentives at a granularity of measurement such that noise overwhelms the signal (by a factor somewhere between 7 and 100, according to the article)...
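The sampling-versus-feedback point can be shown with a toy simulation (entirely made-up numbers, just to illustrate the shape of the effect): once agents are rewarded on the measurement itself, effort shifts from the underlying quality into inflating the score, and the metric decouples from reality.

```python
import random

random.seed(0)

def measure(true_quality, gaming_effort):
    """Observed score = underlying quality + noise + whatever effort
    was diverted into inflating the measurement itself."""
    return true_quality + random.gauss(0, 5) + gaming_effort

# Passive sampling: noisy, but centred on the truth.
passive = [measure(true_quality=50, gaming_effort=0) for _ in range(1000)]

# High-stakes regime: quality drops while the observed score rises.
gamed = [measure(true_quality=45, gaming_effort=20) for _ in range(1000)]

print(sum(passive) / len(passive))  # ~50: the metric tracks reality
print(sum(gamed) / len(gamed))      # ~65: the metric now measures the incentive
</antml>```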
>"How to spot the fake answers put there to fool you" == "how to see when an answer isn't even in the ballpark". That's a useful skill. That "context clues" thing suggests teaching students how to solve the problem in front of them, not the easy problem their mind wants to substitute for it[1].
> From what this guy describes, "test prep" sounds like "educating students".
OK, sure, it does sound like educating students, IF you accept that teaching people to spot inconsistencies and trap questions in a ritualized multiple choice test is a skill that transfers to other situations. Unfortunately this is really not the case at all. It's an education in how to navigate specific public-school bureaucracy. It doesn't even teach people to navigate other shitty bureaucracies.
> From what this guy describes, "test prep" sounds like "educating students".
It's educating them in the wrong things. For example, the goal of the class might be to produce better thinkers by teaching multiplication; because of the test, the students only learn to memorize multiplication tables.
> "How to spot the fake answers put there to fool you" == "how to see when an answer isn't even in the ballpark". That's a useful skill.
It's a probabilistic skill, and in so far as the test is designed to give a measure of what someone knows, and not how lucky they've been, it's gaming the system.
Is it a useful skill? Well, yes. To an extent. So is knowing your addition table, but we expect students to have moved somewhat beyond that by the time they're in secondary education. Just as we'd expect someone educated for five years, six or seven hours a week, 28-40 weeks a year, to have advanced somewhat beyond the need for discarding comedy answers as a viable test strategy.
By secondary level, we'd expect them to know (or be capable of running the calculation) how to decide among those answers that are actually in the ballpark. Approaches for which the cost is more or less constant regardless of how many answers are on the page: you trust your calculation or memory to have given you the answer and discard all others by default.
The thing you are blatantly ignoring is this: standardized tests also teach children the lesson that there will be exactly one correct answer in all of life's situations, and that there will be exactly one OK method for every possible challenge faced. No exceptions.
Which is sometimes true sure. But it is frustrating working with academic "stars" who internalize this once they come to the "real world". They are focused so much on getting the "right answer" within some narrowly defined context of right, that they can't see the better solution by reinterpreting the problem or recasting the assumptions in a slightly different order. They are unable to combine bits of knowledge from different buckets, because the test questions are all neatly siloed.
For example, I've had this argument with fresh grads many, many times:
me: you need to limit your UDP packets to 512 bytes (or 8K depending on the situation).
them: but my teachers told me UDP packet size is a 16 bit integer.
me: yeah, but many stacks cut off shorter, because there is a different standard that says routers can drop packets bigger than their preferred size, the only minimum is 512 bytes.
them: my teacher told me that the packet size is a 16 bit field. Why are you talking about routers?
me: because you need to combine information to actually solve a problem?
them: whatever, I need to figure out what the bug in my code is causing these packets to be dropped.
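For what it's worth, the 512-byte advice in the dialogue above translates into code along these lines (a Python sketch with made-up payload sizes; 512 is the conservative figure because IPv4 only guarantees reassembly of datagrams up to 576 bytes, headers included):

```python
import socket

MAX_PAYLOAD = 512  # fits the 576-byte IPv4 minimum reassembly size with header room

def chunk(data, size=MAX_PAYLOAD):
    """Split a payload into datagram-sized pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def send_in_chunks(sock, data, addr):
    """Send each piece as its own datagram, small enough that routers
    on the path are unlikely to fragment or drop it."""
    for piece in chunk(data):
        sock.sendto(piece, addr)

# e.g. a 1300-byte message goes out as three datagrams:
print([len(p) for p in chunk(b"x" * 1300)])  # [512, 512, 276]
</antml>```

The point of the anecdote stands: the 16-bit length field tells you what the protocol permits, not what the path will actually deliver.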
Or -
me: hey $intern, let's figure out a few ways to approach this problem. expounds on the problem, lays out a few things that might work. The goal here is to try a few different techniques so we can work them into the bigger design. Any questions?
intern: no.
a few days later
intern: Hey I think i solved the problem, is this one solution right?
me: it's one way. It has some good stuff and bad stuff, but we want to try a few solutions to determine how to think about this.
intern: looks like a lost puppy but is it right?
The conversation that follows, about multiple solutions and exploring alternatives, resembles "Who's on First".
The biggest problem with standardized testing is there is no room for the idea that outside of school, it isn't always about doing the rote thing, the simple siloed task in front of you, but rather incorporating various bits of knowledge, about applying the bits of knowledge in ways that allow task completion for tasks that aren't extremely well defined with a pre-arranged solution.
In fact, the lack of a pre-arranged solution is what defines most work outside of menial jobs. The idea that there is more than one approach or solution to something is antithetical to the core of standardized testing.
(Keep in mind that for the statistics to be meaningful, the tests can't allow for grading criteria other than "one strictly correct answer", or you end up with issues in the numbers as a result of graders being different.)
There's pretty strong evidence suggesting this isn't a problem. Or if it's a problem, it's not one that schools can solve.
Teaching "critical thinking" is basically a waste of time. You can't do it. It would be nice if you could, but you can't. "Critical thinking" simply doesn't transfer. (Well, it transfers a tiny bit, if it's taught right, but there's more fine print than Facebook's ToS to any claim that you can teach students how to think.)
Let's say you took all those "creative thinking" skills you learnt in networking, did a course on photography, then got a job with a really good photographer. Guess what - you might have decent communication skills, but you'd still come off as a clueless idiot who can't "think creatively" or "solve problems", because you don't have the domain skills and knowledge.
If they've got a solid core of domain skills and knowledge, they can actually think for themselves. If they don't, they'll be clueless, and just try to memorise answers.
Anyone who can tie their own shoelaces knows "there's more than one way to solve a problem". Kids can actually think for themselves, if and only if they understand the domain.
Now, maybe the schools are teaching really badly, and the tests are geared towards forcing students to answer questions rather than solve problems - that's a problem. As in machine learning, getting students to memorise training data just leads to brittle learning. That might be the real problem - the blind are leading the blind, and some teacher who can't network is telling kids to memorise whatever was in the book, because no-one in the class has a clue. That's a recipe for incompetence.
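The machine-learning analogy can be made concrete with a toy sketch: a "memoriser" that only recalls seen questions aces the training set and falls apart on anything unseen, while actually learning the rule generalises. (The rule and numbers here are invented purely for illustration.)

```python
# The "subject matter": the underlying rule is y = 2x + 1.
train = {x: 2 * x + 1 for x in range(10)}
test_questions = range(10, 20)

def memoriser(x):
    """Perfect recall of seen questions, clueless otherwise."""
    return train.get(x, 0)

def learner(x):
    """Learned the rule itself, so it works on unseen inputs."""
    return 2 * x + 1

train_score = sum(memoriser(x) == y for x, y in train.items())
test_score = sum(memoriser(x) == 2 * x + 1 for x in test_questions)
print(train_score, test_score)  # 10 0 -- brittle: aces the seen, fails the unseen

test_score_learner = sum(learner(x) == 2 * x + 1 for x in test_questions)
print(test_score_learner)  # 10
</antml>```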
And we know that high stakes tests with rewards for "good" teachers are like paying programmers per LoC. But that's not a problem with standardised tests anymore than code metrics are a problem. Idiots in management can cause issues, though.
I disagree completely with "you must already have domain knowledge to be able to apply basic learning skills within that domain". I've seen people enter new domains and do well, and other enter new domains and do poorly. The difference seems to be the ability to ask "how do the things I do already know interrelate?"
It is a matter of metacognition (thinking about what I know and how it applies) and not being paralyzed by fear of "getting the wrong answer". The former can be taught, and there are teaching methods that show success around the concept. The latter is something that is hard to overcome when people spend 16 formative years being punished when they don't "find the exact, single, and exclusive" answer and not being rewarded for "learning a few ways". (although research also shows that tests that are not binary - that is all points or no points - do a good job of helping with the fear e.g. multiple choice tests that have "wrong" answers that suggest conceptual understanding even if there is a calculation error.)
I don't think standardized testing has that much impact on how people think. I suppose test prep teaching is less likely to break people out of lazy thinking, but I don't think it inculcates it.
Most basically, a school that frequently teaches useful-but-not-tested material will, all else being equal, produce lower test scores than a school that sticks rigidly to the tested material.
I'm not sure this is "gaming", proper; but if this focus on the tested material prevents a teacher from enthusiastically expounding on something (s)he thinks is especially fascinating, I'd count that as a loss. Because having enthusiastic teachers is great.
(Of course, standardized testing has benefits, too - I'm not sufficiently informed about American education to have an overall opinion. I just wanted to point out that there are ways to raise test scores that don't improve education.)
If a teacher enthusiastically expounds upon football, string theory or creationism, and fails to teach reading comprehension, they will indeed suffer on the tests.
This doesn't seem like a bad thing. Deciding what needs to be taught is the job of the political system, not the teachers.
Ignoring the obvious hyperbole of your first sentence: sure, politics essentially sets the curriculum (for better or for worse). That doesn't excuse the political system also getting in the way of teaching it.
Mercifully, in Australia we don't have standardized testing every year (yet), but enough importance is attached to the tests that do take place that a disproportionate amount of time is spent preparing kids for the style of test administered - quite apart from any gaming of the system - which reduces the time spent on the rest of the curriculum.
I know we've argued education before and so I'm not expecting a hallelujah moment, but I figured it was a point that should also be made.
Policymakers decided that kids should be taught a variety of subjects - state graduation requirements include science, social studies, arts and languages. However, standardized testing focuses on math and language arts. So you have the phenomenon of art teachers or history teachers being ordered to teach math or reading in order to boost test scores.
The math and reading requirements are hardly unreachable. If history teachers are trying to teach history to students who can hardly read, they're not going to get very far. They should push back and insist that the language arts teachers do their jobs.
Probably not (I'm not opposed to standardized tests, FWIW). Schools need to teach art in art class, history in history class, &c. If lying about curricula to squeeze in more test prep doesn't count as "cheating", it's hard to imagine what could.
...and restricting what is taught to what can be measured is a bad idea. As is putting so much emphasis on your measurements (and little enough work into creating them) that teachers have an incentive to teach the content on the test rather than the full breadth of the subject.
Most of my math classes growing up ended with a week or two of "here's how what you just learned applies in the real world / future classes". It was one of the few redeeming aspects of my early math education, and is probably one of the first things to get cut when schools want to beef up test scores.
If a maths teacher wants to teach category theory, or algorithmic analysis (things which can be both interesting and relevant to someone growing up), they can't.
The state legislature and federal government have decided that arithmetic is more useful than category theory. I agree with them completely, and I'm one of those rare people who is extremely sympathetic to category theory.
So yes, I think tests are doing the right thing here - making sure the teacher does his job before he goes off on random tangents.
By necessity, any test can realistically test only a subset of a student's knowledge. If the patterns that the tested subset follows are easily predictable or have a simpler structure than the actual knowledge, you can get better performance by teaching those patterns than by teaching the actual skills.
Multiple choice tests are particularly egregious examples, as for them you only have to be able to verify an answer instead of having to derive the answer.
Standardized tests are rife with multiple choice tests and questions that follow predictable patterns.
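The verify-versus-derive asymmetry can be illustrated with a hypothetical factoring example (the numbers and function names below are mine, chosen for illustration): checking a proposed factorisation takes one multiplication, while producing the factors from scratch requires search.

```python
# Illustrating "verifying is easier than deriving":
# confirming a candidate factorisation is a single multiplication,
# while finding the factors requires trial division.

def verify_factors(n, p, q):
    # Multiple-choice style: confirm an answer someone handed you.
    return p * q == n and p > 1 and q > 1

def derive_factors(n):
    # Free-response style: produce the answer from scratch.
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            return p, n // p
    return None  # n is prime

n = 9991  # = 97 * 103
print(verify_factors(n, 97, 103))  # prints: True  (one multiplication)
print(derive_factors(n))           # prints: (97, 103)  (after ~96 trial divisions)
```

A multiple-choice question hands the student four candidate answers to verify; a free-response question demands the derivation. The first is systematically easier, which is exactly the gap a pattern-drilled test-taker exploits.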
> But it seems the remedy ought to be "have better tests" instead of "don't test."
Nobody is saying "don't test". The only schools I know of that don't test in the traditional sense are Waldorf schools, and even they still evaluate their students.
Concerns are being raised about standardized testing in an NCLB-influenced environment: how it limits test design, tests for the wrong things, and creates the wrong incentives.
> the AP tests have a significant multiple-choice component that resists simple trickery.
It isn't just about trickery. I remember encountering a pumping lemma question during the GRE that was simply easier to answer in multiple choice form because I didn't have to prove the correctness of the right answer, but only the incorrectness of the wrong ones (and even there I didn't have to work out the fiddly details, I just needed a modest amount of confidence).
You can cut the Gordian knot by simply having the teachers correct enough errors on enough tests--that has been done in the Washington, DC, area. Or you can get a sneak preview of the test and drill the kids on that for a few days.
Did you read the article? The entire thing is about a school that for years did things like have teachers change test answers after the fact in order to increase their overall test scores. That's what I mean by gaming statistics.
edit: clearly I should have chosen a different word. I think you're all focusing on one specific word I used, and not on the actual argument I'm making, which is that if your model of assessment depends on a single Big Important Number, people will find ways to make that number what they need it to be.
That's outright cheating and fraud. Any test can be cheated if you look at the answers beforehand. "Gaming" means figuring out flaws in the way something is tested and then doing lots of that. With your lines-of-code-added metric, "gaming" means adding and removing lots of code; "cheating" means hacking the database of results and changing the values.
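As a hypothetical sketch, assuming a toy lines-added counter (every name here is illustrative, not a real tool), the gaming/cheating distinction looks like this: gaming exploits the metric's blind spot with real but worthless activity, while cheating bypasses the measurement and edits the recorded value.

```python
# Hypothetical sketch of gaming vs. cheating against a
# "lines of code added" metric. Names are illustrative only.

class LocMetric:
    def __init__(self):
        self.lines_added = 0

    def record_commit(self, added):
        # The metric's blind spot: only additions count, deletions don't.
        self.lines_added += added

metric = LocMetric()

# Honest work: a small, useful change.
metric.record_commit(added=20)

# "Gaming": churn code by committing throwaway lines; every commit
# really happened, but the activity is worthless.
for _ in range(5):
    metric.record_commit(added=100)  # add 100 lines you'll delete later

print(metric.lines_added)  # prints: 520 (inflated, but honestly measured)

# "Cheating": skip the measurement entirely and edit the stored value.
metric.lines_added = 10_000
print(metric.lines_added)  # prints: 10000 (no longer reflects any work)
```

The fix for cheating is custody of the data (third-party administration, as mentioned above); the fix for gaming is a better metric, which is a much harder problem.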
If "game" simply means "cheat", the solution is simple - third party test administration. I thought you meant they do something legal other than education to make the scores go up.
Arrange to get a few minutes alone with the tests, correct a few answers on your worst-performing students' tests, or just swap out their answer sheets for improved ones.
Freakonomics talked about this in regard to the Chicago public schools' merit-pay system.
Managing by numbers is bad. I once had a boss look over the commit count to the source code repo to measure effectiveness.
However, software bosses have another avenue: they can just hire and fire based on how well they think people are performing. They don't need to set up spreadsheets or anything to "prove" who isn't doing their job. And the engineer who has been told to move on can move on, because there isn't just one employer in town, even outside of the top-5 cities. And the manager has a reason to care about properly evaluating his reports.
This attempt to measure educational performance by numbers, for all that it is screwed up, has arisen as a desperate attempt by the people paying the bills to feel like they have some kind of accountability in the system they are paying for.