> Seems like it could be within a margin of error (3 v 5%). There is no margin o...

chewbacha · on Oct 6, 2022

There is if you are missing more than 10% and trying to make extrapolations. Also, there are variations between individuals between tests. You’d need to take multiple samples to reduce noise from the test itself.

bloaf · on Oct 6, 2022

So if you're worried about it, do the math. My napkin calc puts the margin at around ±0.3, enough to easily separate 3% from 5%.

chewbacha · on Oct 6, 2022

So, you expect the exact same results next year? If the same students all retook the SAT with different questions would we get the exact same breakdown +/-0.3%? I doubt it. These data are for this specific measurement and must be tempered in their extrapolation into other areas. I don’t think you can declare them significant.

bloaf · on Oct 6, 2022

You must not believe any statistics if you think sampling 85% of a 110k population can't resolve a score like this to 1%. Your original question, which I have been addressing, was about generalization to the population (i.e. what the margin of error quantifies) not nebulous "other areas."
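For anyone who wants to check the napkin math, here is a minimal sketch of one way to run it. It takes the 85% coverage and 110k population figures from above, and it assumes (not from the data) an even male/female split, simple random sampling, and a 95% confidence level (z = 1.96). The function names are just illustrative.

```python
import math

def moe_proportion(p, n, N=None, z=1.96):
    """Margin of error for one sample proportion.

    If a population size N is given, apply the finite population correction.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return z * se

def moe_difference(p1, n1, p2, n2, z=1.96):
    """Margin of error for the difference between two independent proportions."""
    return z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

N = 110_000              # population size quoted in this thread
n = N * 85 // 100        # 85% of the population was sampled
n_group = n // 2         # ASSUMPTION: even male/female split

# Sampling margin on the 3% vs 5% gap, ignoring the finite population correction
diff_moe = moe_difference(0.03, n_group, 0.05, n_group)

# Margin on a single 5% figure for the whole sample, with the correction applied
single_moe = moe_proportion(0.05, n, N=N)

print(f"margin on the gap: {diff_moe * 100:.2f} percentage points")
print(f"margin on one 5% figure (with FPC): {single_moe * 100:.2f} percentage points")
```

Under those assumptions the margin on the gap comes out to about 0.25 percentage points, well below the 2-point difference. This covers sampling error only; it does not account for how much an individual's score would move on a retake.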

I would expect a re-test to have slightly higher averages overall because all the kids now have experience with the class of questions, whereas only some did the 1st time.

chewbacha · on Oct 6, 2022

Not at all, I trust statistics a great deal. I don't have statistics on the reliability of the SAT or on how reproducible it is, so I have no way to know, based on the numbers presented, how noisy the distributions ought to be. You can imagine that some band of test-takers is on the bubble and can fall above or below the "top-tier" line on any given day. You are asserting that this is less than 1% of the population; I'd argue it's likely higher than that. But I don't believe I have the data to argue it either way.

To address the population: if the "population" you are referring to is only the seniors in Michigan on _that_ day, then it might be pretty accurate. It looks like the population of Michigan is around 10 million (20% of that is under 18), meaning the result would have to be extrapolated much further just to cover the population of Michigan, not to mention what it means if we apply it to the entirety of the United States. Any population that we apply this to that isn't the sample audience at that specific time will incur some error.

At each of these higher levels of population abstraction we have error margins that are orders of magnitude higher, which would quickly dwarf the percentage-point difference between males and females in a single test.