The scoring practices employed by these folks are abominable, but the reformies still want to use the results to make hiring and firing decisions. And that all but guarantees churn in the teacher corps. Why?
Scores on the writing portion of this year's FCAT plummeted so precipitously that the abilities of Florida's student writers aren't even being called into question. The validity of the scoring statistics is. While I don't want to say "I told you so" regarding the dubiousness of those statistics, I did tell you so, as my 2009 book highlighted in detail all the ways the numbers produced by the for-profit standardized testing industry cannot be trusted.
Take the stats produced at Pearson scoring centers around the country, where I worked for the better part of 15 years. On the first project I worked scoring student essays, I had to pass a qualifying exam to stay on the job. When I failed that qualifying exam (twice), I was unceremoniously fired. So were half the original hundred scorers who had also failed the tests. Of course, when Pearson realized the next morning they no longer had enough scorers to complete the project on time, they simply lowered the "passing" grade on the qualifying test and put us flunkies right back on the job.
Yes, those of us considered unable to score student essays 12 hours before were welcomed back into the scoring center with open arms, deemed qualified after all.
If any state's DOE wants to see a certain percentage of teachers fired, they can just ask Pearson to make sure a commensurate number of students don't pass the tests.
Once I attended a range-finding meeting with other test-scoring experts and English professors from around the country, the bunch of us trying to figure out how to score writing samples for a national test. After that group of experienced test scorers and esteemed writing teachers had hammered out some consensus regarding the writing rubric and writing samples we'd been reviewing, we were told we were scoring "wrong." We test-scoring experts and writing teachers were told our scoring wasn't matching the predictions of the omniscient psychometricians (statisticians/testing gurus), and we were told we had to match those predictions even though the psychometricians had never actually seen the student responses.
When the next year I read in the New York Times that student writing scores had ended up exactly in the middle of the psychometricians' predictions, I can't say I was surprised: We had made sure they did.
And that's the thing: In my experience, the for-profit test-scoring industry could produce results on demand. There was no statistic that couldn't be doctored, no number that couldn't be fudged, no figure that couldn't be bent to our collective will. Once, when a state Department of Education (it wasn't Florida's) didn't like the distribution of essay scores we'd been producing over the first two weeks of a project, we simply followed its instruction to give more upper-level scores. "More 3's!" became our battle cry on that project, even if randomly giving more 3's was fundamentally unfair to all the students whose essays had been assessed differently in the days before.
In the end, I guess I'm saying you probably needn't worry too much about this year's falling FCAT scores, because they're only a number. If you want a different number next year, just ask; surely Pearson will just make more.
[emphasis mine]
I am convinced that Pineapplegate was the direct result of publishing NYC's teacher ratings; once high-stakes decisions are made based on tests, the tests will come under new scrutiny. And when the poor design, administration, and scoring of these tests comes to light, the lawsuits will begin.
It won't be pretty.
UPDATE: Dumb typo fixed. I hate being my own editor...