Kudos to the Gates Foundation, seriously: after spending $775m on the Intensive Partnerships for Effective Teaching, a Big Data initiative to improve education for poor and disadvantaged students, they hired outside auditors to evaluate the program’s effectiveness, and published that report, even though it shows that the approach did no good on balance and arguably caused real harms to teachers and students.
Cathy “Weapons of Math Destruction” O’Neil has given the report a close reading, and she found that the problems with the approach were entirely predictable: asking principals to rate teachers produces uniformly high (and therefore meaningless) five-star results, while the “value add” algorithms that are supposed to figure out how much of a student’s performance is attributable to a teacher are basically random number generators.
The result was a hugely stressful (and sometimes career-destroying) exercise in which teachers and students were human guinea pigs in an experiment that could have been evaluated more cheaply and quickly in smaller laboratory tests before it was unleashed on whole populations.
Considering the program’s failures — and all the time and money wasted, and the suffering visited upon hard-working educators — the report’s recommendations are surprisingly weak. It even allows for the possibility that trying again or for longer might produce a better result, as if there were no cost to subjecting real, live people to years of experimentation with potentially adverse consequences. So I’ll compensate for the omission by offering some recommendations of my own.
1. Value-added models (and the related “student growth percentile” models) are statistically weak and should not be used for high-stakes decisions such as the promotion or firing of teachers.
2. Keeping assessment formulas secret is an awful idea, because it prevents experts from seeing their flaws before they do damage.
3. Parent surveys are biased and should not be used for high-stakes decisions.
4. Principal observations can help teachers get better, but can’t identify bad ones. They shouldn’t be used for high-stakes decisions.
5. Big data simply isn’t capable yet of providing a “scientific audit” of the teaching profession. It might never be.