The Statistical Illiteracy of Washington Post Wonk Blogger Dylan Matthews
The recently concluded Chicago teachers strike widened cleavages that Democrats wish would go away: between organized labor and the charter school movement, between public sector workers and Democratic mayors like Rahm Emanuel. President Barack Obama refused to pick sides, but everyone else did. Mainstream media were no exception. Over at the Washington Post, few were less abashed in cheering the defeat of the Chicago Teachers’ Union and providing intellectual cover to its detractors than Dylan Matthews.
Matthews, a fresh-out-of-Harvard writer for Ezra Klein’s Wonkblog at the Post, is one of a new breed of journalists: the young, bean-counting, charts-and-graphs-obsessed policy geek. Mostly found on the left side of the political spectrum, their ranks include Slate’s Matt Yglesias, the New York Times’ Nate Silver, and, of course, Klein himself. They don’t usually crunch the numbers themselves (the poll-model-building Silver is an exception). Instead they report on those who do, from a perilous perch somewhere between academic objectivity and issue advocacy.
Here’s where they run into trouble. They’re not academics—nor do they claim to be—but their job is to distill economic and poli-sci jargon into understandable, policy-relevant bite-sized chunks. And problems arise when they get it wrong—whether due to a lack of understanding of statistics, misrepresentation of the studies they cite, or, in Matthews’ case, both.
In a Sept. 14 post, Matthews argued that union seniority rules for teachers (the "last in, first out" rule for layoffs) hurt student achievement. This is a mantra of school-reform proponents, who argue seniority protects bad teachers. Teachers’ unions see the push to end seniority as a pretext for budget-slashing school boards to get rid of the most experienced teachers, good or bad, since senior teachers earn higher salaries and cost more.
Matthews cited three studies, none of which shows the relationship he alleges, or purports to. The first comes to the not very earth-shattering conclusion that teacher layoffs in Washington State are primarily determined by seniority. Crucially, the dependent variable (or outcome) it measures is a teacher’s probability of receiving a layoff notice, not student achievement. Even then, the authors themselves caution the relationship they find is correlative and not necessarily causal. “Correlation is not causation” is the first rule of statistics, lest you believe umbrellas cause rain.
The second study is a computer simulation, not an observational study (one that would measure the impact of seniority vs. value-added layoffs in the real world). Matthews doesn't mention this and in fact suggests the opposite—that it was based on observed outcomes: "That paper estimated that the gains due to using effectiveness-based, rather than seniority-based, layoffs improved teachers’ performance by the same amount as is gained when one replaces a teacher with one year of experience with a teacher with five years." In reality, no teachers’ performance actually went up, and there were no actual gains, because it was a theoretical model. The third isn’t really a study at all: it involves no empirical test and no measure of student achievement to speak of.
Matthews supplements this with his own simple bivariate regression, which estimates the relationship between a dependent variable and a single independent (explanatory) variable without controlling for any other factors. Matthews admits that the result he gets is statistically insignificant. But he concludes anyway, based on no real evidence, that "as a general policy, 'last in, first out' (or LIFO) has generally negative effects on student achievement."
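To see why a statistically insignificant bivariate result supports no conclusion, here is a minimal sketch in Python. Everything in it is made up for illustration: the 30 hypothetical districts, the `lifo_strength` and `test_scores` names, and the choice of a permutation test as the significance check. It is not Matthews' data or method. The scores are generated with no relationship to the policy measure, yet the fitted slope will generally not be exactly zero. The p-value says how often shuffled, relationship-free data produce a slope at least that large.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def permutation_p_value(x, y, trials=2000, seed=0):
    """Two-sided p-value: the share of random shuffles of y whose slope
    is at least as extreme as the slope actually observed."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(ols_slope(x, shuffled)) >= observed:
            hits += 1
    return hits / trials


# Hypothetical data: scores are unrelated to the policy measure by construction.
rng = random.Random(42)
lifo_strength = [rng.random() for _ in range(30)]
test_scores = [rng.gauss(250, 10) for _ in range(30)]

slope = ols_slope(lifo_strength, test_scores)
p_value = permutation_p_value(lifo_strength, test_scores)
print(f"slope = {slope:.2f}, p = {p_value:.3f}")
```

A large p-value means noise alone routinely produces a slope of that size, so the sign of the slope is not evidence of an effect in either direction.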
I raised these issues with Matthews in an email. He responded, “My piece on layoff policies cited three studies on 'last in first out' that all found negative effects on student achievement. You can disagree with their methodology all you want but I did not misrepresent their findings.” In fact, none of them measured student achievement as an outcome. Rather, they measured the impact of seniority on layoffs and teacher performance, though the study of teacher performance did so without measuring any actual teachers.
This was not Matthews’ first foray into bad stats. In a Sept. 10 post he argued that teacher strikes hurt student achievement, citing studies of strikes in Ontario, Belgium, and elsewhere. Doug Henwood at the Left Business Observer noted that Matthews left out the part in the Belgium study where the authors cautioned that their findings were “somewhat imprecise,” and that “there are no studies evaluating the long-term effect of teacher strikes on educational achievement of students,” citing endogeneity issues. This is a problem that arises when you believe x causes y, but there is another, unmeasured variable z that is affecting both x and y. For example, you observe teacher strikes correlate with bad educational outcomes, but don’t consider that poor school funding may be the real cause of bad education, and that underfunding also motivates teachers to strike more. The authors of the paper Matthews cited were aware of that, and thus much more circumspect in their conclusions than Matthews reported.
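The x, y, z problem can be made concrete with a small simulation. This is a sketch under invented assumptions, not a model of any of the studies cited: the variable names, the effect sizes, and the normal noise are all hypothetical. Scores here depend only on underfunding, and strikes have no effect on them at all. In real data z is often unmeasured; the simulation gets to see it, which is what lets it compare the two estimates.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(target, control):
    """What is left of `target` after removing its linear fit on `control`."""
    b = slope(control, target)
    mean_c, mean_t = statistics.fmean(control), statistics.fmean(target)
    return [t - (mean_t + b * (c - mean_c)) for t, c in zip(target, control)]


rng = random.Random(1)
n = 5000
# z: underfunding (hypothetical units)
underfunding = [rng.gauss(0, 1) for _ in range(n)]
# x: strike activity, driven partly by underfunding
strike_activity = [z + rng.gauss(0, 1) for z in underfunding]
# y: scores, driven by underfunding only; strikes have zero effect by construction
scores = [-2.0 * z + rng.gauss(0, 1) for z in underfunding]

# Bivariate regression of y on x, ignoring z
naive = slope(strike_activity, scores)
# Same regression after removing the part of x and y explained by z
adjusted = slope(residuals(strike_activity, underfunding),
                 residuals(scores, underfunding))

print(f"ignoring underfunding:        {naive:.2f}")
print(f"controlling for underfunding: {adjusted:.2f}")
```

The first slope comes out clearly negative even though strikes do nothing to scores in this setup. Once underfunding is held constant, the slope falls to roughly zero.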
Henwood didn't mention that in the original version of the post, Matthews had written:
"Baker found that if the strike happened when a student was in grade 2 or 3, their scores rose by slightly less. But if the strike happened when the student was in grade 5 or 6, their scores rose by a whole lot less. Scores for strike-affected fifth-graders were a full 3.8 percent lower than those for fifth-graders in schools and grades not affected. If that doesn’t seem like much, it’s 29 percent of the standard deviation (or the typical amount by which students differ from their class average). Strikes, in other words, accounted for one third of why some students did better than others."
Emphasis mine, on the last sentence. After reading that, I, and I imagine other people, wrote in privately to point out that standard deviation is a measure of dispersion that says nothing about what might have caused the scores to vary.
The last sentence from that paragraph was later removed. "This post has been updated to clarify some of the statistical findings summarized," reads a note at the top of the article page.
This obfuscates the problem. This was not a confusing definition that could have been written more clearly. It was completely wrong: saying a statistical measure tells us something it does not, and that a study made a finding that it did not.
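A rough calculation shows how far apart the two ideas are. The sketch below reads "why some students did better than others" as the share of the variance in scores attributable to strike exposure, which is my interpretation, not a figure from the study. For a two-group split, that share is p(1 - p)d², where d is the gap between the groups in units of the overall standard deviation and p is the share of students affected. The values of p are hypothetical; only the 0.29 comes from the passage quoted above.

```python
def share_of_variance_explained(gap_in_sd, share_affected):
    """Between-group variance divided by total variance for a two-group split.

    gap_in_sd: difference between the group means, in units of the
               overall standard deviation of scores.
    share_affected: fraction of students in the affected group (0 to 1).
    """
    return share_affected * (1 - share_affected) * gap_in_sd ** 2


gap = 0.29  # "29 percent of the standard deviation," from the quoted passage

# Hypothetical shares of students affected by a strike
for p in (0.1, 0.25, 0.5):
    share = share_of_variance_explained(gap, p)
    print(f"share affected = {p:.2f} -> variance explained = {share:.1%}")
```

The share peaks when half the students are affected, and even then it is about 2 percent, nowhere near one third.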
All of this would be moot if Matthews were simply stating his opinion that he doesn’t like teachers’ unions. He’s certainly not alone, even among liberals; teachers’ unions are broadly unpopular, though for reasons I believe are more a matter of political expediency than facts.
However, Matthews and his ilk are not that kind of old-school op-ed crank. They’re wonks and proud of it. They don’t use florid metaphors or tug at the heartstrings. They give you spreadsheets, pie charts, and regression tables. The numbers speak for themselves. Except sometimes when bloggers speak for them.
Teacher recall requirements don't hurt student learning. But seniority-based layoffs do: washingtonpost.com/blogs/ezra-kle…
— Dylan Matthews (@dylanmatt) September 14, 2012
@dylanmatt you can't do a bivariate regression, admit it's stat. insignificant, & conclude "LIFO has neg. effect on student achievement"
— Michael Paarlberg (@MPaarlberg) September 15, 2012
@mpaarlberg did you read the part in the other studies on LIFO?
— Dylan Matthews (@dylanmatt) September 16, 2012
@dylanmatt Yes. None show the link you allege.Study 1 was based on a simple correlation. Authors admit no evidence of a causal relationship.
— Michael Paarlberg (@MPaarlberg) September 16, 2012
@dylanmatt Study 2 was a simulation based on no observational test. Study 3 had no empirical test to speak of.
— Michael Paarlberg (@MPaarlberg) September 16, 2012
Photo via Flickr user Zol87, Creative Commons Attribution license