Diane Ravitch on Teacher Evaluation and Value-Added

by Guest Blogger
November 18th, 2008

by Diane Ravitch

In his post, “Getting Value-Added Right,” Robert raises excellent questions, and his restaurant metaphor is apt. The value-added growth model, as Dan Willingham notes in the comments section and his post on the Britannica Blog, is not ready for prime time. There are too many intervening variables to hold teachers solely accountable for the test-score growth of every student. Given high rates of mobility, the student population in schools fluctuates widely. As Thomas J. Kane and Douglas O. Staiger point out in one of their papers, the inherent volatility of test scores makes them a poor basis for an accountability system.

The imprecision of test score measures arises from two sources. The first is sampling variation, which is a particularly striking problem in elementary schools. With the average elementary school containing only sixty-eight students per grade level, the amount of variation stemming from the idiosyncrasies of the particular sample of students being tested is often large relative to the total amount of variation observed between schools. The second arises from one-time factors that are not sensitive to the size of the sample; for example, a dog barking in the playground on the day of the test, a severe flu season, a disruptive student in a class, or favorable chemistry between a group of students and their teacher. Both small samples and other one-time factors can add considerable volatility to test score measures.
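To make the sampling-variation point concrete, here is a minimal simulation sketch. It is an illustration only, not Kane and Staiger's method. It assumes student scores are normally distributed with a standard deviation of 1, and that every simulated school has exactly the same true average, so any difference between schools is noise. The figure of 68 students per grade comes from the passage above; the function names, the 680-student comparison school, and the number of simulated schools are hypothetical choices.

```python
import random
import statistics


def simulate_grade_means(n_students, n_schools=1000, student_sd=1.0, seed=0):
    """Return one observed grade-level mean per simulated school.

    Every school has the same true mean (0.0), so any spread in the
    returned means is pure sampling variation (an assumption of this sketch).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_schools):
        scores = [rng.gauss(0.0, student_sd) for _ in range(n_students)]
        means.append(statistics.fmean(scores))
    return means


def spread_of_means(n_students, **kwargs):
    """Standard deviation of the observed means across simulated schools."""
    return statistics.pstdev(simulate_grade_means(n_students, **kwargs))


# 68 students per grade is the figure in the passage above; 680 is a
# hypothetical comparison school ten times larger.
small_school_spread = spread_of_means(68)
large_school_spread = spread_of_means(680)

print(f"68 students per grade:  spread of means = {small_school_spread:.3f}")
print(f"680 students per grade: spread of means = {large_school_spread:.3f}")
```

Under these assumptions, theory puts the spread of school averages at the student-level standard deviation divided by the square root of the number of students tested, or roughly 0.12 for a grade of 68. Identical schools can therefore look noticeably different in a single year purely by chance, and the smaller the grade, the larger that chance component.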

There are many, many reasons why one-year changes in scores are not reliable, and just as many why it is hard to assign credit or blame for students’ test score gains and losses from year to year. Until we have better tests and have ironed out many of the confounding variables, we cannot make credible inferences about teacher performance from test scores, and it is unfair to use such data to dispense rewards and punishments.

There is another reason to worry about value-added growth models that determine a teacher’s fate and compensation. If we turn teaching into an activity whose sole purpose is to produce gains on tests that we know are mainly low-level and dumbed-down, we will not make education better. We may succeed in destroying it altogether. We had better find ways to emphasize the quality of curriculum (think Core Knowledge) and to de-emphasize the number of times that kids are asked to check off a box on standardized tests in the course of a month, or our education system will be far worse than ever.

Diane blogs on education at Bridging Differences — ed.


  1. Hi there,

    You cite a 2002 paper by Kane et al. questioning value-added.

    But in 2006 they authored a paper specifically advocating for value-added.

    “We propose federal support to help states measure the effectiveness of individual teachers—based on their impact on student achievement, subjective evaluations by principals and peers, and parental evaluations. States would be given considerable discretion to develop their own measures, as long as student achievement impacts (using so-called ‘value-added’ measures) are a key component.”

    Seems like the scholars you are citing disagree with you on whether value-added is ready for prime time. In fact, I believe they’re involved in a number of such projects.

    Comment by Mike G — November 19, 2008 @ 10:18 am

  2. I notice that the quotation you cite describes a multifaceted evaluation, not one based on test scores alone. It includes “subjective evaluations by principals and peers, and parental evaluations” in addition to value-added scores. I suspect that Kane and Staiger would say that this approach is not necessarily inconsistent with their earlier work on random variation. Random variation is still a problem, and it is accentuated if one looks at only one-year test score changes. Most scholars who advocate value-added models suggest that they require three years of testing, with tests given in September and June. Even then, there will be confounding factors, having to do with families, students, and the environment.
    Testing is not an exact science. In fact, the closer you get to the construction of tests, the more arbitrary they seem.
    I suggest that you read Richard Rothstein’s recent work “Holding Accountability to Account,” which you can google.

    Diane Ravitch

    Comment by Diane Ravitch — November 19, 2008 @ 10:22 pm

  3. [...] have both written about the unreliability of test data as a metric of teacher performance and the inherent damage of educating students to take tests, would have been wonderful [...]

    Pingback by CBS Flunks Coverage of Teacher Tenure | Deja Queue — February 28, 2011 @ 1:31 pm
