Does Good Teaching Equal Good Test Scores?

by Robert Pondiscio
October 24th, 2009

In his New York Times column praising the Obama administration’s “quiet revolution” on education, David Brooks writes, “there is clear evidence that good teachers produce consistently better student test scores.” I ask this question not rhetorically, but in earnest: what is the “clear evidence” to which Brooks refers? Is there a study that defines good teaching, identifies good teachers and THEN looks at the impact of those teachers on test scores?

If we define good teaching as the ability to raise test scores, Brooks’ assertion is merely a tautology.


  1. I think what people who push that line are really saying is that some teachers produce good test scores much more consistently than others, so that it’s possible to define good teaching that way.

    The goal of value-added models is then to identify and reward those teachers.

    Perhaps paradoxically, it’s a tautology that doesn’t have to be true: if particular teachers don’t consistently produce better test scores, the whole model falls apart.

    Comment by Rachel — October 24, 2009 @ 4:30 pm

  2. Rachel, isn’t it true that even value-added models use tests as a barometer? So we’re in the same place, no? Good teaching means good test scores. Bad test scores mean bad teaching, QED. Here’s my concern: if the buildup of background knowledge year over year leads eventually to broad linguistic competence, there’s no guarantee that you will see year over year growth–especially if reading tests are not curriculum-specific. If the tests are essentially random (and they are), how do you link the teacher to the test? Say, for example, there’s a reading passage on a 5th grade reading test about Greek mythology, which my students studied in second grade. Say there’s another passage about photosynthesis, which my students had in 4th grade science class. Lucky me! I get credit for the background knowledge I had nothing to do with creating.

    This is why I asked my question about Brooks’s broad assertion of “clear evidence” that good teachers produce good test scores. I’d like to know what that evidence is, or if it’s evidence of a teacher’s ability to prepare students for tests, for example.

    Comment by Robert Pondiscio — October 24, 2009 @ 5:01 pm

  3. Robert,

    There is clear international evidence of the effect that teachers have on student test scores; and the results are not as strong as many would hope. (Fuchs and Woessmann 2004)

    But a few elements do stand out. Teachers with pedagogical training do slightly better than those with none. But the largest effects come from teachers with master’s degrees in content but not in pedagogy. Even still, the effect of teachers on student learning is a very small portion of the student testing variability. Larger effects come from curricula, school organization, SES, etc.

    So is it really appropriate to judge teachers based upon standardized tests when there are so many other uncontrolled variables that contribute to scores (including all that you mentioned as well)?

    Even more troublesome is the effect the *type* of test has on student learning; with standardized tests (not tied to the curriculum) having a negative correlation with student learning.

    From Fuchs and Woessmann, 2004: “If there are no external exit exams, standardized testing is statistically significantly negatively related to student achievement in all three subjects (reading, math, science). That is, if the educational goals and standards of the school system are not clearly specified, standardized testing can backfire and lead to weaker student performance. But the relationship between standardized testing and student achievement in all three subjects turns around to be statistically significantly positive in systems where external exit exams are in place.”

    One might be tempted to speculate that in their attempt to tailor learning to the (essentially random) tests, teachers inadvertently remove all the content that actually is needed for students to be successful. Tests seem to be positively related to student learning only when the tests are directly related to defined learning goals.

    Teaching matters. But given the confounding variables and the lack of connection between standardized tests and the classroom environment, how will these value-added measures based upon (essentially random) tests ever really pick up quality teaching?

    Comment by Erin Johnson — October 24, 2009 @ 6:33 pm

  4. So…how do you measure whether or not a teacher is effective? There must be a way…

    Comment by tim-10-ber — October 25, 2009 @ 7:35 am

  5. “The goal of value-added models is then to identify and reward those teachers.”

    It appears that the “reward” will soon be reassignment to the high-poverty schools whose students are deemed to be most in need. Under the terms of the stimulus education grants to states under ARRA, and as Rhode Island acknowledged this week when it abolished seniority and teacher preference in teacher assignments, “equitable distribution of effective teachers” is the new goal in federal policy.

    If effective teachers are finite, who prevails – the students who failed to learn arithmetic or the students who aspire to learn calculus?

    Comment by Student of History — October 25, 2009 @ 11:21 am

  6. Of course, there are dozens of obvious factors leading to test scores. Teaching is one of them, but it cannot be isolated from the others in any study that comes close to a scientific effort.

    Good teaching has more to do with raising the scores of individuals but is never measured. Even then, it is much easier to raise a low score than to raise a high mark. Can we evaluate that?

    How do you compare teacher “A” bringing scores of 37% up to 52% with teacher “B” who raises scores of 89% to 93%?

    Comment by ewaldoh — October 25, 2009 @ 1:18 pm

  7. @Tim-10-ber: Sure there are ways, but first you have to get people to agree on the end goal of education. I am pretty sure P21 and Core Knowledge would have different definitions of effective teaching.

    Comment by Matt — October 25, 2009 @ 4:59 pm

  8. Robert, your question is excellent and one that can be asked only because school districts now have the ability to link teachers with the individual learning progress of each student. The AFT really needs to be concerned, and the New York study as well as the New Haven contract will only slow the pace of their members’ retreat. Our teachers and their students would be much better served if the AFT took stock of the fundamental system opportunities now available to actually level the accountability playing field. They would do their profession and our public education system a great service by standing up for a few reachable and transformational goals. The goals are simple. Until our states and districts coherently and sequentially define the “what to teach,” provide curriculum templates that are precisely aligned with the “what to teach” definition, give classroom teachers an accurate picture of the specific knowledge and skills already learned by each student entering their class, provide teachers the readily available process tools needed to pinpoint the learning progress of each child in each class in real time, evaluate the instructional curriculum for effectiveness with individual students and student groups, and support teachers with ongoing guidance, resources, leadership and remediation assistance for each student who falls behind the clearly defined pace for academic achievement, any evaluation of ‘good teaching’ will be subjective and unbalanced. States and districts can access all of these system process tools and capabilities at a far lower cost than their current expenditures on teaching and learning. Core Knowledge has demonstrated the viability of providing teachers clear objectives aligned with coherent, sequential curriculum. Unfortunately, the education policy community would be out of business and our political leaders would no longer be able to champion the great issue of saving our failing schools.

    Bill Gates and the Congress should be investing in real innovation rather than makeshift studies that encourage mediocrity, failure and further retreat from effective teaching and learning.

    Comment by Steve Kussmann — October 25, 2009 @ 5:08 pm

  9. Matt, Tim-10-ber,
    If schools were interested in a true measure of teacher effectiveness (if we could ever define this), they would have to control for so many variables, to an extent that is practically impossible.

    Even if schools could control for every variable (curricula, SES, school structure, etc.), given the inherent variability between students, statistically class sizes of 30 students would at best detect Hazard Ratios of 2 or more. What this means is that Teacher A would have to be twice as good as Teacher B for there to be at least an 80% confidence of determining that Teacher A was better than Teacher B. Anything less than twice as good will not be detectable, and with an 80% confidence, 1 in 5 teacher comparisons would still be wrong.

    This is a bit technical, but considering that teacher evaluation would have to rely upon statistics, it would be nice if the math indicated that it might be feasible. So far, the stats really say no; it is not possible to use tests to differentiate between teachers in a reliable manner.

    Comment by Erin Johnson — October 25, 2009 @ 9:28 pm

  10. The $64,000 question then (those of you under fifty years old probably don’t get the relevance of this reference) appears to be a two part question. How do we define good teaching and can teacher effectiveness, in fact, be reliably measured?

    Matt raises an interesting polemic regarding the former while Erin seems to have a scholarly handle on the latter.

    Our ability to measure teacher effectiveness must be resolved because the system used in US schools today to evaluate teachers is in drastic need of revision. It’s too subjective and often nothing more than a foregone formality in a teacher’s permanent record file. Pardon me while I vomit on some of what I’ve witnessed.

    William Sanders’s value-added measure appears to be the most promising quantitative measure on the horizon. However, there are numerous variables that would have to be teased out (Erin/above) to seriously consider Sanders’s work as the answer at this time. Authentic random student placement could be a step in the right direction but may not be enough to legitimize the process completely. One can only hope Sanders is close to improving on the procedure he’s already developed.

    Comment by Paul Hoss — October 26, 2009 @ 5:41 pm

  11. Paul,
    Authentic random student placement won’t help with the stats. To statistically detect a 30% difference in teacher quality, the class sizes would have to be over 100 students. Not really practical nor in anyone’s best interest.

    Comment by Erin Johnson — October 26, 2009 @ 6:58 pm

  12. My take is that Brooks’s assertion indeed is a mere tautology. And I wonder if he might not agree if you could pin him down on it.

    I think the operational definition, or the cultural default definition, of a good teacher changes over time. I do remember as a young teacher concluding that at that time the operational definition of a good teacher was simply a teacher who can control the class. I didn’t care much for that definition. I wanted to do more, but was struggling hard simply not to do less.

    I also remember many years ago there would be occasional attempts to define good teachers by adjectives. A good teacher is knowledgeable, likes kids, has good communication skills, is active in the community, utilizes democratic methods, etc. I realized that approach was very unsatisfactory to me. It always seemed to boil down to cataloging the obvious and nothing more. A list of adjectives is not an analysis.

    Now, apparently, the operational definition of a good teacher is one who raises test scores. Whether you call it a tautology or a definition, that seems to be what we mean. But that is also unsatisfactory to me. Teaching on the college level I am not involved with molding character, instilling values, developing appreciation of our cultural values, or building good citizenship. I don’t have any parental role. I just try to teach algebra as best I can. But when I think of what I want from elementary school teachers, it’s quite different. A teacher in elementary school, in my humble opinion, has an important parental role. That’s not discussed much nowadays. Maybe it should be. Test scores are important, but they’re not all that’s important.

    But even if we accept the current definition of a good teacher as one who raises test scores, must we look at teaching as a black box? A “black box” in this sense means we only look at inputs and outputs. We don’t understand how the black box works and don’t expect to. Maybe we don’t even want to. If a teacher gets good results, give him a raise. End of story.

    Don’t we want to investigate how teachers get those good results? Well, we do, to some extent. We talk about “best practices”, and “lesson study” and so on. But I don’t think we go very deep. It seems we’re in an era in which teachers are supposed to do what they are told, and face consequences if they don’t, but are judged poor teachers if by doing what they are told the results are not good. Of course most of what I know about this is from what I read in the ed blogs. I read a lot of frustration in those blogs.

    I don’t know much about Brooks. I remember often agreeing with him in the past, and disagreeing also at times, but I don’t remember details. But there is a real mystery in his present column. He seems to accept as a basic premise that reform has to be good. We want reform. We want change. I am not convinced that’s wise. It has always been a mystery to me why anyone would advocate “change” without specifying what change they want. I realize it’s done a lot in politics, but it never got a vote from me. “Work for change” is a magic phrase for some of my friends, but an irritating phrase to me. It’s totally empty. I have some ideas about what change I would advocate in American education, but I don’t expect what I advocate would be exactly what others advocate.

    Brooks thinks it’s good that Duncan has a few billion dollars to provide incentives for change. I’m cynical enough to think it’s not a good thing. It’ll be wasted. I think the results, good or bad, will be so diluted by endless other factors in endless other ways that it will always be a matter of opinion whether anything good was accomplished or not. We will have test scores over the years, of course. But again the test scores, good or bad, will be affected by many factors other than whether or not we carry out the plans of Obama and Duncan.

    Comment by Brian Rude — October 27, 2009 @ 12:13 am

  13. Great post, Brian. I particularly like your “black box” section. I continue to agree with those who say accountability is important, but I do worry that the formulation “good teaching equals higher test scores and higher test scores equal good teaching” leaves out far more than it includes, even implicitly–especially for early childhood and elementary teachers. The ed reform express has gathered a considerable head of steam. So much so that this may be the definition we’re stuck with for now. But it’s a stunningly blunt instrument by which to gauge good teaching.

    Comment by Robert Pondiscio — October 27, 2009 @ 9:48 am

