The requirement that states allow the use of standardized test data to evaluate teachers, the signature feature of the Race to the Top fund initiative, was personally approved by President Obama, according to EdWeek’s Michele McNeil.
When Education Department staff members finally settled on the data firewall rule, which would effectively knock out two states with giant student populations and powerful Congressional delegations, I’m told that education staffers took it up to those above their pay grades. To Obama Chief of Staff Rahm Emanuel, and eventually, to the president himself. And Obama, apparently, didn’t need much convincing.
Meanwhile, in a detailed analysis of the guidelines, Stephen Sawchuk says the data firewall issue is not just about performance pay.
States receiving Race to the Top funds must commit to using their teacher-effectiveness data for everything from evaluating teachers to determining the type of professional development they get to making decisions about granting tenure and pursuing dismissals. And, they will also be expected to track graduates of their education schools into classrooms to help institutions figure out which pathways and courses produce the best teachers.
The issue is not the use of the data, but the value of the data. Is it possible to make good decisions with bad data? Perhaps it doesn’t matter. One possibility raised by Fordham’s Mike Petrilli is that states will “superficially swear allegiance to these reform ideas but implement them half-heartedly down the road.”


Hi Robert,
In the link where Dan W rejects value-add, he doesn’t say “let’s not evaluate.” Instead, he makes the claim: “Some districts have succeeded with peer review.”
And his evidence for success is…what?
He links to an AFT paper that literally makes no evidentiary claims that the good teachers and bad teachers are correctly sorted.
It just says 11 of 88 new teachers were not renewed. We have no idea whether the peer review sorted correctly, do we?
You pose the question as “Is it possible to make good decisions with bad data?”
I’d suggest the question is “Is it better to make decisions with a combination of imperfect data and imperfect observations, or imperfect observations alone (Dan’s preference), or simply to avoid ever making decisions (status quo in most places)?”
There need to be protections against misuse of test score data, but the state-legislature-level firewall seems to me to be a rather blunt and cumbersome one.
On the other hand, I’d be appalled at any state- or federal-level attempts to mandate how data is used.
A good teacher evaluation system is one that both principals and teachers believe in and I don’t think it can be designed by bureaucrats.
I might suggest that making decisions with a combination of imperfect data and imperfect observations would produce results not distinguishable from the status quo.
I have been a Core Knowledge fan thru the years. I have a number of your books and have often suggested them to other homeschoolers. I have read your blog intermittently.
I am dismayed by your recent series.
You used to be somewhat balanced in your view of things. You seem to have swung into a far right position in which all the initiatives by the Obama administration are attacked. You misrepresent his position by selective quoting. It’s in contrast to your treatment of the Bush Administration’s education policy, which shifted the role of the Federal Government in K12 education dramatically. The NCLB initiative redefined the public education system from being available to help anyone get ahead to being responsible for everyone to get ahead. These were radical moves which represent typically a Democratic position. Yet you were subdued in your comments about them, but the Obama initiatives, which are refinements in the same general direction, you attack and attack and attack.
In short, the Core Knowledge blog should have a point of view on education. That would be acceptable and respectable. Instead, you have a political axe to grind which makes the whole Core Knowledge initiative look and feel very different to me and to many others.
I’m very disappointed in your political approach to this important question of education.
I like the Obama/Duncan Race To the Top funding initiative for several reasons: (1) The existing system for evaluating teachers in our public schools is nothing short of an embarrassment. It’s far too subjective and it has made zero progress toward identifying the quality of teachers; (2) The proposal will be a formal sanctioned attempt at quantifying the performance of teachers for the first time in US public school history; (3) Dismissal cases that go to court will have objective data to substantiate their cause. This will deter unions from blindly defending all (incompetent) teachers should they consider legal recourse; (4) Superlative teachers will finally be rewarded financially while those less competent will be encouraged to amend their practice or seek alternative employment. The clear winners in this situation – students; (5) Teacher performance linked to student test results will eventuate into the long-awaited meritocracy for our public schools we experience in most aspects of our unique market economy. FINALLY!!!
Miss Eyre,
Is any profession judged with perfect data and perfect observations?
The status quo is – teachers are maybe observed once a year, and 99% are pronounced satisfactory. There are almost no false positives, b/c everyone is allowed to teach, irrespective of the cost they may impose on a kid.
Dan’s preferred alternative is that a fair number of new teachers get fired. But the decision is entirely based on observation. Look, I’d take Dan’s plan any day over the status quo.
The Obama plan is that evaluators, in addition to observation, will also get to see whether similar kids with similar starting points made far larger or far smaller gains that year with a particular teacher. I can’t imagine an evaluator that wouldn’t want to combine the imperfect data with the inherently imperfect observation.
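To make the "similar kids with similar starting points" comparison concrete, here is a minimal sketch. It is not the Race to the Top methodology or any real value-added model; the records, the band width, and the function names are all hypothetical, chosen only to illustrate the idea of comparing each student's gain with the typical gain of students who started in the same score range.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, current-year score).
RECORDS = [
    ("A", 40, 52), ("A", 42, 55), ("A", 70, 78),
    ("B", 41, 45), ("B", 69, 74), ("B", 72, 75),
]

def band(prior_score, width=20):
    """Group students with similar starting points (band width is an assumption)."""
    return prior_score // width

def relative_gains(records):
    """Per teacher: average of each student's gain minus the mean gain
    of all students in the same starting-score band."""
    gains_by_band = defaultdict(list)
    for _, pre, post in records:
        gains_by_band[band(pre)].append(post - pre)
    band_mean = {b: mean(g) for b, g in gains_by_band.items()}

    by_teacher = defaultdict(list)
    for teacher, pre, post in records:
        by_teacher[teacher].append((post - pre) - band_mean[band(pre)])
    return {t: mean(v) for t, v in by_teacher.items()}

result = relative_gains(RECORDS)
```

With so few students per teacher, a number like this would be very noisy, which is one reason to read it alongside observation rather than in place of it.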
Sandra,
First and foremost, you will notice a disclaimer to your right. It reads “the views, conclusions and opinions of authors, contributors and commenters on the Core Knowledge Blog are those of the authors and do not necessarily reflect the views of the Core Knowledge Foundation.” Here’s what that means: exactly what it says. So this blog is not an institutional voice for Core Knowledge. It is intended to be a place for teachers, administrators, parents and others and anyone else interested in education to engage in a free, civil and hopefully thoughtful discussion about issues affecting the field.
If there is an institutional point of view to this blog, it is the firm belief that every child in this country would benefit from a full, robust and content-rich education. This particular reform seems almost wholly absent from the education policy dialogue and this blog, perhaps alone among edublogs, seeks to speak to that (to my mind) deficit. This also informs my personal bias as a writer. When I evaluate education policy initiatives, there is one screen: does this further the cause of giving every child that kind of education? Or does it damage it.
I think this is in full public view should you care to look. Longtime readers of this blog know I have argued for national standards and assessments many times here. I still favor both, in principle. But when I received a draft of the proposed voluntary national standards and felt them to be virtually content-free, privileging skills almost exclusively at the expense of content, I said so. Similarly, I have long supported school and teacher accountability, with teeth. But when a specific accountability proposal, in my view, works against the ideal of giving a child a well-rounded education, I said so and explained why I found the idea wanting. If it’s pointing out President Obama’s role in this that you object to, well, that’s a news judgment. Personally I would find any President’s individual attention to a granular piece of education policy noteworthy, and worth passing along.
More biases: I am a registered independent. Moreover, I’m a thoroughgoing independent. Thus I find the charge of political bias hard to accept, and again, my defense is in full public view. I have described being an enthusiastic supporter of NCLB, the policy prescription of a Republican administration. When I began to perceive its narrowing effects on curriculum having a deleterious impact on the school where I taught and others like it, I reevaluated my position. When I saw the current Democratic administration headed down a path that threatened to make accountability-driven narrowing even worse, I raised my voice again, and explained my objection. I do not evaluate policies based on which party proposes or champions them. I evaluate them based on whether, in my judgment, they will further the goal of improving education by ensuring a rich core curriculum for all students, and especially the kinds of students I used to teach in my South Bronx elementary school.
I have maintained a policy of allowing an open discussion on this blog, because just like the cause of getting curriculum on the policy agenda, I feel that educators’ voices are seldom heard. Thus I post unedited comments I disagree with, including yours. But here’s the beauty of this forum. I learn a lot from it. So if I’m wrong about curriculum narrowing, evaluating teachers on test scores, national standards or any of the dozens of subjects we tackle here, then by all means, set me straight. State your case. That’s what this forum is all about. But simply leveling charges of political bias seems to me to add merely heat, not light.
Best,
Robert
Paul,
What is it about test scores that gives you such confidence that they are a sufficient basis on which to make the decisions you cite? I can agree with virtually all of your points: teacher evaluation IS a joke. We DO need to have a better handle on what makes one teacher better than the other for any number of reasons. But either you believe that test data alone gives enough information to do all you suggest, or you’re comfortable reaching vast conclusions with half-vast information. If it’s the former, tell me why. Actually, if it’s the latter, I’d also like to know why.
Yes, the current evaluation system is a joke. The current state of data based on NCLB test scores is a joke. The idea that a Value Added Model could be used for evaluations system-wide or state-wide is a joke.
What would happen if you combined them all? Sometimes the result would look like the “status quo.” Other times, high-quality systems would emerge. But some districts would devolve into data-driven evaluations, and those systems would end up in court.
In many places, as in some parts of Oklahoma, corporations would pass the hat to fund test cases that would have absolutely nothing to do with schools. They would simply be trying to destroy a union, any union, in any high-profile case. The determining factor would be the trial judges and appeals judges in the area. Fighting would be the purpose of the fight and students would just be pawns in the game.
And Paul, in some cases unions would have to defend the incompetent along with the effective teachers as they battled the system. The law has to apply equally for all. If we want to remove bad teachers more efficiently, and I do, we must negotiate sustainable and legal procedures.
But that would be forgotten as the legal battles raged and millions were wasted on lawyers’ fees.
GGW, you teach in a charter school and because it is by definition a small school system, the normal diversity doesn’t apply to you. But once a system reaches a certain size, the variety of the quality of principals and of policies, along with choice, means that growth models can’t be used competently and fairly. But what would happen if a district or a state took the Florida model for merit pay and applied it to evaluations? Teachers who caught a tough class or a bad administrator would have a 60% presumption of ineffectiveness.
Here’s how I would distinguish data-driven evaluations, which unions must always fight, from data-informed evaluations, which we should embrace. If student growth data is the “first cut,” identifying teachers who are presumed to be ineffective, teachers must oppose that type of data-mining. If data is used systematically to complement, supplement, verify, and/or disprove human judgments then that’s a win-win. It would be better still if data encouraged collaborative discussions. And the infrastructure could be used for real time interventions, like early identification of students with attendance problems. Then the human and digital infrastructure could be used for data-informed instruction.
Not everyone would suffer if the Obama proposal was adopted. But we can’t afford to sacrifice the teachers who would be caught in impossible situations because of the luck of their draws with administrators or policies or classes. They could cower in fear as their fellow teachers were threatened with plans of improvement for reasons completely out of their control – until the system came for them. Or they could fight. Or most likely, they would leave the profession in droves.
It would be the classic situation where powerless people adopt CYA approaches like excessive test prep.
The devil would always be in the details, which is why the union can’t unilaterally disarm. Take my class last year. I usually had 140 students but my final inactive roll was up to 107 students, meaning that I had almost as many students transfer in as transfer out. If a growth model assessed growth on my 80 or so students who attended class enough to count under NCLB rules, I’d have great results. If I was accountable for raising scores for the dozens of students who transferred in and out from homeless shelters, jails, alternative schools, foster homes etc., I’d be judged incompetent. If I was judged by test scores from any given day, who knows; results would be completely determined by chance.
And that’s a reminder. We haven’t even tried to use growth models in high schools. Ed Week discussed the latest research on why Value Added Models aren’t ready, including a study of elementary schools where principals agreed to assign teachers randomly. One third of the principals agreed. This would never be possible in high schools. The only way you could compare results in an inner-city freshman Algebra I class with the junior Algebra II class across the hall (or in the same room taught by the same teacher) would be to round up the half of the freshman class who have since dropped out or been jailed or are hanging out on the street corner and forcibly reunite them with their Algebra I classmates. I don’t know how you’d factor out the differences in maturity levels.
I have no doubt that President Obama immediately saw the benefits of robust data systems. I doubt his advisors briefed him on the other side. I doubt his advisors have enough real-world experience to understand the other side. But we can still persuade them.
John: Why compare a freshman Algebra I class with the junior Algebra II class across the hall?
Why not compare Alg 1 class to exact same kids scores from previous year in Grade 8 pre-alg?
GGW,
I used the example of what I know. I can’t know everything. You can’t know everything. And designers of Value Added Models can’t know everything.
Before I answer I have to remind you that your side has the burden of proof. It is advocates of systemic uses of growth models that have to prove that they are valid. You can’t just take one example and say that the VAM might work there. It has to be reliable across the entire system that is using it. And neither can Obama anticipate where and how those models could be used or misused.
To answer your question, last year our middle school had a higher dropout rate than the high school, which is something I never thought was possible. (In NYC, none of those middle schoolers would count as dropouts, by the way) Computer programmers aren’t going to account for something that was previously seen as impossible. So, in that case the 8th grade teachers would get screwed. But that’s OK, we’ve got plenty of math teachers who want to take on the toughest challenge in education, so middle school pre-algebra teachers are expendable.
You reminded me of our 8th grade math teacher who stunned everyone with her pass rate. I asked her if those students could bring our 9th grade pass rate out of the single digits. She could have posed as the muse, but she replied, “Of course not!” Then using terminology I couldn’t understand she explained how her increased scores were the result of a,b, and c, but that the test didn’t reveal their weaknesses in x,y, and z which they would need for Algebra. Soon afterwards, the State agreed and changed cut scores. We’re being told to expect a major drop. So if that’s true, should they fire her rather than give her the bonus that she would have earned if the preliminary pass rate of last spring had held?
You also remind me of a superstar principal whose test scores got him a promotion to a middle school. He explained that his 5th graders had produced great scores in June, but when they were promoted to 6th grade their September scores had collapsed. “I’m pretty good!” he exclaimed, “I can ruin a school in six weeks!” Of course we both knew the reality. Any of his 5th graders who had a choice were able to exercise that choice and find another middle school. The only kids who go to the toughest neighborhood middle schools are the ones who have no choices.
In my city, at least, the market for charters, magnets, enterprise schools, etc. is completely over-saturated. The only option now is to improve the neighborhood schools. But data-driven evaluations would be suicidal. Then the last of the best neighborhood school teachers would have to seriously consider their choice options. But that gets me back to the arguments I just made.
I think you pose a very fair question. Where is the burden of proof?
I believe it’s a very reasonable argument to say the burden of proof is on the “changers.” Right now we have a system where basically nobody is evaluated. So anyone who wants to change it, one might argue, must show that their system is better.
I would agree with that.
Another argument – the most common – is that anyone who wants to change the system has a higher bar to clear. It’s that any change must not just be better, but perfect.
If a single teacher would be wrongly judged, better to protect that one, even if it ensures thousands of harmful teachers remain under the current system of no evaluation.
I don’t agree with that rationale. But it’s a fair question.
Others, like Dan W upthread, seem to be uncomfortable with the current “nobody gets evaluated” system. So he argues for rejecting data on outputs but allowing human observation of inputs exclusively, instead of a combination of both.
His method is equally subject to the issue of “a teacher could be wrongly judged.” He and others believe that teachers should be judged, but exclusively by observations, particularly if the observations are done by peers (teachers) rather than principals (former teachers).
Politically, of course, that is more popular. Empirically, hard to see why.
Here’s a question that I never hear asked: Why are we so convinced that when there is a breakdown it’s by definition a failure of teaching? Don’t mistake me, I certainly agree with much of what has passed above–teacher evaluation is sorely lacking, and there are certainly an unacceptable number of poor teachers in classrooms (one of my issues with the proposal to tie teachers to individual test scores is that it seems more likely to hound good teachers out of the profession through boredom and bureaucracy than it is to “catch” bad teachers). But if I had a magic wand to wave over American education, the first thing I’d aim it at would not be teachers.
Of all the ideas that have come to education from the business world, the one that seems not to have been considered is: hire good people and get out of their way. As a teacher, I do not have a curriculum, but pedagogy (how I teach) is tightly prescribed and monitored. I have limited authority to control student behavior and work habits. I have no control over family and peer factors that may actively undermine what I am trying to accomplish. I am, for all intents and purposes the only accountable person in the learning process and any student failure is my failure alone. This is, needless to say, breathtaking nonsense. As a business metaphor, we’re essentially saying you have no control over your process, your product, your working conditions, or your staff. Yet you are 100% responsible for their output and productivity.
GGW,
Why keep throwing up straw men? Why not address the real world statement just made by Robert? You don’t deny his point do you?
The point is not saving one teacher at a cost of thousands. What ratio would you approve? In many schools, for the reasons that Robert just explained, the cost to the innocent teachers would be much higher. Would you sacrifice one out of ten teachers? One out of twenty? One out of fifteen? Remember the Baby Boomers are going to be retiring and you’ll need to replace us, also.
Besides, you guys could get most of what you want, and what we want, if you’d see our points on our nonnegotiables and then compromise.
Frankly, I’m looking at this from the perspective of the inner city. Implement your system and our schools will empty of effective teachers. You may think that charters can take up the slack, but I wouldn’t gamble children’s futures on it. When you face ALL of our challenges, under your system charters would also empty.
Robert’s comments above are worth keeping in mind: “As a teacher, I do not have a curriculum, but pedagogy (how I teach) is tightly prescribed and monitored. I have limited authority to control student behavior and work habits. I have no control over family and peer factors that may actively undermine what I am trying to accomplish. I am, for all intents and purposes the only accountable person in the learning process and any student failure is my failure alone. This is, needless to say, breathtaking nonsense. As a business metaphor, we’re essentially saying you have no control over your process, your product, your working conditions, or your staff. Yet you are 100% responsible for their output and productivity.” (Sorry for the long quote, but I wanted to make sure I got it right.)
To respond to GGW and others here: The creation of an “imperfect” teacher evaluation system, assuming that “imperfect” is automatically better than what we have, is not the answer. I agree with Robert and others that the current system is, more or less, a joke. So if we’re going to create a new teacher evaluation system–and I’m not necessarily saying we shouldn’t–it should come with three big safeguards built in:
1.) Student test scores should be compared across subjects in the same school year. If a kid is doing poorly in ALL subject areas, it is likely that he or she does not have five or six terrible teachers, but rather that he or she is having a hard time for some personal reason that is not attributable to a teacher.
2.) Peer review should be randomized and come from outside a school building. By this I mean a “jury duty” system that I have proposed elsewhere. A “jury pool” of teachers with, say, a certain amount of experience, education, interest, or respect within their communities would be pooled for a day or two here or there to go to different schools and observe teachers they don’t know. Teachers would be selected from this pool randomly. This would prevent teachers in a building from ganging up on an unpopular teacher.
3.) Test scores should be ONE factor in a teacher’s evaluation, not the be-all and end-all.
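The first safeguard above, comparing a student's scores across subjects in the same year, could be sketched roughly as follows. The scores, the cutoff of 50, and the labels are hypothetical and used only for illustration; this is not a proposed evaluation rule.

```python
# Hypothetical same-year scores per student, keyed by subject.
SCORES = {
    "student_1": {"math": 35, "english": 38, "science": 40},
    "student_2": {"math": 36, "english": 75, "science": 80},
    "student_3": {"math": 82, "english": 79, "science": 85},
}

LOW = 50  # assumed cutoff for "doing poorly"; not from the original comment

def classify(scores, low=LOW):
    """Label each student by how widespread their low scores are."""
    labels = {}
    for student, by_subject in scores.items():
        low_subjects = sorted(s for s, v in by_subject.items() if v < low)
        if not low_subjects:
            labels[student] = ("no concern", [])
        elif len(low_subjects) == len(by_subject):
            # Low everywhere: less likely to be about any one teacher.
            labels[student] = ("low in every subject", low_subjects)
        else:
            labels[student] = ("low in some subjects", low_subjects)
    return labels

labels = classify(SCORES)
```

A student flagged as low in every subject would prompt a look at circumstances outside any single classroom before the result counts against a particular teacher.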
And to respond to Robert particularly: You are absolutely right in the comment I quoted. “Firing” students seems to be exactly what certain charter schools are up to, incidentally.
Miss Eyre,
I really like and agree with your comments above. If my principal decided to institute some of those policies into his evaluation process at the beginning of this school year, I wouldn’t be very resistant. Many times, it is how I check on my own teaching. When I see a student not doing well in my class, I check his/her progress in his other classes and conference with his/her other teachers to see if I am or am not the problem.
I also agree with Robert as well. There is so much talk of adopting a model that truly treats teachers as other professionals are treated, with merit based pay scales and the like. However, we are told to control so many things that we cannot control. And our society/government doesn’t seem to be ready to address the larger issues that affect student progress, not just poor teaching, but living in poverty, issues of diversity and racism, and the many family and peer issues that affect a child. It would be nice if someone went to effective teachers, asked them what they needed to be even more effective, and then tried to figure out ways to implement those suggestions instead of immediately assuming teachers are asking for the impossible.
I definitely consider myself to be among the ranks of effective teachers, but I have to admit that I would be extremely tempted to leave the profession if my progress was to be judged solely on test scores. I have had many moments of success with students in which the student’s test score was not what defined our success…success was defined as being able to actually master a learned skill without the teacher’s guidance, or, even, simply creating a regular routine for sitting down and completing homework when the student was previously ignoring the existence of work in general. Test scores don’t measure those things, and yet, if you ask me and those students, we accomplished a lot.
I teach in Chicago, where Duncan was CEO of Chicago Public Schools. I didn’t like what he was doing as CEO, and I don’t like what he’s doing as the Secretary of Education. He’s proven himself to be a numbers man, and someone has to come and balance out that perspective. Education has never been a completely exact science, and I’m not sure it can be. But I’m definitely not a veteran teacher, so I eagerly welcome other perspectives on my comments.
Robert,
Test scores, for me, quantify the process. However, they will be accompanied by a myriad of problems, initially.
Based on what I’ve read about William Sanders and value-added assessment I believe this is an avenue that needs to be explored, developed, and hopefully, refined to the point of being effective.
In the same breath, I’m cognizant of many of the issues John raises, especially in regard to educating urban youngsters. If John can remember, I’ve always maintained inner-city teachers are a special breed and should be compensated accordingly. They deal with problems those of us in suburbia could never imagine. And transient students are clearly at the head of the list.
So yes, I’d like to see a significant paradigm shift away from our existing model for evaluating teachers. It’s ineffective and needs to be dumped. The best prospect I’ve encountered is VAA and potential merit pay/performance pay as the primary alternative. There may be other models out there that districts are employing, but I’m not aware of any, at least on a wide scale.
So onward and upward with VA and refine it as we go. Sorry if that sounds a little cold and unfair to those who will be the guinea pigs (teachers and students) but that’s how bad the existing system has proven itself to be. BTW, it’s not necessarily the system, it’s administrators unwilling to do a thorough job of evaluating and take a stand on their assessments, and the lack of quantifiable data on a teacher available to them.
Districts cannot be overlooked either as the culprits. They’re all scared to death of the legal costs if a fight ensues (and it usually does) regarding the dismissal of a teacher. That perhaps is the primary reason I’m in favor of VAA, merit pay, etc. It provides the administrator and the district with a semblance of objective information on a teacher, data that could well, if gathered and presented appropriately, hold up in court.
HOWEVER, before we take any of these folks to court for dismissal I’d really like to see the information gathered from these tests used TO IMPROVE INSTRUCTION. Then, and only then, if they’re unwilling to amend their practice and improve on their performance, then you take them to court and run their sorry behinds out of the classroom.
Duncan and many of you are talking as if there’s a horde of brighter, better candidates clamoring to fill the slots opened by a more harrowing evaluation process. There aren’t. Look, there are states, like Texas, where tenure doesn’t even exist. It should be a utopia for education, since bad teachers can be fired at will. Is Texas an educational utopia?
Why don’t we start talking about a more harrowing CURRICULUM evaluation process? Or a SUPERINTENDENT evaluation process? Or an ED SCHOOL evaluation process (shut down those whose grads don’t raise test scores, or whose ideologies don’t raise test scores)? Or a CHARLATAN,oops, I mean EDUCATION CONSULTANT evaluation process? Or an EDUCATIONAL TECHNOLOGY evaluation process (blackball companies that hawk expensive but ineffective products)?
Ben–evaluation is not a black and white issue that leads only to blind continuation or dismissal. Evaluation, when well implemented, is the ground floor for supportive improvement planning. Gaps identified are responsibly followed by a plan for improvement with steps identified. This could provide a guide for Professional Development, either individually or at the building or district level. Likewise, test scores for students should be spurring questions about the quality of the curriculum and the need for leadership and supervision of staff. Whatever data is gleaned from test scores (whether individually, in the aggregate, disaggregated or in Value Added reporting systems) should be triangulated against other kinds of data (other testing systems employed by the district, grades and coursework, etc). Certainly attention should be paid if a teacher, or a group of teachers, over time demonstrates consistent deficits, or successes.
We are not going to find hordes of brighter and better teachers (and schools and districts) somewhere out in the universe. We are going to have to build them, and that requires taking stock of progress from time to time.