Six Reasons Merit Pay is Unfair

by Robert Pondiscio
May 26th, 2009

President Obama loves merit pay.  So does Arne Duncan.  Editorial writers from coast to coast support the idea proposed by Gov. Arnold Schwarzenegger  that “teacher employment be tied to performance, not to just showing up.”  Dan Willingham wanders into the fray with his latest video, “Merit Pay, Teacher Pay and Value-Added Measures,” and offers six reasons why “value added measures sound fair, but they are not.”

<a href="http://youtube.com/watch?v=uONqxysWEk8">http://youtube.com/watch?v=uONqxysWEk8</a>

The political winds certainly seem to be very much at the back of merit pay plans.  Months or years hence, there may be a temptation to describe the “unintended consequences” of such plans.  Call them unintended, but not unanticipated.

There’s No “I” In Value Added

by Robert Pondiscio
February 27th, 2009

If teachers are evaluated and rewarded on the performance of their individual students, what incentive do they have to be good team players?  Why prize the overall performance of their students and school over how kids perform in the teachers’ own class?  This essential question was brilliantly posed by Matthew Ladner at Jay Greene’s blog last week.

The impetus for the question was a New York Times Magazine piece by Michael Lewis on Shane Battier of the Houston Rockets, who is “widely regarded inside the N.B.A. as, at best, a replaceable cog in a machine driven by superstars,” according to Lewis. “And yet every team he has ever played on has acquired some magical ability to win.”

In basketball, gaudy personal statistics earn you megabucks and create incentives to pad your stats regardless of whether it helps your team win.  Battier, however, is a white space employee.  “The term refers to the space between boxes on an organizational chart,” Ladner explains. “A white space employee is someone who does whatever it takes to achieve organizational goals and makes the organization work much better as a whole.”  What does this have to do with teaching?  Plenty.

As we move into the era of value-added analysis for teacher merit pay, this article provides much food for thought. School leaders must consider carefully what they will reward, and give some consideration to how white space behavior is rewarded. Rewards should not just be based on individual learning gains; reaching schoolwide goals should also be strongly rewarded. Otherwise my incentive as a math teacher will be to assign six hours of math homework a night, and to hell with everyone else (see Iverson, Allen).

“There’s no reward for being a white space player OR a superstar in the current system of teacher compensation,” Ladner concludes. “Just an old player.”  The unintended consequences have been the undoing of many a school reform effort.  If Ladner’s right about this — and I think he is — the consequences may be unintended, but they will not have been unforeseen. 

 

Diane Ravitch on Teacher Evaluation and Value-Added

by Diane Ravitch
November 18th, 2008

In his post, “Getting Value-Added Right,” Robert raises excellent questions, and his restaurant metaphor is apt. The value-added growth model, as Dan Willingham notes in the comments section and his post on the Britannica Blog, is not ready for prime time. There are too many intervening variables to hold teachers solely accountable for the test-score growth of every student. Given high rates of mobility, there is a large fluctuation in the student population in schools. As Thomas J. Kane and Douglas O. Staiger point out in one of their papers, the inherent volatility of test scores makes them a poor basis for an accountability system.

The imprecision of test score measures arises from two sources. The first is sampling variation, which is a particularly striking problem in elementary schools. With the average elementary school containing only sixty-eight students per grade level, the amount of variation stemming from the idiosyncrasies of the particular sample of students being tested is often large relative to the total amount of variation observed between schools. The second arises from one-time factors that are not sensitive to the size of the sample; for example, a dog barking in the playground on the day of the test, a severe flu season, a disruptive student in a class, or favorable chemistry between a group of students and their teacher. Both small samples and other one-time factors can add considerable volatility to test score measures.
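A rough simulation can make the sampling-variation point concrete. The sketch below is not taken from Kane and Staiger's paper. It assumes a hypothetical school whose true quality never changes, with student scores drawn from a normal distribution (mean 100, standard deviation 15, both invented for illustration). Only the cohort size of sixty-eight comes from the passage above. The function name `cohort_mean_spread` is likewise made up for this example.

```python
import math
import random
import statistics


def cohort_mean_spread(cohort_size, trials=1000, true_mean=100.0,
                       student_sd=15.0, seed=0):
    """Spread (standard deviation) of the grade-level average score
    across many simulated cohorts drawn from the SAME school.

    true_mean and student_sd are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    cohort_means = [
        statistics.fmean(
            rng.gauss(true_mean, student_sd) for _ in range(cohort_size)
        )
        for _ in range(trials)
    ]
    return statistics.pstdev(cohort_means)


# 68 students per grade is the figure quoted above; 680 is a
# hypothetical cohort ten times larger, included for comparison.
spread_small = cohort_mean_spread(68)
spread_large = cohort_mean_spread(680)

# Textbook approximation for comparison: student_sd / sqrt(cohort_size).
expected_small = 15.0 / math.sqrt(68)

print(f"year-to-year spread, 68 students:  {spread_small:.2f} points")
print(f"year-to-year spread, 680 students: {spread_large:.2f} points")
print(f"sd / sqrt(n) for 68 students:      {expected_small:.2f} points")
```

Under these assumptions the school's average moves by nearly two points from one cohort to the next with no change in teaching at all, which is the kind of noise a one-year score change has to be read against.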

There are many, many reasons why one-year changes in scores are not reliable. There are many reasons why it is hard to give credit or blame for students’ test score gains and losses from year to year. Until we have better tests and have ironed out many of the confounding variables, we cannot draw credible inferences about teacher performance from test scores, and it is unfair to use such data to dispense rewards and punishments.

There is another reason to worry about value-added growth models that determine a teacher’s fate and compensation. If we turn teaching into an activity whose sole purpose is to produce gains on tests that we know are mainly low-level and dumbed-down, we will not make education better. We may succeed in destroying it altogether. We had better find ways to emphasize the quality of curriculum (think Core Knowledge) and to de-emphasize the number of times that kids are asked to check off a box on standardized tests in the course of a month. Otherwise, our education system will be far worse than ever.

Diane blogs on education at Bridging Differences — ed.

Getting Value-Added Right

by Robert Pondiscio
November 17th, 2008

Moving to the growth or “value-added” model of assessment seems to be the favorite education reform of the incoming Obama administration, notes the Washington Post’s Jay Mathews, who appears to favor the idea.  “The growth model appeals to parents because it focuses on each child,” he writes.  “It gives researchers a clearer picture of what affects student achievement and what does not…The next step would be to use the same data to see which teachers add the most value to their students each year,” he adds before noting the objections to value-added among teachers and unions.

Go ahead. Blame the teacher unions. They make no apology for their opposition to this approach. But they have good arguments. Congress will have to revise the No Child Left Behind law to install the growth model, and most support for the idea there extends only to rating schools, not teachers. Assessing instructors by how much their students improve seems reasonable to people like me who have never taken a psychometrics course, but nobody has sufficiently tested the statistical devices for doing that, and they might prove to be expensive.

I’ve never taken a psychometrics course either, but at the elementary school level, it’s the rare teacher who would be comfortable having his or her fortunes tied to value-added measures for the simple and obvious reason that there are too many variables impacting student achievement that an individual teacher cannot control, or even influence.  Try this analogy: 

Let’s say you’re a waiter working the lunch shift at a restaurant with lots of repeat business.  The owner  wants to make sure that sales per diner and customer satisfaction are going up.  That’s perfectly reasonable.  But instead of looking at the average sales and customer satisfaction, the owner wants to hold you accountable for every single diner you serve.  They all need to go up.  If even a single diner leaves unhappy and spends less, you’ve failed.  Your job is to make sure that every customer is happier today than they were with yesterday’s lunch and spends more, even if they ate at a different restaurant.  Since yesterday, the customer may have had a tough day at work, argued with his spouse, or got in an accident in the cab on the way over.  He may not even be very hungry today.  It doesn’t matter.  If you’re really good at what you do, you should be able to overcome every obstacle since studies show the most important variable in customer satisfaction is the waiter.  You have no control over the menu, the meal, the seating, the decor, or the customer’s interactions with the hostess, the bartender, the busboy and every other staff member.  By the way, if you work at Denny’s your customers are expected to be just as happy as they are at Le Cirque. 

After the appetizers are cleared – not even at the end of the meal – the customer satisfaction survey is dropped on the table.  Meanwhile, at a different waiter’s table, another customer is having a terrible time.  The waiter is rude, the food is cold, and the busboy spilled water on him.  He’s filling out a survey too.  Half of his evaluation will be charged to you, since you served him lunch yesterday.

Fair?

None of this should be taken as an attack on the idea of accountability, or even value-added.  I’m a firm believer that as teachers, we need to hold ourselves to very high standards and be accountable to the taxpayers who pay our salaries.  Accountability matters a great deal.  But poorly designed and executed accountability measures will set back the cause of accountability, perhaps irrevocably.  We’ve got to get this right, not engage in another round of ready-fire-aim.

A Novel Use of Data

by Robert Pondiscio
November 5th, 2008

San Diego’s school system is planning to use value-added data to…identify students who are most at risk of dropping out and need extra help.  Using five years of data, the district will create a detailed account of each student’s performance.  “It’s a tool that will allow us to predict which kids are at risk for dropping out with a certain degree of accuracy,” Deputy Superintendent Chuck Morris tells the San Diego Union-Tribune. “We’ll be able to predict which students would have trouble with algebra as early as fifth or sixth grade.”

A student’s scores on state standardized tests and other assessments would be compared with other students districtwide. If a student shares some of the same performance trends as those who have encountered problems, the district would offer extra help.

It’s refreshing to hear value-added discussed in terms of its benefit to students, rather than as a cudgel.  Incidentally, California law forbids the use of student performance in teacher evaluations.

How Not to Evaluate Teachers

by Robert Pondiscio
November 3rd, 2008

UVA professor and Core Knowledge board member Dan Willingham, who routinely graces this blog with his observations, is now blogging over at Britannica Blog.  His first post is up today, and it’s a barn burner: How NOT to Evaluate Teachers.  Plans to evaluate teachers based on standardized test scores are “fatally flawed,” he writes.

Obviously, the measure cannot be based on a one-time test score, because a student’s achievement is a product of (at least) his home environment, neighborhood, and prior schooling. So you must try to assess how much the student learns over the course of the year. But these “value added” measures bring lots of thorny statistical problems. For example, suppose your plan is to administer a test in the Autumn and one in the Spring, and to compare them to see how much students have gained. Well, some Autumn test-takers will have moved by the Spring.  Can’t you just ignore those scores? No, because low-income students are more likely to move than high-income students, and low-income students tend to score lower. So if you ignore missing data, you’re biasing the estimate.
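The missing-data problem can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions, not Willingham's own calculation: every number is invented, the `Group` class and both function names are hypothetical, and the example further assumes that the lower-scoring group also posts smaller autumn-to-spring gains, which the quoted passage does not say.

```python
from dataclasses import dataclass


@dataclass
class Group:
    share: float      # fraction of the autumn test-takers
    move_prob: float  # chance of moving away before the spring test
    mean_gain: float  # average autumn-to-spring gain, in score points


def mean_gain_all(groups):
    """Average gain over every student who sat the autumn test."""
    return sum(g.share * g.mean_gain for g in groups)


def mean_gain_stayers(groups):
    """Average gain over only the students still present in spring,
    which is what you get by ignoring the missing scores."""
    stay_weights = [g.share * (1 - g.move_prob) for g in groups]
    total = sum(stay_weights)
    return sum(w * g.mean_gain for w, g in zip(stay_weights, groups)) / total


# Illustrative numbers only (assumptions, not real data).
low_income = Group(share=0.5, move_prob=0.30, mean_gain=8.0)
high_income = Group(share=0.5, move_prob=0.10, mean_gain=12.0)
groups = [low_income, high_income]

true_gain = mean_gain_all(groups)          # 10.0
observed_gain = mean_gain_stayers(groups)  # 10.25
bias = observed_gain - true_gain           # 0.25

print(f"gain over all autumn test-takers: {true_gain:.2f}")
print(f"gain over spring stayers only:    {observed_gain:.2f}")
print(f"bias from ignoring movers:        {bias:+.2f}")
```

With these made-up figures, dropping the movers flatters the class by a quarter of a point. The size of the distortion depends entirely on the assumed mobility rates and gains, and it would differ from one classroom to the next.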

Dan lists other problems that he says are old stuff to statisticians, and concludes, “there’s nothing wrong with using value-added measures in research, with all the caveats of the method understood, as one in an array of tools to address a research question. But using it as a measure of an individual teacher’s efficacy is foolish.”

Teacher Quality, Unintended Consequences, and the Baseball Achievement Gap

by Robert Pondiscio
October 6th, 2008

New York’s Department of Education is beginning to measure the performance of thousands of elementary and middle school teachers based on how much their students improve on annual state math and reading tests, the New York Times reported last week. A joint letter to NYC teachers from Chancellor Joel Klein and UFT President Randi Weingarten explained the data is intended to “empower teachers with information useful in our teaching. In this same vein, the letter expressly prohibits the use of that information for evaluating teachers, in both annual ratings and tenure decisions.”

When the plan first came up in February, Ed Sector’s Kevin Carey wrote a much-discussed op-ed in the New York Daily News, comparing value-added data to the pioneering work done by maverick baseball general manager Billy Beane. The subject of Michael Lewis’ 2003 book, Moneyball, Beane has often managed to keep his small-market Oakland A’s competitive with deeper-pocketed teams by rejecting conventional baseball wisdom in favor of data-driven decision-making. “By crunching numbers without prejudice, Beane discovered that certain statistics that really mattered on the field, like on-base percentage, were being hugely undervalued in the player job market,” Carey wrote. “While scouts and other executives made decisions based on personal bias and flawed perceptions, Beane kept to the statistical bottom line.” Seen through this lens, the hope and promise is that we can find equivalents to on-base percentage in teacher performance that drive student achievement.

The Moneyball comparison, however, strikes me as a potentially dangerous analogy. Here’s why: Players are to baseball teams as students — not teachers — are to schools. Teachers succeed by getting the best performances from their students. Their closest counterparts in baseball are managers and coaches. Baseball executives like Billy Beane do not use data to help ordinary players over-perform. They use data to replace underperformers with overachievers.  To run a school like Billy Beane runs the Oakland A’s would mean regularly replacing low-scoring students with high-scoring students.

That would be one way to close the achievement gap.

Who Is National Certification Worthy?

by Robert Pondiscio
July 1st, 2008

The National Board for Professional Teaching Standards should consider student-learning gains when deciding which teachers deserve national certification, a team of researchers says in an interesting study reported in Education Week.

Students who are taught by teachers certified by the board outperform students whose teachers lack such certification on standardized tests, according to a study released last month.  Now, researchers from Harvard, Dartmouth and the Los Angeles Unified School District “make a case for combining the current measures with newer, ‘value added’ calculations that take into account the test-score gains that students make in applicants’ classes, or at least lending more weight in the assessment process to the individual tests that link most closely to improved student achievement,” says EdWeek.

“For some reason, the teacher-effectiveness debate is broken into two camps,” says Thomas J. Kane, a study author and a professor of education and economics at Harvard’s Graduate School of Education. “One side focuses on students’ achievement, and then there’s another side that focuses primarily on measures of teacher practice. We think the reasonable approach is not either, but both.”

To its credit, NBPTS commissioned the study as one of 22 research efforts to gauge the effectiveness of its process. The results are apparently non-binding on NBPTS; they’re not obligated to adopt the value-added recommendation.  But one wonders if the fact that the report is being discussed in EdWeek before its release isn’t tantamount to a trial balloon of sorts.