Federal Policy, Teacher Effects

by Lisa Hansel
March 5th, 2014

My objective today is to put words in a few prominent researchers’ mouths—or better yet, their paper, “Learning that Lasts: Unpacking Variation in Teachers’ Effects on Students’ Long-Term Knowledge.” Benjamin Master, Susanna Loeb, and James Wyckoff have posted online a “preliminary draft,” which no one is supposed to cite. Blogging, I assume, is fair game, as with all online content. This paper is terrifically important. I just want to see a draft that more fully discusses the many factors that contribute to teacher effects.

Let’s start with why this paper is worth your time: It’s a blockbuster for those worried about the negative consequences of annual high-stakes test-based teacher evaluations. Looking at the long-term impact of teachers with high value added, the researchers conclude:

Evaluation and accountability systems may incentivize educators to focus excessively on short-term tested outcomes in ways that are not ultimately beneficial for students…. Collectively, this body of evidence demonstrates that teachers’ instructional practices can influence their short-term value-added performance in ways that do not correspond with long-term success for students…. Overall, our results demonstrate that teachers’ effects on students’ long-term skills can vary substantially and systematically, in ways that are not fully captured by short-term value-added measures of instructional quality.

We clearly need education policy to incentivize (or at least not impede) meaningful educational gains, so I hope policymakers will heed this research.

To increase the odds that they will heed it, the paper needs one quick little addition: more forceful acknowledgement that teachers’ effects are influenced by many factors. Several federal and state policies could be explored as means of positively influencing curriculum and instruction. This is not simply a teacher issue. It is a standards, curriculum, assessment, accountability, teacher preparation, professional development, leadership, and resource-allocation issue.

Is it really these researchers’ job to remind readers of the broader context? No. It’s just something that, given the importance of the issue, I’m hoping they’ll want to note.


When it comes to teacher effects, context matters.

Reading this paper, it’s easy to get swept up in thinking the teacher makes all the difference. For example, the more academic knowledge teachers have, the more they seem to infuse it into their instruction, to great effect:

The within-subject value-added persistence of ELA teachers who attended a more competitive undergraduate institution is significantly and substantially higher than that of teachers who attended a less competitive institution…. Differences in persistence are similarly large when comparing teachers whose SAT Verbal exam scores or LAST licensure exam scores are in the top third of the teacher distribution, in comparison to lower-scoring teachers. In both cases, higher scoring teachers show greater persistence…. It is notable that our teacher ability characteristics predict large differences in ELA teachers’ value-added persistence, even though they are not themselves correlated with teachers’ short-term value-added effects.
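(A quick aside for readers new to the jargon: “value-added persistence” asks how much of a teacher’s test-score impact is still detectable in her students a year or more later. Below is a toy sketch of the idea in Python. It is mine, not the authors’, and their actual specification is surely richer; all the simulated numbers are invented.)

    # Toy sketch of "value-added persistence" -- NOT the paper's model.
    # Idea: estimate each teacher's effect on this year's scores, then
    # ask how much of that effect shows up in scores a year later.
    import numpy as np

    rng = np.random.default_rng(0)
    n_teachers, n_students = 200, 100

    true_persistence = 0.4                    # fraction of the gain that survives a year
    effect = rng.normal(0, 0.2, n_teachers)   # teacher effects, in student SDs
    score_now = effect[:, None] + rng.normal(0, 1, (n_teachers, n_students))
    score_next = true_persistence * effect[:, None] + rng.normal(0, 1, (n_teachers, n_students))

    va_now = score_now.mean(axis=1)           # short-term value-added, per teacher
    later = score_next.mean(axis=1)           # the same students a year later

    # Persistence = slope of next-year scores on this-year value-added.
    slope = np.polyfit(va_now, later, 1)[0]
    print(f"estimated persistence: {slope:.2f} (true rate: {true_persistence})")
    # The estimate runs below the true rate because short-term value-added is
    # itself measured with error -- one reason these comparisons are delicate.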

One might be tempted to think these direct teacher effects are simply teacher-quality issues. But nothing in education is so simple. Ask yourself: what’s likely to make a person with the potential to have lasting effects want to be a teacher for years to come? Rigorous standards and engaging curriculum, meaningful assessments that support instruction, accountability policies that don’t incentivize test prep, academically demanding preparation programs, tailored professional development, helpful leaders, etc.

Much of the paper is devoted to examining potential student- vs. teacher-level drivers of the variation in teachers’ long-term impact. That, obviously, is a key question—I just want to see more acknowledgement that the “teacher effects” are federal-, state-, district-, and school-policy effects. Here’s the heart of the research:

Observable student characteristics related to their socio-economic status or prior ability also predict substantial variation in their ELA teachers’ value-added persistence. The persistence of achievement gains coming from having an effective teacher is far lower for students who are eligible for free lunch, are black or Hispanic, or whose twice-lagged ELA achievement scores are below the mean…. These students may be receiving ELA instruction that is less focused on long-term knowledge, or they may be less skilled at acquiring or retaining long-term knowledge.

Ultimately, the researchers conclude the primary issue is likely “instruction that is less focused on long-term knowledge”:

We see evidence of the importance of instruction in the positive association between teachers’ academic ability and their contributions to students’ long-term knowledge. Even more compelling, we find that schools that serve more disadvantaged students or that hire fewer of these high-ability teachers have lower value-added persistence in ELA for all of their students. Students, regardless of their prior test performance, who attend schools with many low-performing students demonstrate lower persistence of the learning gains they achieve from having a high value-added teacher. The persistence in low-achieving schools is less than half the rate of that in other schools. These findings provide evidence that instructional quality is a key driver of the variation that we observe in value-added persistence, and that school-level curriculum or instructional norms may foster differences in instructional quality. Unfortunately, we are unable to directly observe the instructional practices of teachers or schools in our sample. However, in light of prior research on educators’ responses to high stakes accountability pressures … one plausible explanation for our findings could be that schools serving lower performing students systematically prioritize gains in short-term tested achievement in ways that detract from teachers’ focus on long-term knowledge generation.

As I’ve said, there’s a whole lot beyond “school-level curriculum or instructional norms” that “may foster differences in instructional quality.” The authors of this paper know that—and it’s certainly not their fault that many policymakers need to be reminded. But they do. And if more policymakers get the message that we have a multifaceted, highly complex problem to address, perhaps more desperately needed research dollars will be provided and more varied policies will be piloted.

What is the Value in a High Value-Added Teacher?

by Guest Blogger
January 12th, 2012

by Jessica Lahey

Great news emerged this week for elementary- and middle-school teachers who produce gains in their students’ test scores.  While the teachers themselves may not be pulling down big salaries, their efforts result in increased earnings for their students. In a study that tracked 2.5 million students for over 20 years, researchers found that good teachers have a long-lasting positive effect on their students’ lives, including those higher salaries, lower teen-pregnancy rates, and higher college matriculation rates.

I’m a practical person.  I understand that we spend billions of dollars educating our children and that the taxpayer deserves some assurance that the money is not being squandered.  Accountability matters.  I get it.  Still, as a teacher, it’s hard not to feel a little bit wistful, perhaps even wince a little, reading this study.

It’s important to remember that its authors, Raj Chetty, John N. Friedman, and Jonah E. Rockoff, are all economists. Their study measures tangible, economic outcomes from what they call high versus low “value-added” teachers. This “value-added” approach, which is defined as “the average test-score gain for his or her students, adjusted for differences across classrooms in student characteristics such as prior scores,” may work for measurable outcomes such as future earnings, but it misses so much of the point of education.
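To make that definition concrete, here is a bare-bones sketch of how a value-added score might be computed. The variable names, toy data, and single prior-score adjustment are mine; Chetty and company control for far more than this.

    # Minimal "value-added" sketch per the definition quoted above:
    # a teacher's score is her students' average test-score gain,
    # adjusted for prior achievement. Real models add many more controls.
    import numpy as np

    def value_added(prior, current, teacher_ids):
        """Mean residual gain per teacher after adjusting for prior scores."""
        slope, intercept = np.polyfit(prior, current, 1)   # adjust for prior achievement
        residual = current - (slope * prior + intercept)
        # A teacher's value-added = the mean residual of her students.
        return {t: residual[teacher_ids == t].mean() for t in np.unique(teacher_ids)}

    # Toy data: three teachers, five students each.
    rng = np.random.default_rng(1)
    teachers = np.repeat(np.array(["A", "B", "C"]), 5)
    prior = rng.normal(0, 1, 15)
    current = prior + np.repeat([0.3, 0.0, -0.3], 5) + rng.normal(0, 0.2, 15)
    print(value_added(prior, current, teachers))   # A looks "high value-added," C "low"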

I asked my Uncle Michael, a professor of law and economics, what he thought of the study, and he compared the proponents of the study’s mathematical economic approach to education to acolytes of The Who’s Tommy, pinball wizards who “sought to isolate themselves from the world so as to improve their perception of a very narrow sliver of that world. The entire ‘assessment’ enterprise defiles education as that word once meant.”

He attempted to explain his feelings about the study in terms of mathematical equations – something to do with linear regression and educational outcomes – but I got lost in the Y = a + bX + errors of it all.

Tim Ogburn, a fifth-grade teacher in California, frames the debate a bit more simply: Why are we educating children?

His answer goes like this: Until fairly recently, teachers would have answered that they were educating children to become good Americans or good citizens, but now we seem to teach only to prepare elementary- and middle-school children for high-paying jobs. When money figures into the goal, we lose so much along the way, such as curiosity, a love of learning for its own sake, and an awareness that many of the most worthwhile endeavors (both personally and socially) are not those with the highest monetary rewards.

To which I reply: Hear, hear. If economic gain is the measure of our success, we have lost sight of our goals in education.

In order to round out the definition of “value” used in Chetty’s study, I conducted my own research project. Sure, my sample was smaller – about thirty versus Chetty’s 2.5 million, and the duration of my study was three days rather than 20 years…and of course there might just have been a wee bit of selection bias in my Facebook sampling. Oh, and I chose not to apply Uncle Michael’s formulas because they gave me a headache.

The goal of my study was to find out what some of the other, less measurable benefits of good teaching might be. I asked people to write in with examples of good teaching, teaching that has resulted in positive outcomes in their lives. Who were their “high value-added” teachers?

Sarah Pinneo, a writer from New Hampshire, recalled her third grade teacher, who took her aside one day and said, “You are going to be a writer. Here’s your portfolio. Every poem you finish, we’re going to save it in here.” Sarah’s first novel will be released on February first, and she still has that poetry portfolio.

Carol Blymire, a food writer and public relations executive in Washington, D.C., recalled her kindergarten teacher “who taught me that letters make words and words make sentences…and is the reason I love to write today.” As for her low value-added teachers: “Every other teacher reprimanded me for asking questions that came across as challenging them, even though it was really my way of wanting to know more and understand the bigger picture.”

My favorite example came from Dr. Jeffrey Fast, an English teacher in Massachusetts.

“One morning, when I was a senior, we were discussing Maxwell Anderson’s Winterset. While I can no longer remember exactly what I said, it was something about the interaction among the characters. Immediately after I spoke, [my teacher] responded by saying – for all to hear: ‘I like you!’ His response, of course, was coded language to identify and mark – for both me and my peers – something insightful. I felt enormously rewarded. That was the benchmark that I tried to replicate in dealing with literature ever afterwards. That was 50 years ago. He never knew that those three words catapulted me – to a Ph.D. and a career as an English teacher!”

While the studies of economists may add to the discussion about what makes teachers valuable in our lives, I believe that if we reduce teachers’ value to dollars and cents, we run the risk of becoming, in Oscar Wilde’s phrase, “the kind of people who know the price of everything, but the value of nothing.”


The MET Research Paper: Achievement of What?

by Guest Blogger
December 19th, 2010

by Diana Senechal

A new study by the Measures of Effective Teaching (MET) Project, funded by the Bill and Melinda Gates Foundation, finds that students’ perceptions of their teachers correlate with the teachers’ value-added scores; in other words, “students seem to know effective teaching when they experience it.” The correlation is stronger for mathematics than for ELA; this is one of many discrepancies between math and ELA in the study. According to the authors, “outside the early elementary grades when students are first learning to read, teachers may have limited impacts on general reading comprehension.” This peculiar observation should raise questions about curriculum, but curriculum does not come up in the report.

When the researchers combined student feedback and math value-added (from state tests) into a single score, they found that “the difference between bottom and top quartile was .21 student standard deviations, roughly equivalent to 7.49 months of schooling in a 9-month school year.” For ELA, the difference between top and bottom quartile teachers was much smaller, at .078 student-level standard deviations.
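(How does .21 standard deviations become 7.49 months? The report does not show its arithmetic, but the usual back-of-the-envelope conversion divides the effect by the gain a typical year of schooling produces. The 0.25-standard-deviation annual gain below is my assumption, not the report’s figure, and it lands near the number they cite.)

    # Back-of-the-envelope conversion of an effect size to months of schooling.
    effect_sd = 0.21        # top-vs-bottom-quartile difference, in student SDs
    annual_gain_sd = 0.25   # ASSUMED gain from one 9-month school year
    print(f"{effect_sd / annual_gain_sd * 9:.1f} months")   # ~7.6, near the reported 7.49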

What are students learning in ELA? Beginning in fourth grade, students appear to gain just as much in reading comprehension from April to October as from October to April—that is, the summer months away from school do not seem to affect their gains. According to the researchers, “the above pattern implies that schooling itself may have little impact on standard read­ing comprehension assessments after 3rd grade.” They posit, somewhat innocently, that “literacy includes more than reading comprehension … It involves writing as well.” The lack of teacher effects applied mainly to the state tests;  when the researchers administered the written Stanford 9 Open-Ended Assessment for ELA, the teacher effects were larger than for math.

What explains the relatively low teacher effects on the ELA state tests? The researchers offer two possibilities: (a) teacher effects on reading comprehension are small after the early elementary years and (b) the tests themselves may fail to capture the teachers’ impact on literacy. Both of these hypotheses seem plausible but tangential to the central problem: this amorphous concept of “literacy.” Why should schools focus on “literacy” in the first place? Why not literature and other subjects?

A curious detail may offer a clue to the problem: the correlation between value-added on state tests and the Stanford 9 in ELA is low (0.37). That is, teachers whose students see gains on the ELA state tests are not very likely to see gains on the Stanford 9 as well. (The researchers do not state whether the reverse is true.) The researchers thought some of this might be due to the “change in tests in NYC this year.” When they removed NYC from the equation, the correlation was significantly higher. (But the New York math tests changed this year as well, and this apparently did not affect things—the correlation for math between state-test value-added and value-added on the supplemental Balanced Assessment in Mathematics, or BAM, is “moderately large” at 0.54.)

Is it not possible that NYC suffers from a weak or nonexistent ELA curriculum, more so than the other districts in the study? Certainly curriculum should be considered, if an entire district shows markedly different results from the others.

In math, there usually is a curriculum. It may be strong or weak, focused or scattered, but there is actual material that students are expected to learn. In ELA, this may or may not be the case. In schools and districts with a rigorous English curriculum (as opposed to a literacy program), students read and discuss challenging literary works, study grammar and etymology, write expository essays, and  more. In the majority of New York City public schools, by contrast, this kind of concrete learning is eschewed; lessons tend to focus on a reading strategy, and students practice the strategy on their separate books. New York City has taken the strategy approach since 2003 (and in some cases much earlier); Balanced Literacy, or a version of it, is the mandated ELA program in most NYC elementary and middle schools. The MET researchers do not consider curriculum at all; they seem to assume that a curriculum exists in each of the schools and that it is consistent within a district.

In short, when analyzing teacher effects on achievement gains, the researchers forgot to ask: achievement of what? This is not a trivial question; the answers could shed light on the value-added results and their implications. It may turn out that the curricular differences are too slight or vague to make a difference, or that they do not significantly affect performance on these particular tests. Or the investigation of such differences may turn the whole study upside down. In any case, it is a mistake to ignore the question.

Diana Senechal taught for four years in the New York City public schools and holds a Ph.D. in Slavic languages and literatures from Yale. Her book, Republic of Noise: The Loss of Solitude in Schools and Culture, will be published by Rowman & Littlefield Education in late 2011.


Value-Added: When Being Right Isn’t Enough

by Robert Pondiscio
August 17th, 2010

The Los Angeles teachers union may be correct to fight the publication of individual teachers’ value-added test scores.  But they’re on the wrong side of history, writes Dan Willingham, who has long made a compelling, fair and utterly dispassionate case (he has no skin in the fight) that using value-added measures to evaluate teacher quality is “not ready for prime time.”  He’s explained the problems in blog posts, and even in a YouTube video.

Clearly, to no avail.  For those who have just arrived on the planet this morning, the Los Angeles Times over the weekend produced a blockbuster piece of reporting, based on years of test scores for 3rd through 5th grade teachers, enlisting a statistician to rate the effectiveness of individual teachers by name. The data, says the paper, tells “which ones have the classroom magic that makes students learn and which ones annually let their students down.”  Blogging at the Washington Post’s Answer Sheet, Willingham responds:

“The writers of the Times article are either uninformed or disingenuous about the status of the value-added measures. They write ‘Though controversial among teachers and others, the method has been increasingly embraced by education leaders and policymakers across the country, including the Obama administration.’  The ‘others’ include most researchers looking into the matter.”

The L.A. teachers union is calling for a boycott of the paper.  Good luck with that, is Willingham’s response.  “When it comes to value-added measures, teachers and unions are right. The models aren’t reliable enough to evaluate individual teachers,” he observes.

“But right now that doesn’t matter much. The mood today is that something has to be done about incompetent teachers. We’ve seen that mood in districts in New York City and Washington D.C. and now we’re seeing it in Los Angeles.  We’re also seeing it at the federal level. Education Secretary Arne Duncan said that the publishing of the individual teacher’s scores is just fine. The people who feel that something must be done are right. In most districts there is not a mechanism by which to ensure that incompetent teachers are not teaching.”

This is the time for the teachers’ unions to make teacher evaluation their top priority, Willingham concludes. “If they don’t, others will.”

Others have.

Six Reasons Merit Pay is Unfair

by Robert Pondiscio
May 26th, 2009

President Obama loves merit pay.  So does Arne Duncan.  Editorial writers from coast to coast support the idea proposed by Gov. Arnold Schwarzenegger  that “teacher employment be tied to performance, not to just showing up.”  Dan Willingham wanders into the fray with his latest video, “Merit Pay, Teacher Pay and Value-Added Measures,” and offers six reasons why “value added measures sound fair, but they are not.”

The political winds certainly seem to be very much at the back of merit pay plans.  Months or years hence, there may be a temptation to describe the “unintended consequences” of such plans.  Call them unintended, but not unanticipated.

There’s No “I” In Value Added

by Robert Pondiscio
February 27th, 2009

If teachers are evaluated and rewarded on the performance of their individual students, what incentive do they have to be good team players?  Why prize the overall performance of their students and school over how kids perform in the teachers’ own class?  This essential question was brilliantly posed by Matthew Ladner at Jay Greene’s blog last week.

The impetus for the question was a New York Times magazine piece by Michael Lewis on Shane Battier of the Houston Rockets, who is “widely regarded inside the N.B.A. as, at best, a replaceable cog in a machine driven by superstars,” according to Lewis. “And yet every team he has ever played on has acquired some magical ability to win.”

In basketball, gaudy personal statistics earn you megabucks and create incentives to pad your stats regardless of whether it helps your team win.  Battier, however, is a white space employee.  “The term refers to the space between boxes on an organizational chart,” Ladner explains. “A white space employee is someone who does whatever it takes to achieve organizational goals and makes the organization work much better as a whole.”  What does this have to do with teaching?  Plenty.

As we move into the era of value-added analysis for teacher merit pay, this article provides much food for thought. School leaders must consider carefully what they will reward, and give some consideration to how white space behavior is rewarded. Rewards should not just be based on individual learning gains; reaching school-wide goals should also be strongly rewarded. Otherwise my incentive as a math teacher will be to assign six hours of math homework a night, and to hell with everyone else (see Iverson, Allen).

“There’s no reward for being a white space player OR a superstar in the current system of teacher compensation,” Ladner concludes. “Just an old player.”  The unintended consequences have been the undoing of many a school reform effort.  If Ladner’s right about this — and I think he is — the consequences may be unintended, but they will not have been unforeseen. 


Diane Ravitch on Teacher Evaluation and Value-Added

by Guest Blogger
November 18th, 2008

by Diane Ravitch

In his post, “Getting Value-Added Right,” Robert raises excellent questions, and his restaurant metaphor is apt. The value-added growth model, as Dan Willingham notes in the comments section and his post on the Britannica Blog, is not ready for prime time. There are too many intervening variables to hold teachers solely accountable for the test-score growth of every student. Given high rates of mobility, there is a large fluctuation in the student population in schools. As Thomas J. Kane and Douglas O. Staiger point out in one of their papers, the inherent volatility of test scores makes them a poor basis for an accountability system.

The imprecision of test score measures arises from two sources. The first is sampling variation, which is a particularly striking problem in elementary schools. With the average elementary school containing only sixty-eight students per grade level, the amount of variation stemming from the idiosyncrasies of the particular sample of students being tested is often large relative to the total amount of variation observed between schools. The second arises from one-time factors that are not sensitive to the size of the sample; for example, a dog barking in the playground on the day of the test, a severe flu season, a disruptive student in a class, or favorable chemistry between a group of students and their teacher. Both small samples and other one-time factors can add considerable volatility to test score measures.
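Their sampling-variation point is easy to see in a toy simulation. This one is mine, with invented numbers: draw five successive cohorts of sixty-eight students from a school whose true quality never changes, and watch the grade-level average bounce around.

    # Sampling variation with 68 students per grade (a toy illustration of
    # the Kane & Staiger point; all numbers are invented).
    import numpy as np

    rng = np.random.default_rng(2)
    n_students = 68
    true_school_mean = 0.0   # this school's real quality never changes

    for year in range(5):    # five cohorts drawn from the identical population
        cohort = rng.normal(true_school_mean, 1.0, n_students)
        print(f"year {year + 1}: grade average = {cohort.mean():+.2f} SD")
    # Expected spread of these averages is 1/sqrt(68), about 0.12 SD -- enough
    # to shuffle school rankings even when nothing real has changed.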

There are many, many reasons why one-year changes in scores are not reliable, and why it is hard to give credit or blame for students’ test-score gains and losses from year to year. Until we have better tests and have ironed out many of the confounding variables, we cannot make credible inferences about teacher performance from test scores, let alone fairly use such data to dispense rewards and punishments.

There is another reason to worry about value-added growth models that determine a teacher’s fate and compensation. If we turn teaching into an activity whose sole purpose is to produce gains on tests that we know are mainly low-level and dumbed-down, we will not make education better. We may succeed in destroying it altogether. We had better find ways to emphasize the quality of curriculum (think Core Knowledge) and to de-emphasize the number of times that kids are asked to check off a box on standardized tests in the course of a month. Or our education system will be far worse than ever.

Diane blogs on education at Bridging Differences — ed.

Getting Value-Added Right

by Robert Pondiscio
November 17th, 2008

Moving to the growth or “value-added” model of assessment seems to be the favorite education reform of the incoming Obama administration, notes the Washington Post’s Jay Mathews, who seems to favor the idea.  “The growth model appeals to parents because it focuses on each child,” he writes.  “It gives researchers a clearer picture of what affects student achievement and what does not…The next step would be to use the same data to see which teachers add the most value to their students each year,” he adds before noting the objections to value-added among teachers and unions.

Go ahead. Blame the teacher unions. They make no apology for their opposition to this approach. But they have good arguments. Congress will have to revise the No Child Left Behind law to install the growth model, and most support for the idea there extends only to rating schools, not teachers. Assessing instructors by how much their students improve seems reasonable to people like me who have never taken a psychometrics course, but nobody has sufficiently tested the statistical devices for doing that, and they might prove to be expensive.

I’ve never taken a psychometrics course either, but at the elementary school level, it’s the rare teacher who would be comfortable having his or her fortunes tied to value-added measures for the simple and obvious reason that there are too many variables impacting student achievement that an individual teacher cannot control, or even influence.  Try this analogy: 

Let’s say you’re a waiter working the lunch shift at a restaurant with lots of repeat business.  The owner  wants to make sure that sales per diner and customer satisfaction are going up.  That’s perfectly reasonable.  But instead of looking at the average sales and customer satisfaction, the owner wants to hold you accountable for every single diner you serve.  They all need to go up.  If even a single diner leaves unhappy and spends less, you’ve failed.  Your job is to make sure that every customer is happier today than they were with yesterday’s lunch and spends more, even if they ate at a different restaurant.  Since yesterday, the customer may have had a tough day at work, argued with his spouse, or got in an accident in the cab on the way over.  He may not even be very hungry today.  It doesn’t matter.  If you’re really good at what you do, you should be able to overcome every obstacle since studies show the most important variable in customer satisfaction is the waiter.  You have no control over the menu, the meal, the seating, the decor, or the customer’s interactions with the hostess, the bartender, the busboy and every other staff member.  By the way, if you work at Denny’s your customers are expected to be just as happy as they are at Le Cirque. 

After the appetizers are cleared – not even at the end of the meal – the customer satisfaction survey is dropped on the table.  Meanwhile, at a different waiter’s table, another customer is having a terrible time.  The waiter is rude, the food is cold, and the busboy spilled water on him.  He’s filling out a survey too.  Half of his evaluation will be charged to you, since you served him lunch yesterday.

Fair?

None of this should be taken as an attack on the idea of accountability, or even value-added.  I’m a firm believer that as teachers, we need to hold ourselves to very high standards and be accountable to the taxpayers who pay our salaries.  Accountability matters a great deal.  But poorly designed and executed accountability measures will set back the cause of accountability, perhaps irrevocably.  We’ve got to get this right, not engage in another round of ready-fire-aim.

A Novel Use of Data

by Robert Pondiscio
November 5th, 2008

San Diego’s school system is planning to use value-added data to…identify students who are most at risk of dropping out and need extra help.  Using five years of data, a detailed account of each student’s performance will be created.  “It’s a tool that will allow us to predict which kids are at risk for dropping out with a certain degree of accuracy,” Deputy Superintendent Chuck Morris tells the San Diego Union-Tribune. “We’ll be able to predict which students would have trouble with algebra as early as fifth or sixth grade.”

A student’s scores on state standardized tests and other assessments would be compared with other students districtwide. If a student shares some of the same performance trends as those who have encountered problems, the district would offer extra help.
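The Union-Tribune does not say what model San Diego will use, so the following is only a generic, hypothetical sketch of the approach described above: fit past students’ score histories against whether they later ran into trouble, then flag current students with similar histories.

    # Hypothetical sketch of an early-warning model of the kind described
    # above; San Diego's actual method is not specified in the article.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)

    # Past students: five years of test scores each, plus a dropout flag (toy data).
    past_scores = rng.normal(0, 1, (500, 5))
    dropped_out = (past_scores.mean(axis=1) + rng.normal(0, 0.5, 500)) < -0.8

    model = LogisticRegression().fit(past_scores, dropped_out)

    # Flag current students whose score trajectories resemble past dropouts'.
    current_scores = rng.normal(0, 1, (100, 5))
    risk = model.predict_proba(current_scores)[:, 1]
    print(f"{(risk > 0.5).sum()} of 100 students flagged for extra help")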

It’s refreshing to hear value-added discussed in terms of its benefit to students, rather than as a cudgel.  Incidentally, California law forbids the use of student performance in teacher evaluations.

How Not to Evaluate Teachers

by Robert Pondiscio
November 3rd, 2008

UVA professor and Core Knowledge board member Dan Willingham, who routinely graces this blog with his observations, is now blogging over at Britannica Blog.  His first post is up today, and it’s a barn burner: How NOT to Evaluate Teachers.  Plans to evaluate teachers based on standardized test scores are “fatally flawed,” he writes.

Obviously, the measure cannot be based on a one-time test score, because a student’s achievement is a product of (at least) his home environment, neighborhood, and prior schooling. So you must try to assess how much the student learns over the course of the year. But these “value added” measures bring lots of thorny statistical problems. For example, suppose your plan is to administer a test in the Autumn and one in the Spring, and to compare them to see how much students have gained. Well, some Autumn test-takers will have moved by the Spring.  Can’t you just ignore those scores? No, because low-income students are more likely to move than high-income students, and low-income students tend to score lower. So if you ignore missing data, you’re biasing the estimate.

Dan lists other problems that he says are old stuff to statisticians, and concludes “there’s nothing wrong with using value-added measures in research, with all the caveats of the method understood, as one in an array of tools to address a research question. But using it as a measure of an individual teacher’s efficacy is foolish.”
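To make the missing-data point concrete, here is a minimal simulation (mine, not Dan’s) of the bias he describes: if the students who move are disproportionately low-income and low-scoring, quietly dropping them from the Spring average inflates the apparent gain.

    # Toy illustration of the attrition bias described above: low-income
    # students both score lower and move more often, so ignoring movers'
    # missing Spring scores biases the estimated gain upward. Invented numbers.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 1000
    low_income = rng.random(n) < 0.4
    fall = rng.normal(0, 1, n) - 1.0 * low_income    # low-income students score lower
    spring = fall + 0.3                              # everyone truly gains 0.3 SD

    # Low-income students are likelier to have moved before the Spring test.
    moved = rng.random(n) < np.where(low_income, 0.35, 0.05)

    true_gain = spring.mean() - fall.mean()
    naive_gain = spring[~moved].mean() - fall.mean() # movers silently dropped in Spring
    print(f"true gain: {true_gain:.2f}   naive estimate: {naive_gain:.2f}")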