Pitchers, Teachers, and Data

by Robert Pondiscio
June 28th, 2010

Over on Twitter, my friend Stephanie Germeraad, who is nearly as passionate about sports as she is about education, suggests education ought to steal a page from baseball when it comes to teacher seniority.  Commenting on the decline of legendary closer Trevor Hoffman, she tweets a quote from Alan J. Borsuk: “Schools can learn from baseball.  Brewers wouldn’t start Hoffman just because he’s been pitching longer.”  The point is that seniority is no guarantee of quality. Fair enough.  But here’s a sobering truth:  We are far more capable of measuring the effectiveness of relief pitchers like Hoffman than that of classroom teachers.

If you’re a casual baseball fan, you might know a few “facts” about the pitchers on your favorite team:  their won-loss record, their ERA (the number of “earned runs” allowed per nine innings), or their WHIP (walks and hits per inning pitched).  To an expert, such statistics scratch the surface at best, and may even be irrelevant.  Wins are a function of a team’s offense, for example, as much as a pitcher’s effectiveness, while ERA and WHIP are strongly influenced by the defensive ability of the other eight men on the field.  An outfielder with greater range, for example, will record an out on a ball that a lesser defender lets fall for a hit.  Same pitch, same swing, different outcome.
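For readers who like to see the arithmetic, here is a minimal sketch of the two rate stats just described. The function names and the season line are invented for illustration; they do not describe any real pitcher.

```python
def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings_pitched


# Hypothetical season line, made up for this example.
print(round(era(earned_runs=70, innings_pitched=200.0), 2))       # 3.15
print(round(whip(walks=50, hits=180, innings_pitched=200.0), 2))  # 1.15
```

Notice that every input here (earned runs, hits) depends partly on what the fielders do, which is exactly the weakness described above.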

Among baseball geeks, you often hear discussions of fielding independent pitching, or “FIP,” a measure of the things a pitcher is directly responsible for, such as strikeouts, home runs and walks.  FIP helps you understand how well a pitcher pitched, regardless of how well the team played behind him.  Data even helps teams decide what kind of pitchers are best suited to their stadiums through analysis of “park effects.”  A fly ball pitcher (yes, they keep track of fly balls, line drives and ground balls hit off every pitcher) might prosper in a big stadium like New York’s Citi Field, but allow lots of home runs in a bandbox like Philadelphia’s Citizens Bank Park.  A pitcher who “pitches to contact” (i.e., doesn’t strike out a lot of hitters) is fine if your team’s defense is strong.  If not, you might spend more to sign pitchers who are strikeout artists.  Data even helps spot problems as they occur.  Fans of the New York Mets are concerned that all-star pitcher Johan Santana’s fastball is topping out below 90 miles an hour of late, making his changeup, a slow-speed pitch, less likely to fool hitters expecting the fastball.
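A rough sketch of how FIP is usually computed, using the commonly published form of the formula. Two caveats: that form also counts hit batters alongside walks, which the paragraph above does not mention, and the additive constant changes from season to season. The 3.10 default below is only a placeholder assumption, and the sample numbers are invented.

```python
def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings_pitched: float, constant: float = 3.10) -> float:
    """Fielding independent pitching, in its commonly published form.

    `constant` rescales the result to look like an ERA; it varies by
    season and league, so 3.10 is a placeholder, not an official value.
    """
    raw = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return raw / innings_pitched + constant


# Hypothetical pitcher, made up for this example.
print(fip(home_runs=20, walks=50, hit_by_pitch=5,
          strikeouts=200, innings_pitched=200.0))
```

Hits allowed on balls in play never enter the calculation, which is the whole idea: only outcomes the fielders cannot touch are counted.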

To a baseball fan, statistics are a revelation.  The granularity and specificity are illuminating.  You can see, if you’re so inclined, a pitcher’s FIP, ERA, strikeouts, and his strikeout-to-walk ratio.  The percentage of batted balls that were hit on the ground, in the air, or for line drives can speak volumes about a pitcher’s effectiveness.  When a player’s agent goes to negotiate his contract, he can even discuss his “Wins Above Replacement” (WAR), a statistic that measures the total value of a player over a given season compared to a replacement-level player.

If these kinds of numbers thrill you, adding depth and nuance to your love of baseball, thank Bill James.  It is no overstatement to say that no one has had a greater impact on baseball in the last 25 years than James, who pioneered and named the field of sabermetrics, the use of detailed statistics to analyze baseball team and player performance.   James has made a career of demonstrating the factors that lead to teams scoring runs and winning games, and how the efforts of individual players contribute to wins.  Some of his insights have been legendary and have overthrown time-honored beliefs about the game–why RBIs matter less than on-base percentage, for example. Or why stolen base attempts tend to hurt a team’s offense.  Before Bill James, baseball was all batting averages, bromides and intangibles–a century of baseball men who knew what they knew based on experience, instinct and rudimentary data.

We are in the test scores, bromides and intangibles era of measuring teacher quality.  If you’re a principal, wouldn’t you love to know the “school effects” of teacher performance when it came time to make hiring decisions?  Would it change your perception of merit pay if there were a classroom equivalent of FIP–the factors directly under a teacher’s control?  What if we could compensate teachers based on their replacement value compared to an average first-year teacher?

“‘It’s far more than win/loss/ERA/WHIP’ is the clubhouse mantra,” Stephanie tweeted, defending her assertion that education can profit from baseball’s example. “Difference is, baseball doesn’t say they therefore can’t do it,” she wrote. Not quite right.  In baseball there is data–lots of it–to measure effectiveness clearly and fairly.  Difference is, “it’s far more than test scores” is not a mantra in ed reform.

Education awaits its Bill James.


  1. [...] This post was mentioned on Twitter by Todd Hausman, Robert Pondiscio. Robert Pondiscio said: Blog post: We can measure a pitcher's effectiveness far more accurately and fairly than a teacher's. http://bit.ly/bcGUhI #education [...]

    Pingback by Tweets that mention Pitchers, Teachers, and Data « The Core Knowledge Blog, The Core Knowledge Blog -- Topsy.com — June 28, 2010 @ 5:55 pm

  2. We need to get Nate Silver of fivethirtyeight.com to do education work instead of politics. Nate also honed his skills in sabermetrics and baseball stats before becoming the country’s best analyst of political polling.

    Comment by Matt — June 28, 2010 @ 7:25 pm

  3. Box scores for every school day perhaps?

    Comment by Rachel — June 28, 2010 @ 9:49 pm

  4. Could Bill Sanders (VAA) be considered a step toward the Bill James of education?

    Comment by Paul Hoss — June 28, 2010 @ 10:19 pm

  5. One could also wonder if the lack of job security and the ability to take an armload of stats into contract talks have something to do with the salary differential between teachers and major league pitchers. And long-term contracts allow aging stars to be paid handsomely for not playing.

    Comment by Rachel — June 28, 2010 @ 10:38 pm

  6. @Rachel Has there ever been a baseball general manager who made a conscious decision to sign a fading star to an absurdly large contract so he could sit on the bench and underperform? Contracts represent an agreement between two parties who bargain to reach what they perceive to be a mutually beneficial arrangement. Come to think of it, there’s another interesting lesson from baseball. When a team’s general manager signs a bad contract, the general manager is generally blamed. When a school system signs a bad contract, the school system tends to blame the teachers’ unions and act like they were forced to sign at gunpoint. Curious.

    Comment by Robert Pondiscio — June 28, 2010 @ 10:52 pm

  7. Harvard Prof. David Perkins did a fine job on this topic – How Seven Principles of Baseball Can Transform Education: http://www.thedailyriff.com/2010/05/growing-up-professor-david-perkins.php

    Comment by CJ — June 28, 2010 @ 11:11 pm

  8. Thanks, Robert. While I thought we were talking about moving beyond last-hired, first-fired policies, as highlighted in the J-S column, I think we all welcome a world where both educators and baseball team leaders have access to and use rich, robust data sets.

    Comment by Stephanie — June 29, 2010 @ 12:26 am

  9. @Stephanie My apologies if I conflated your take on “last-hired, first-fired” with other teacher quality issues such as merit pay, tenure etc. My point was — is — that because the data set on player performance in baseball is so rich and detailed, the results — who plays and who sits, who gets a multi-million dollar contract and who gets released — tend to be logical, defensible and fair. In teaching, not so much. If you want to argue last hired, first fired is unfair, I’m inclined to agree. If you want to decide who gets laid off on test scores alone, I don’t agree. As my baseball analogy has it, it’s more complicated than that.

    Indeed, I’d argue that looking at test scores is like looking at the number of Wins a pitcher records over the season. A great pitcher will probably be successful everywhere, but that’s the exceptional talent. A mediocre pitcher might win 15 games for a good team that scores lots of runs and plays solid defense. A good pitcher might only win 8-10 while pitching more effectively for a cellar-dweller. Consider the case of Kevin Millwood, a better than average pitcher, but not a superstar. He’s been a successful pitcher for the Braves, Phillies and Rangers. This year he signed with the Baltimore Orioles, where he started the season 0-8 while recording pitching stats that were pretty much in line with his career averages. He didn’t become a bad pitcher. He merely pitches for a bad ballclub. In baseball, we can tease out how much of the Orioles’ performance Millwood is responsible for and how much is due to factors he does not control. If Millwood were a 5th grade teacher in an inner-city Baltimore school, would we look at his results and say fire him first?

    Comment by Robert Pondiscio — June 29, 2010 @ 12:46 am

  10. Robert — I think your point about contracts is apt. And in both baseball and education there’s a certain amount of give and take — traditional teacher contracts are heavy on job security, and relatively light on salary — and it’s a trade-off both districts and unions have agreed to.

    My guess is that unions would be happy to agree to a different balance, but a different balance requires that districts be able to offer more money.

    Low pay and little job security is not a recipe for a quality workforce.

    Comment by Rachel — June 29, 2010 @ 1:28 am

  11. Thanks for the link to David Perkins. It provides great arguments AGAINST the data-DRIVEN crowd that endangers public education by chopping up knowledge into measurable pieces. Perkins writes:

    1. Play the whole game

    2. Make the game worth playing. Motivation and relevance are key.

    3. Work on the hard parts

    While he advocates the whole game, he clarifies that he doesn’t mean “just” the whole game.

    4. Play out of town

    In Red Sox Nation, it’s a resonant sports metaphor, but it also refers to the transfer of knowledge from one context to another.

    5. Play the hidden game

    A stats view of baseball is one of baseball’s “hidden games.” In baseball, algebra, or anything else we learn, there are richer, more layered aspects than show up on the surface…drawing “learners into the game of inquiry”.

    6. Learn from the team

    Perkins notes the importance of social learning, and he urges students to learn from teammates and from other “teams”–other students in different roles.

    7. Learn the game of learning

    Perkins suggests that teachers allow students to be in charge of their own learning by putting them in the driver’s seat and letting them take control–rather than having them sit in the passenger seat and watch their education roll by.

    Perkins cites a school that, in this era of high-stakes testing, has emphasized diagnostic testing as a tool for individual students to understand their progress and determine what to focus on next. “What particularly struck me,” Perkins writes, was that with a little help “the students, not the teachers, took stock of their own progress. The tests were framed emphatically as tools to provide information, not appraisals of worth.”

    Needless to say, the worth of teachers, as well as of students and the seamless web of learning, is more than just an algorithm spit out by a Value Added Model. If the advocates of “a culture of accountability” as opposed to its opposite, an engaging learning culture, would actually get in the game, do some teaching and learning, and respect the traditions of the game instead of manipulating stats, they would understand.

    Comment by john thompson — June 29, 2010 @ 11:07 am

  12. Exactly, John, you got it.

    Comment by CJ — June 29, 2010 @ 12:09 pm

  13. It’s also instructive to compare teacher evaluation with evaluation in any number of other professions. Very few people are satisfied with teacher evaluation as it currently stands. But I’m always a bit surprised when people who would have us tie teacher evaluation primarily to student test scores claim that just about every other professional is held accountable in this way.

    I don’t know many professionals whose evaluation or pay is tied to such a complex set of quantitative indicators. This is not to knock performance pay in principle–but surely we should be more honest about the complexity and the possible pitfalls.

    Comment by Claus — June 29, 2010 @ 12:42 pm

  14. @Claus Well, precisely. My mind cannot admit of any serious objection to using performance measures to evaluate teachers–provided those measures are rich, detailed and fair. I’m consistently surprised by how many otherwise thoughtful people will admit that test scores are not particularly good measures, then quickly add “but we’ve got to start somewhere.”

    “Very well,” I usually add, “How about we start with your job?” Why aren’t those who shout the loudest for meaningful rewards and consequences for teacher performance shouting just as vociferously for rich, meaningful metrics?

    Comment by Robert Pondiscio — June 29, 2010 @ 12:51 pm

  15. Where is the money to create such a metric? Bill James and other disciples of sabermetrics have spun their knowledge into lucrative jobs with baseball teams. (James scored a job with the Red Sox.) Those teams use this data to exploit the market of players to win more games and make money. Perhaps the Gates Foundation should start using its money to encourage the creation of meaningful metrics.

    Comment by Matt — June 29, 2010 @ 2:58 pm

  16. @Matt Ya think?

    Here’s where schools can REALLY learn from sports. Show me a pitcher who can’t get right-handed hitters out, but induces lefties to hit groundballs at a rate three times above average, and I’ll show you a handsomely paid bullpen specialist. Show me a teacher whose students’ math scores are terrible, whose reading scores aren’t much better, but who has an uncanny knack for raising engagement and reading achievement among low-income, black and Hispanic boys who come into his class 2-3 grade levels behind and I’ll show you…a failed teacher.

    I’m making up the example, obviously. But it strikes me as interesting that the development of baseball statistics has transformed the game, leading to vastly different ways of building a team’s roster, using relief pitchers situationally, righty/lefty platoons, defensive replacements in late innings, etc. Data-driven instruction, meanwhile, has given us the idea that every teacher should be able to reach all students at all times in all settings and in all subjects.

    Comment by Robert Pondiscio — June 29, 2010 @ 3:09 pm

  17. Robert,

    Your comparison of baseball players’ salaries to teacher salaries rings a shade hollow for me. Personally, I’d like to see all professional athletes AND teachers paid the way tennis and golf professionals are paid, for their current performances. If a professional golfer doesn’t make the cut in a golf tournament or a tennis pro doesn’t advance past the first round in a tennis tournament, they don’t get a check. Very few of these folks get appearance money. There are no long-term contracts for these two sports based on how they performed last year or three years ago. It’s what you have done to earn your living this year.

    I see no problem with discerning similar objective data on teachers to determine their worth. What teacher worth a dime would shy away from the opportunity to be paid on merit, how their students performed? Only the ones who had something to fear would hesitate.

    And let’s not get into the argument about what students are placed in which teacher’s class. Random student placements are LONG OVERDUE and could easily be controlled for if administrations and unions wanted to be true egalitarians about it. No teacher should get all the problem kids because they can “handle them” or because they’re the only man at that grade level. That’s been accepted practice for too long and it’s a policy that reeks of gender bias.

    LeBron James, Peyton Manning, David Ortiz, Sidney Crosby, we’re going to tear up all those long-term contracts and pay you next year based on how you perform from one game/series to the next. Wouldn’t that cause a ruckus, sports fans?

    Comment by Paul Hoss — June 29, 2010 @ 4:10 pm

  18. Robert: I agree with your theory that different teachers will likely be more or less suited to teach different subjects or courses. (Just as some pitchers do better with certain types of batters.)

    In my experience as a former teacher, I excelled with some groups of students and struggled with other groups of students (of course, other factors mattered as well, such as administrative support and the number of different courses I was teaching at once). The idea that a “good” teacher will succeed in all circumstances and that a “bad” teacher will similarly fail in all circumstances, is, quite frankly, a myth.

    Comment by Attorney DC — June 29, 2010 @ 4:11 pm

  19. @Paul Hoss I took great pains NOT to make this about salaries, but about performance metrics, Paul. My point was not about how much they get paid, but the depth of our knowledge about the circumstances and settings in which athletes succeed or fail, and the use of detailed metrics to understand the individual’s contribution to the team’s success. We have nothing like that in education, but we have any number of policies that operate as if we do.

    Comment by Robert Pondiscio — June 29, 2010 @ 4:14 pm

  20. And let me push back on your idea, Paul, that random student placements are long overdue. Perhaps that’s exactly the WRONG approach. Here’s a real-life example. Throughout my time as a 5th grade teacher, I noticed an odd thing about my kids’ reading tests. Here in New York, we have a four-point grading scale, where 4 is above grade level, 3 is at grade level, 2 “approaching,” and 1 below grade level. Every year, I got a large number of 2s. At the end of the year, I usually had very few. A lot became 3s, happily, but a not insignificant number became 1s. Let’s assume that I was 100% responsible for their scores. For those 2s who became 3s, I’m a rock star. For the new 1s, I was a disaster.

    What if we could isolate what it was I was doing that worked so well for some kids and failed for others? Or the kinds of kids who responded well to me? Wouldn’t you want to pack my classroom with the kids I work best with? Wouldn’t it be foolish to ignore that evidence and insist on random assignment? It would be tantamount to saying to the right-handed hitter on the baseball team, “Mickey Mantle was a great switch-hitter and he hit everybody, so we know it can be done. Hit .300 from both sides of the plate or you’re through. We can’t pay you just to hit lefties.”

    Comment by Robert Pondiscio — June 29, 2010 @ 4:32 pm

  21. “My point was not about how much they get paid, but the depth of our knowledge about the circumstances and settings in which athletes succeed or fail, and the use of detailed metrics to understand an individual’s…success.”

    Exactly. David Ortiz might have had a batting average of .345 last year against right-handed pitchers but only hit .198 against southpaws. That would indicate to Terry Francona when to use him or when to use someone else for the designated hitter. But that’s MLB, not Kennedy Elementary School. At Kennedy Elementary every professional (teacher) must play every game (day). Unlike the manager of the Red Sox, the principal does not have the luxury of using a substitute at will.

    Detailed metrics of a classroom could point out a teacher’s strengths and weaknesses. This would permit the teacher to work on improving their weak areas. Teachers as the professionals need to be able to appropriately adapt to all children in their class. They should be effective in most/all circumstances to be of value. Isn’t this an expectation of our accepted practice of heterogeneous classes (K-6)? We get some scholars and some projects, some angels and some nudges, some boys and some girls, as well as some very secure kids alongside some very needy ones. That’s what makes us quality professional educators.

    The kids who went from 2s to 1s in reading over the course of spending a year in your class could serve as a very valuable metric to improve your performance as a teacher. What exactly did Mr. Pondiscio do or fail to do to get these kids’ reading scores to drop? In education versus pro sports, we’re not out to terminate the teacher. We need to focus carefully on improving a teacher’s performance so ultimately his/her students will benefit over time. Of course, if Mr. Pondiscio fails to respond favorably, his backside could well wind up in a sling and he could be counseled out of the profession (just kidding, Robert).

    We could go on and on here.

    Comment by Paul Hoss — June 29, 2010 @ 5:46 pm

  22. @Paul Hoss. I’m kinda stunned that Paul Hoss, Mr. Individualized Instruction himself, can’t see the value of placing students with the teachers who are most likely to succeed with them, but insists instead that teachers must learn to reach every child! Your orientation is toward “improving weak areas” rather than playing to inherent strengths. Continuing with the baseball theme, right-handed pitchers tend to be more effective pitching against righties; left-handers against lefties. A right-hander may be “less effective” against a lefty hitter, but no one would suggest he should throw with his left hand and failure to do so is an “area of weakness.” It’s an accepted structural limitation that might be acceptable early in the game, but call for a pitching change in the late innings with the game on the line. One of the ideas that I’m curious about is whether there are certain structural characteristics of teachers that might make you or me more successful with various ages, genders, subjects or settings. I don’t think there’s been any serious work to define the skills and attributes at such a fine level. All we have now is homilies and hunches (Gabriel needs a strong male role model. Let’s put him in Pondiscio’s room = Disaster). The bottom line is that like Bill James with baseball, it may very well be that powerful solutions that go against conventional wisdom may be hiding in plain sight, awaiting the attention of a clever analyst who’s not biased by what we “know” to be true.

    Our orientation is insisting that every teacher be effective for every child. It seems likely that no teacher can be effective with every child. Could it be more likely that every child has a teacher who is effective for him or her?

    Comment by Robert Pondiscio — June 29, 2010 @ 6:08 pm

  23. Superficially appealing, especially since I pitched in AAA 35 years ago, but repeats the tired notion that the value of education can be statistically measured. Baseball is a business. Its success is measured in profit. To profit, the team must fill the ballpark. To fill the ballpark, the team must win. A team wins by scoring more runs than the opposition.

    None of that applies to public education. One can’t even assume education teams agree on what “winning” is. Winning for some means job security. Winning for others means avoiding conflicts. If a school is stuck with a journeyman manager who can’t tell a Cy Young Award Winner from a bush-league bench warmer, that school will be a loser.

    And don’t get me started on the reserve clause: free agency is exactly what education needs but will never get as long as bush-leaguers make the big leagues and skippers rarely get fired.

    Comment by Fred Strine — June 29, 2010 @ 6:54 pm

  24. “…but insists instead that teachers must learn to reach every child!”

    Not sure any teacher will ever reach every child but as the adult/professional, they should be equipped to deal with every child appropriately, and hopefully with a good deal of success, at least in the cognitive domain.

    Some kids will “succeed” no matter what with anyone as their teacher. It absolutely doesn’t matter. They’ve simply been beamed down to earth at seven or eight years old as all-world students/individuals. It’s absolutely amazing but true. I’ve had eight year olds with more wherewithal than half the adults I know. They could raise other children (their peers) better than half the parents of their peers. Again, amazing. There are also kids (and we’ve all had them in our classes) who, God help them, can’t get out of their own way. Sadly, it’s a challenge to think things will ever improve for some of these poor souls. But as teachers, we must be able to do the best we can for all of them in the 180 days they are under our watch. We must. We can. We will.

    As for individualizing instruction – well, you can individualize the instruction all you want but you can never sequentially arrange personality development. Students cannot first “master” tolerable, progress to amicable, and eventually graduate as lovable. The standards haven’t even been developed yet, never mind the curricula. How would we ever get a governing body to agree on any of these progressions? Perhaps the folks over at Bridging Differences could chime in with some thoughts. A historian and a lefty could probably muster up something.

    Comment by Paul Hoss — June 29, 2010 @ 6:58 pm

  25. Fred Strine is a master teacher, and I’m not fit (to mix metaphors) to carry his cleats. I would say in response, however, that the decision to measure public education by outcomes, rightly or wrongly, seems largely a settled matter. If that’s going to be the case, I would prefer that we not pretend we can come to vast conclusions about teacher quality with half-vast data.

    Comment by Robert Pondiscio — June 29, 2010 @ 6:58 pm

  26. Why is it that when we talk about evaluating teachers no one ever comments about the role of the principal? Are we expecting that the role of the principal is either meaningless (and thus should not be paid squat?) or that the principal is an idiot who has no clue about how their teachers are doing (and thus should be paid squat!). Leaders in every business organization are held more accountable for their performance than their front line folks. Why aren’t principals being held in the same regard?

    Comment by Erin Johnson — June 30, 2010 @ 1:36 am

  27. Erin,

    I worked with eight principals and knew a number of others; not one of them was ever to be mistaken for the educational leader of their building.

    They were public relations gurus, plain and simple. Their job was to keep everyone “happy,” parents, students, teachers, and especially their boss, the superintendent, by not calling the central office with any problems. Many were thought of as volunteer firemen, running around putting out little blazes here and there in their building. It was believed, if everyone was “happy” that principal was doing a good job. Kids learning? That was almost never a factor.

    Only one, in 34 years, ever made an attempt to visit classrooms regularly to find out how things were going. Two of them NEVER, and I mean never, set foot in a classroom and one of them was rarely ever even seen in the building. You could find him/her down the harbor eating breakfast or working after school, after everyone had left and (s)he didn’t have to talk with anyone, but rarely during school hours was (s)he in the building. Talk about a joke.

    As for the principal’s role in evaluating teachers, well, they’re the primary reason politicians and groups like DFER want student tests linked to teacher evaluations. They have come to realize how worthless the actual evaluations were. How can a principal ever conjure up an honest evaluation of one of their teachers if they’re never in the classroom(s) or even in the building for that matter? And when many of them did get around to the contractual obligation of an evaluation it was almost always grossly embellished. They made teacher evaluations an embarrassment to the teaching profession, either through laziness or fear of litigation if some teacher felt offended and had the gumption to question it legally.

    Somehow, I have to believe the Obama/Duncan RttT final edition will expect more out of the principals’ evaluations with room for dismissal if they can’t do the job effectively and honestly.

    Comment by Paul Hoss — June 30, 2010 @ 4:28 pm

  28. Paul, I have yet to see anything out of the Obama/Duncan team that puts any accountability on principals, district officials, curriculum developers, state officials or anyone else who doesn’t have a title of “teacher”.

    You might also be interested in this latest bit of information: In a study done at the US Air Force Academy, random assignments of students to math teachers demonstrated that one-year value-added assessments were *detrimental* to long-term learning. That is, professors that “taught to the test” ended up inhibiting their students’ ability to learn in subsequent years. Not a good omen for value-added assessments.


    Conventional wisdom holds that “higher-quality” teachers promote better educational outcomes. Since teacher quality cannot be directly observed, measures have largely been driven by data availability. At the elementary and secondary levels, scores on standardized student achievement tests are the primary measure used and have been linked to teacher bonuses and terminations (Figlio and Kenny 2007). At the postsecondary level, student evaluations of professors are widely used in faculty promotion and tenure decisions. However, teachers can influence these measures in ways that may reduce actual student learning.

    Teachers can “teach to the test.” Professors can inflate grades or reduce academic content to elevate student evaluations. Given this, how well do each of these measures correlate with the desired outcome of actual student learning?

    Studies have found mixed evidence regarding the relationship between observable teacher characteristics and student achievement at the elementary and secondary education levels. As an alternative method, teacher “value-added” models have been used to measure the total teacher input (observed and unobserved) to student achievement. Several studies find that a one-standard-deviation increase in teacher quality improves student test scores by roughly one-tenth of a standard deviation (Rockoff 2004; Rivkin, Hanushek, and Kain 2005; Aaronson et al. 2007; Kane, Rockoff, and Staiger 2008).

    However, recent evidence from Kane and Staiger (2008) and Jacob, Lefgren, and Sims (2010) suggests that these contemporaneous teacher effects may decay relatively quickly over time, and Rothstein (2010) finds evidence that the nonrandom placement of students to teachers may bias value-added estimates of teacher quality.

    Even less is known about how the quality of instruction affects student outcomes at the postsecondary level. Standardized achievement tests are not given at the postsecondary level, and grades are not typically a consistent measure of student academic achievement because of heterogeneity of assignments/exams and the mapping of those assessment tools into final grades across individual professors. Additionally, it is difficult to measure how professors affect student achievement because students generally “self-select” their course work and their professors. For example, if better students tend to select better professors, then it is difficult to statistically separate the teacher effects from the selection effects. As a result, the primary tool used by administrators to measure professor teaching quality is scores on subjective student evaluations, which are likely endogenous with respect to (expected) student grades.

    To address these various measurement and selection issues in measuring teacher quality, our study uses a unique panel data set from the United States Air Force Academy (USAFA) in which students are randomly assigned to professors over a wide variety of standardized core courses. The random assignment of students to professors, along with a vast amount of data on both professors and students, allows us to examine how professor quality affects student achievement free from the usual problems of self-selection. Furthermore, performance in USAFA core courses is a consistent measure of student achievement because faculty members teaching the same course use an identical syllabus and give the same exams during a common testing period.

    Finally, USAFA students are required to take and are randomly assigned to numerous follow-on courses in mathematics, humanities, basic sciences, and engineering. Performance in these mandatory follow-on courses is arguably a more persistent measurement of student learning. Thus, a distinct advantage of our data is that even if a student has a particularly poor introductory course professor, he or she still is required to take the follow-on related curriculum.

    These properties enable us to measure professor quality free from selection and attrition bias. We start by estimating professor quality using teacher value-added in the contemporaneous course. We then estimate value-added for subsequent classes that require the introductory course as a prerequisite and examine how these two measures covary. That is, we estimate whether high- (low-) value-added professors in the introductory course are high- (low-) value-added professors for student achievement in follow-on related curriculum. Finally, we examine how these two measures of professor value-added (contemporaneous and follow-on achievement) correlate with professor observable attributes and student evaluations of professors. These analyses give us a unique opportunity to compare the relationship between value-added models (currently used to measure primary and secondary teacher quality) and student evaluations (currently used to measure postsecondary teacher quality).

    Results show that there are statistically significant and sizable differences in student achievement across introductory course professors in both contemporaneous and follow-on course achievement. However, our results indicate that professors who excel at promoting contemporaneous student achievement, on average, harm the subsequent performance of their students in more advanced classes. Academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous value-added but positively correlated with follow-on course value-added. Hence, students of less experienced instructors who do not possess a doctorate perform significantly better in the contemporaneous course but perform worse in the follow-on related curriculum.

    Student evaluations are positively correlated with contemporaneous professor value-added and negatively correlated with follow-on student achievement. That is, students appear to reward higher grades in the introductory course but punish professors who increase deep learning (introductory course professor value-added in follow-on courses). Since many U.S. colleges and universities use student evaluations as a measurement of teaching quality for academic promotion and tenure decisions, this latter finding draws into question the value and accuracy of this practice.

    These findings have broad implications for how students should be assessed and teacher quality measured. Similar to elementary and secondary school teachers, who often have advance knowledge of assessment content in high-stakes testing systems, all professors teaching a given course at USAFA have an advance copy of the exam before it is given. Hence, educators in both settings must choose how much time to allocate to tasks that have great value for raising current scores but may have little value for lasting knowledge. Using our various measures of quality to rank-order professors leads to profoundly different results.

    As an illustration, the introductory calculus professor in our sample who ranks dead last in deep learning ranks sixth and seventh best in student evaluations and contemporaneous value-added, respectively. These findings support recent research by Barlevy and Neal (2009), who propose an incentive pay scheme that links teacher compensation to the ranks of their students within appropriately defined comparison sets and requires that new assessments consisting of entirely new questions be given at each testing date. The use of new questions eliminates incentives for teachers to coach students concerning the answers to specific questions on previous assessments.
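
    The comparison of the two value-added measures described above can be illustrated with a rough sketch. This is not the study's actual model, which is not reproduced in this comment. It assumes random assignment, uses made-up scores, and relies on hypothetical names (`value_added`, `pearson`, `records`); it simply treats a professor's average student score, relative to the overall average, as a crude value-added estimate.

```python
from collections import defaultdict
from statistics import mean


def value_added(records, outcome):
    """Crude value-added: each professor's mean student outcome
    minus the grand mean. Only meaningful under random assignment."""
    by_prof = defaultdict(list)
    for r in records:
        by_prof[r["prof"]].append(r[outcome])
    grand = mean(r[outcome] for r in records)
    return {prof: mean(scores) - grand for prof, scores in by_prof.items()}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Invented toy data: one row per student, with the introductory-course
# professor, the intro score, and the later follow-on course score.
records = [
    {"prof": "A", "intro": 85, "follow_on": 70},
    {"prof": "A", "intro": 87, "follow_on": 72},
    {"prof": "B", "intro": 78, "follow_on": 80},
    {"prof": "B", "intro": 80, "follow_on": 82},
    {"prof": "C", "intro": 81, "follow_on": 75},
    {"prof": "C", "intro": 83, "follow_on": 77},
]

va_intro = value_added(records, "intro")       # contemporaneous measure
va_follow = value_added(records, "follow_on")  # follow-on measure

profs = sorted(va_intro)
r = pearson([va_intro[p] for p in profs], [va_follow[p] for p in profs])
print("correlation between the two measures:", round(r, 3))
```

    In this invented data the two measures move in opposite directions across professors, which is the pattern the study reports. Real estimates would need controls and far more students per professor.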

    Comment by Erin Johnson — July 1, 2010 @ 1:03 am

  29. Erin,

    It’s always been my belief that teaching to the test, the practice of the teacher placing more emphasis on covering testing strategies as opposed to discipline content, is detrimental to students, disingenuous on the part of the teacher, and borders on felonious pedagogy.

    The practice of college faculty being evaluated primarily on student surveys is absurd. Does the Air Force Academy or any college/university honestly believe this could possibly be the best method at their disposal? Wow!!!

    Comment by Paul Hoss — July 1, 2010 @ 5:16 pm

  30. This is a very good post, and does demonstrate one of the biggest failures in education: data. It is shocking that there is an overall lack of data in the education world, data that can be used to measure teacher performance. I’ve been in the business world, and there were a slew of numbers that were thrown at me in my yearly review with my boss (and my raise was dependent on those numbers), but in education, nothing. It doesn’t have to be that way.

    Comment by A Conservative Teacher — July 5, 2010 @ 8:43 am

  31. Being a Phys Ed/Health teacher, I do not have to answer to standardized test scores. However, I do see and value the importance of incorporating effective writing into my classes as a tool to help students improve in this area, as I know this is a vital skill. I guess it can be compared to AL pitchers keeping their hitting skills sharp. Even though they might not be batting today (or see the need to write for a Phys Ed class), in interleague play or the World Series (or English class or on a standardized test) they will need to hit.

    Comment by Kevin Fitzgerald — July 10, 2010 @ 12:00 am

  32. [...] a comment » I just read an interesting article called Pitchers, Teachers and Data by Robert Pondiscio at Core Knowledge.  It talks about how schools could learn something from the [...]

    Pingback by What Can Schools Learn from Baseball? « Interactive Data Partners Blog — July 20, 2010 @ 4:38 pm
