If He’s So Important, Why Haven’t We Heard of Him?

by E. D. Hirsch, Jr.
January 9th, 2013

In Praise of Samuel Messick (1931–1998)

Everyone who is anyone in the field of testing actually has heard of Samuel Messick. The American Psychological Association has instituted a prestigious annual scientific award in his name, honoring his important work in the theory of test validity. I want to devote this, my first-ever blog post, to one of his seminal insights about testing. It’s arguable that his insight is critical for the future effectiveness of American education.

My logic goes this way: Every knowledgeable teacher and policy maker knows that tests, not standards, have the greater influence on what principals and teachers do in the classroom. My colleagues in Massachusetts—the state that has the most effective tests and standards—assure me that it’s the demanding, content-rich MCAS tests that determine what is taught in the schools. How could it be otherwise? The tests determine whether a student graduates or whether a school gets a high ranking. The standards do vaguely guide the contents of the tests, but the tests are the de facto standards.

It has been and will continue to be a lively blog topic to argue the pros and cons of the new Common Core State Standards in English Language Arts. But so far these arguments are more theological than empirical, since any number of future curricula—some good, some less so—can fulfill the requirements of the standards. I’m sure the debates over these not-yet-existent curricula will continue; so it won’t be spoiling anyone’s fun if I observe that these heated debates bear a resemblance to what was called in the Middle Ages the Odium Theologicum over unseen and unknown entities. Ultimately these arguments will need to get tied down to tests. Tests will decide the actual educational effects of the Common Core Standards.

But Samuel Messick has enunciated some key principles that will need to be heeded by everyone involved in testing if our schools are to improve in quality and equity, not only in the forty-plus states that have agreed to use the Common Core standards but also in those states that have not. In all fifty states, tests will continue to determine classroom practice and hence the future effectiveness of American education.

In this post, I’ll sketch out one of Messick’s insights about test validity. In a second post, I’ll show how ignoring that insight has had deleterious effects in the era of No Child Left Behind. And in a third, and last on this topic, I’ll suggest policy principles that heed the scientific acumen and practical wisdom of Samuel Messick in the era of the Common Core Standards.

 ******

Messick’s most distinctive observation shook up the testing world, and still does. He said that it was not a sufficient validation of a test to show that it exhibits “construct validity.” This term of art means that the test really does accurately estimate what it claims to estimate. No, said Messick, that is a purely technical criterion. Accurate estimates are not the only or chief function of tests in a society. In fact, accurate estimates can have unintended negative effects. In the world of work they can unfairly exclude people from jobs that they are well suited to perform. In the schools “valid” tests may actually cause a decline in the achievement being tested – a paradoxical outcome that I will stress in the three posts devoted to Messick.

Messick called this real-world attribute of tests “consequential validity.” He proposed that test validity be conceived as a unitary quality comprising both construct validity and consequential validity—both the technical and the ethical-social dimension. What shall it profit a test if it reaches an accurate conclusion yet injures the social goal it was trying to serve?

Many years ago I experienced the force of Messick’s observation before I knew that he was the source of it. It was in the early 1980s, and I had published a book on the valid testing of student writing (The Philosophy of Composition). At the time, Messick was the chief scientist at the Educational Testing Service, and under him a definitive study had been conducted to determine the most valid way to measure a person’s writing ability. Actual scoring of writing samples was notoriously inconsistent, and hence unfair. Even when graded by specially socialized groups of readers (the current system), there was a good deal of variance in the scoring.

ETS devised a test that probed writing ability less directly and far more reliably. It consisted of a few multiple-choice items concerned with general vocabulary and editorial acumen. This test proved to be not only far shorter and cheaper but also more reliable and valid. That is, it better predicted elaborately determined expert judgment of writing ability than the writing samples did.
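
For readers who want to see what “better predicted” means in concrete terms, here is a minimal sketch, in Python, of how psychometricians quantify predictive validity: the correlation between scores on a test and scores on the criterion it is meant to estimate (here, expert judgments of writing ability). All of the numbers below are hypothetical, invented purely for illustration; they are not the ETS data.

```python
# A toy illustration of a validity coefficient: the Pearson correlation
# between scores on a test and scores on the criterion it is meant to
# estimate. All data are hypothetical, for illustration only.

import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores for ten students (0-100 scale).
expert_judgment = [72, 65, 88, 54, 91, 60, 78, 83, 69, 75]   # criterion: expert-rated writing ability
writing_sample  = [70, 70, 80, 60, 85, 55, 70, 90, 60, 80]   # direct measure: scored essay
multiple_choice = [74, 63, 86, 55, 90, 62, 77, 80, 70, 74]   # indirect measure: vocabulary/editing items

print("writing-sample validity coefficient: ", round(pearson_r(writing_sample, expert_judgment), 2))
print("multiple-choice validity coefficient:", round(pearson_r(multiple_choice, expert_judgment), 2))
```

On invented data like this, the indirect multiple-choice test would show the higher validity coefficient, which is exactly the sense in which such a test can be judged “more valid.” Accuracy of estimation, Messick’s construct validity, says nothing yet about a test’s consequences in the classroom.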

There was just one trouble with this newly devised test. As it was used over time, student writing ability began to decline. The most plausible explanation was that although the test had construct validity it lacked consequential validity. It accurately predicted writing skill, but it encouraged classroom activity that diminished writing skill—a perfect illustration of Messick’s insight.

Under his intellectual influence there is now, again, an actual writing sample to be found on the verbal SAT. The purely indirect test that dispensed with the writing sample had had the unfortunate consequence of reducing the amount of student writing assigned in the schools, and hence of reducing the writing abilities of students. A shame, in a way: the indirect test was not just more accurately predictive as an estimate; it was also fairer, shorter, and cheaper. But ETS made the right decision to value consequential validity above accuracy and elegance.

Next time: Consequential Validity and the Era of No Child Left Behind

9 Comments

  1. Over and over my child is told she must master a 5 paragraph essay. I find this so frustrating because it fails to really describe what she needs to master: a lucid, well thought out explanation of her knowledge on a prescribed topic. Most of us don’t write in 5 paragraphs; we write one in an email about a project and much more extensively to complete a project. This problem is alive and well.

    Comment by DC Parent — January 9, 2013 @ 3:07 pm

  2. There are precisely 378 angels on the head of a pin. That is, if you agree with my definition of angels.

    Comment by Joseph Beckmann — January 9, 2013 @ 5:23 pm

  3. Noteworthy first blog topic.

    In Massachusetts, the MCAS test clearly determines what teachers (principals do NOTHING in the classroom) do in their classrooms. Not to be overlooked in this discussion is the invaluable item analysis all teachers receive from the state with their test results. This gives teachers a question-by-question breakdown of how their students performed on every item on the tests. Credible districts make sure their teachers spend time on this analysis to determine where students did well and, especially, where they did not. INVALUABLE.

    @DC Parent, I have corrected the five-paragraph essay from fourth graders through the teacher test. To me, it’s a bit more troubling that nine-year-olds are given essentially the same task to perform as are prospective teachers. Ironically enough, this exercise in “foreshadowing” appears to work for the kaleidoscope of target audiences. It’s easy to teach, simple to grade (with validity), and, for a writing sample, relatively inexpensive. For or against it, I don’t see it disappearing anytime soon, at least not here in Massachusetts.

    Comment by Paul Hoss — January 10, 2013 @ 8:39 am

  4. That item analysis Paul Hoss cites is, in truth, quite excellent. There are loads of data, however, that underscore “feedback delayed is feedback denied,” and the timing of that item analysis tends to be the NEXT YEAR! The real problem with MCAS – and with most tests – is the issue of delay, particularly for a technology that could provide almost immediate reinforcement, suggestions, and worthwhile instruction. And that delay reflects the corporate dominance of testing, the failure to link tests to learning, and the almost pervasive pattern of cheating (http://takingnote.learningmatters.tv/?p=6070).

    Comment by Joseph Beckmann — January 10, 2013 @ 10:12 am

  5. @Joseph, you’re correct about the delayed feedback. However, with MCAS being given in April, the year is eighty percent over. Even if the state got the results back to schools in May, there would be insufficient time to reteach or cover the areas where most of the errors were found.

    If the state created a computer-based MCAS with instant feedback, that could be one solution, but the cost might prove prohibitive.

    As it stands now, teachers take the information from the item analysis and incorporate it into their teaching for the following school year. While this strategy doesn’t do the exiting class much good, it does improve future instruction.

    One solution could be to share the analysis with next year’s teachers and have them address the problem areas. Some might call this passing the buck, but I’d like to think of it as efficient (and necessary) remediation.

    Comment by Paul Hoss — January 10, 2013 @ 10:30 am

  6. You’re missing my point: the test is not intended to improve education, but, rather, to measure outcomes and reward high scores. That intent is … simply … wrong, and as long as the corporate drivers are in charge, the “if” you cite – a timely response and a critical use of new information – won’t happen.

    Years ago I worked with a teacher who regularly cited, “If your mom hadn’t pooped we’d never have found you,” and “if” is the worst condition of any system. Of course there are good solutions, but they’re not coming from the direction of the test makers, the test re-ifiers, and the test-investors.

    Comment by Joseph Beckmann — January 10, 2013 @ 11:58 am

  7. [...] Posts The Work of a Great Test Scientist Helps Explain the Failure of No Child Left Behind If He’s So Important, Why Haven’t We Heard of Him? A Backward Glance O’er Travel’d Roads Miss Lahey’s Epistle to the Romans How to Get a Big [...]

    Pingback by The Work of a Great Test Scientist Helps Explain the Failure of No Child Left Behind « The Core Knowledge Blog — January 10, 2013 @ 4:57 pm

  8. [...] of different posts in the fire right now, but after starting a mammoth comment on this brand new E. D. Hirsch post (Welcome to blogging, sir!), I decided to convert it to a post—after all, I need the [...]

    Pingback by SAT Writing Tests–A Brief History « educationrealist — January 13, 2013 @ 1:14 am

  9. [...] the Tests The Work of a Great Test Scientist Helps Explain the Failure of No Child Left Behind If He’s So Important, Why Haven’t We Heard of Him? A Backward Glance O’er Travel’d Roads Miss Lahey’s Epistle to the [...]

    Pingback by Blame the Tests « The Core Knowledge Blog — January 15, 2013 @ 10:25 am
