The Common Core Tests in Language Arts Will Soon Be Coming to Your Child’s School. Tell Your Local Superintendent: “Don’t Worry. Students Will Ace Those Tests If They Learn History, Civics, Literature, Science, and the Fine Arts.”

by E. D. Hirsch, Jr.
August 5th, 2013

This post originally appeared on the Huffington Post on July 30, 2013.

 

“A giant risk, as I see it, in the implementation of Common Core is that it will spawn skills-centric curricula. Indeed, every Common Core ‘expert’ we hear from seems to be advocating this approach.” 

This comment from an able and experienced teacher is one among several similar ones that teachers have recently posted on the Core Knowledge blog. Their worry is also mine.

The success of the Common Core Standards in Language Arts, adopted by more than 40 states, is supremely important for many reasons, not least because of the recent intensification of income inequality. Student scores on language arts tests are the single most reliable academic predictors of later income. The new language arts standards of the Common Core represent an historic opportunity for beneficial change in American schools—if they are put into effect intelligently.

But if you look at the sales data on Amazon, you will see that the bestselling books about the Common Core are “skills-centric” ones that claim to prepare teachers for the new language arts standards by advocating techniques for “close reading” and for mastering “text complexity”—as though such skills were the main requirements for understanding a text, no matter how unfamiliar a student might be with its topic. The fact is, though, that students’ ability to engage in “close reading” and to manage “text complexity” is highly dependent on their degree of familiarity with the topic of the text. And the likelihood of their possessing the requisite familiarity with the various topics they encounter in life or on tests will depend upon the breadth of their knowledge. No amount of practice exercises (which take time away from knowledge-gaining) will foster wide knowledge. If students know a lot, they’ll easily learn to be skilled in reading and writing. But if they know little, they will perform poorly on language tests—and in life.

We need to learn from recent painful experience. The failure of No Child Left Behind to foster advanced language ability can be traced to the skills-centric test-prepping that left little room for the systematic gaining of knowledge. Of course, there is one facet of the skills-centric approach to reading that should be applauded, and it did improve under NCLB: the teaching of decoding. Learning to translate those symbols on the page into sounds and words is a skill that ought to be taught systematically between kindergarten and second grade, beginning with simple letter-sound correspondences and progressing step by step to complicated Greek-based spellings. More systematic instruction in phonics explains why test scores went up in the earliest grades under NCLB. But NCLB’s neglect of knowledge building explains why student scores did not go up in the later grades, when tests emphasize comprehension.

Test anxiety was paradoxically the main reason that schools spent so much time on abstract skills like “comprehension strategies” and “inferencing.” My aim in this blog post, addressed to parents, is to explain why the best test prep for their children under the new Common Core standards will be a more systematic approach to imparting knowledge. My argument is simple: If understanding a text depends on some prior familiarity with its topic, then that will also be true of the passages on a language-arts test.

The more a student knows, the better he or she will perform on any language-arts test—whether or not that test is said to be “aligned” with the new Common Core standards. If we take a step back from the details of the Common Core standards, we can see why this claim must be true. Older language arts tests—such as the Gates-MacGinitie, the Iowa Test of Basic Skills, the Stanford 9, the Degrees of Reading Power, the National Assessment of Educational Progress, and the verbal sections of the Armed Forces Qualification Test—correlate well with one another, indicating that they all accurately probe underlying competence in language. If the results of the Common Core tests were not strongly correlated with these well-validated ones, the technical validity of the new tests would rightly be deemed unsatisfactory, and no state ought to adopt them. The top experts making the new Common Core tests are surely well aware of this technical issue of correlation. It’s the schools, rather, that need to reconsider what will really prepare their students for the new tests—and a productive life.
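To make the correlation claim concrete, here is a minimal sketch (in Python, with invented scores) of how two tests’ results might be compared. The numbers are illustrative assumptions, not data from any actual validation study; real studies use large samples and more than a single coefficient.

    # A minimal sketch: do two reading tests rank the same students similarly?
    # The scores below are invented for illustration only.

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length score lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var_x = sum((x - mx) ** 2 for x in xs)
        var_y = sum((y - my) ** 2 for y in ys)
        return cov / (var_x ** 0.5 * var_y ** 0.5)

    established = [410, 520, 480, 610, 550, 470]  # a well-validated test
    new_test = [400, 540, 460, 630, 560, 450]     # the same students on a new test
    print(f"r = {pearson_r(established, new_test):.2f}")
    # A high r suggests both tests probe the same underlying competence.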

One reason that the schools have been applying a skills-centric approach is that they have regarded reading as a uniform skill that develops in stages, rather than as a highly variable skill that depends on a person’s topic knowledge. The schools cannot be blamed for this. The stage-by-stage conception of reading is the theory that even top experts held until a few decades ago. The notion that any text on any topic at the right level would enhance reading ability has encouraged our schools to tolerate a topic-incoherent curriculum in language arts. This indifference to knowledge building is the chief reason the verbal scores of our school leavers have stayed flat and low.

Cognitive scientists have found, however, that a student’s average level of reading skill, which the standard tests indicate reasonably accurately, masks wide fluctuations depending on the test taker’s familiarity with the topic. That’s why reading tests typically use multiple passages on different topics—characteristically about ten—to try to capture that average. And even then, the passages are not random but have been filtered through the net of grade-level criteria like word rarity and sentence length. The whole system has conspired to make schools think that topic knowledge is less important than “reading level.” But now we know that the topic of the passage is far more important than the level. The more students know about a topic, the further above their level they can read on that topic. This new understanding of reading ability demands nothing less than a revolution in language arts instruction, with less emphasis on technique and more emphasis on the systematic acquisition of knowledge.
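To see how mechanical those grade-level criteria are, consider one widely used readability formula, the Flesch-Kincaid grade level. It consults only sentence length and a crude proxy for word difficulty (syllable counts); the topic of the passage, which the research says matters most, never enters the calculation. Here is a minimal sketch, with a rough vowel-group heuristic of my own standing in for a real syllable counter:

    # Sketch of the Flesch-Kincaid grade-level formula:
    #   0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    # Nothing in it knows what the passage is about.
    import re

    def count_syllables(word):
        # Crude heuristic: count groups of consecutive vowel letters.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fk_grade(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

    passage = ("There is a path that starts in Maine and ends in Georgia. "
               "This path is called the Appalachian Trail.")
    print(f"Estimated grade level: {fk_grade(passage):.1f}")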

The new Common Core standards have recognized this research finding. They state that these standards “do not—indeed, cannot—enumerate all or even most of the content that students should learn. The Standards must therefore be complemented by a well-developed, content-rich curriculum.” And they add: “Students can only gain this foundation when the curriculum is intentionally and coherently structured to develop rich content knowledge within and across grades.” Well said! And we have to take care that the schools and the experts hear and act on that truth. Parents and concerned citizens should make sure that they do.

If any school wants to see a model for what this means in actual practice, there are a couple of resources on the Core Knowledge Foundation’s website that should be useful. First, here is a grade-by-grade content sequence for all subjects in preschool through eighth grade that is downloadable for free. This sequence takes into account the knowledge that is most needed and used in written language in the United States—imparting topic familiarity as well as deeper insight across the topics that are most enabling for written communication. When schools use this sequence to write a rigorous curriculum, their students do well on language tests. Second, here is an early reading program—preschool through third grade—that systematically brings many history, science, and literary topics into the language arts classroom in sufficient depth that students become familiar with them. This pre-K–3 program will be downloadable for free soon. For now, here’s the list of topics, each taking about two to three weeks to teach and including about 10–12 teacher read-alouds with related class discussions and extension activities.

The coming of the Common Core standards and tests need not be a new, harrowing imposition on already besieged schools. Rather, it is an historic opportunity—a new slate on which schools can write either a topic-indifferent, fragmented curriculum similar to what has failed before, or a new, exciting, and successful orientation to knowledge. That’s what the top experts—the cognitive scientists—are telling us, and it’s a message that all parents, educators, and concerned citizens need to act upon.

Why Reading Is the Tougher Nut

by Lisa Hansel
June 3rd, 2013

In the New York Times last week, Brett Peiser, chief executive officer of the Uncommon Schools network, asked some very important questions about why students struggle with reading comprehension: “Is it a vocabulary issue? A background knowledge issue? A sentence length issue? How dense is the text?” As Peiser noted, all of these are facets of comprehension that take time to improve—but the problem is both easier to understand and harder to address than most educators realize.

In response to the Times article, Mike Goldstein, founder of the Match Charter School, noted on his blog that “At Match, we too have always had larger math gains than English gains.  Not just on MCAS, also on SAT.” He continues:

Two years ago I tackled this topic, and Tom [Hoffman] commented:

One problem with our testing regime is that it makes math look like half of the goal of a school. Math as a discipline is constructed and taught differently than all others. If your school is constructed to optimize math instruction, you’re not half way there; you’re more like 1/5th of the way.

Good thought.  I don’t know of any schools where kids have, say, 4 English classes and one math class.  And if we didn’t have the phrase “English class”—if instead we only had the component parts called “Literature class,” “Non-fiction class,” “Writing class,” “Vocabulary Building and Grammar class,” and so forth—you could imagine a school designed that way, with 4X the amount of time devoted to English.  And then instead of one test for English, we’d have 4 English exams and one math exam.

I’m not saying this is a good idea given other tradeoffs, I’m saying that might result in more equal sized gains.

Goldstein is a fan of E. D. Hirsch’s work, so I hope he’ll smile when he reads this: I know of schools that have “4 English classes and one math class.” Schools that teach the full Core Knowledge Sequence teach language arts, world and American history and geography, visual arts, music, science, and mathematics. If you trust the decades of research showing that the key to literacy is broad knowledge, then these schools have, in effect, five English classes and one math class. The difference between this and typical schooling is that Core Knowledge schools teach all of these things from kindergarten through eighth grade. They do it rigorously, with content-rich curricula that flesh out the Sequence with domain-based studies in which reading, discussing, and writing content happens throughout each day.

To put this back in Peiser’s terms, if you focus on knowledge, issues with vocabulary, long sentences, and dense texts will be resolved.

As E. D. Hirsch has explained, vocabulary is best learned not with vocabulary lists, but in context while studying specific domains of knowledge. Key terms may be explained as needed, but most vocabulary is acquired through multiple exposures in multiple contexts. I saw an example of this recently at P.S. 104, the Bays Water School in Queens, NY, which is doing a terrific job of bringing Core Knowledge Language Arts to life. Second graders had recently listened to, discussed, and written about a series of texts their teacher read aloud about leaders—like Susan B. Anthony and Martin Luther King, Jr.—who had fought for a cause. Along the way, they learned the word “injustice.” I happened to be sitting in when the teacher started reading aloud from Charlotte’s Web: “Fern was up at daylight, trying to rid the world of injustice. As a result, she now has a pig.” Hands flew into the air as the children were eager to relate what they already knew about injustice to this new example.

It may be fairly obvious that building knowledge also builds vocabulary and enables comprehension. But the connection between knowledge and comprehending dense texts with long sentences may not be as obvious. Recent research by Diana Arya, Elfrieda Hiebert, and P. David Pearson shines some light:

The present study was designed to address the question of whether lexical or syntactic factors exert greater influence on the comprehension of elementary science texts. Based on previous research on text accessibility, it was expected that syntactic and lexical complexity would each affect students’ performance on science texts, and that these two types of text complexity together would additionally impact student performance. In order to test this hypothesis, 16 texts that varied in syntactic and lexical complexity across four different topics were constructed. Students read texts that ranged in complexity, each from a different topic.

Contrary to our hypotheses, syntactic complexity did not explain variance in performance across any of the four topics….

Lexical complexity significantly influenced comprehension performance for texts on two of the four topics, Tree Frogs and Soil, but not for texts on Jelly Beans and Toothpaste. This finding was consistent across all participant groups, including ELLs.
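The design described in the excerpt appears to be a crossed one: two levels of syntactic complexity by two levels of lexical complexity within each of four topics, which is what yields the 16 texts. Here is a minimal sketch of that layout; the topic names come from the excerpt, while the crossing and the level labels are my reading of the design, not the authors’ materials.

    # Apparent layout: 4 topics x 2 syntactic levels x 2 lexical levels = 16 texts.
    from itertools import product

    topics = ["Tree Frogs", "Soil", "Jelly Beans", "Toothpaste"]
    syntactic = ["simple syntax", "complex syntax"]
    lexical = ["familiar words", "rare words"]

    texts = list(product(topics, syntactic, lexical))
    assert len(texts) == 16
    for topic, syntax, words in texts:
        print(f"{topic}: {syntax}, {words}")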

In writing about this research, E. D. Hirsch noted that “These results are at odds with the notion that the usual measures of sentence structure (and/or length) and vocabulary are reliable ways to determine the ‘right’ reading level of a text for a child. On the other hand, their findings are consistent with other work in language study, as Arya, Hiebert, and Pearson were quick to point out…. Given enough familiarity with a topic, children are able to make correct guesses about words they have never seen before. They are also able to disentangle complex syntax if their topic familiarity enables them to grasp the gist of a text.”

So, for comprehension, the primary question is this: How can we ensure that students are familiar with a broad range of topics? You don’t need four English classes—you need a content-rich, carefully organized, grade-by-grade curriculum. Introduce young children to the world through literature, science, history, geography, and the arts. Even in middle school, read to them texts that are too challenging for them to read themselves. Engage them in discussions and substantive writing every day.

And be patient.

Patience is the hard part. Our high-stakes accountability systems not only expect yearly gains but also reward bad practices. While drilling students in comprehension strategies can produce a bump in scores, it will not lead to meaningful increases in literacy. Building broad knowledge (and thus broad vocabulary and the capacity to grasp complex text) takes time. With a content-rich, sequential curriculum, schools can build the necessary knowledge over time. What they can’t do is show big yearly gains on tests that are not matched to their curriculum (a hazard of all state tests, in which the content of the passages is not revealed and thus can’t be studied).

Mike Petrilli of the Fordham Institute has proposed an interesting solution: a two-part accountability system that allows schools to create their own measures. Here’s his pitch (Duncan, Congress—are you listening?):

    1. First, as the default system, we keep something like we have today, but with better standards and tests. (Yes, common-core standards and tests.) Students are tested annually; schools are held accountable for making solid progress from September to June, with greater progress expected for students who are further behind. States and districts give these schools lots of assistance—with curriculum development, teacher training, and the like. Such a default system won’t lead to widespread excellence, but it will continue to raise the floor so that the “typical” school in America becomes better than it is today. (NB: I’d scrap any state-prescribed “accountability” below the level of the school. In other words, no more rigid teacher evaluation systems; leave personnel issues to the principals.) And it would provide taxpayers an assurance that they are getting a “public good” from their investment in public education (namely, a reasonably educated citizenry).
    2. Then we offer all public schools—district and charter—an opt-out alternative. They can propose to the state or its surrogate that they be held accountable to a different set of measures. My preferences would be those related to the long-term success of their graduates. School “inspections” could be part of the picture, too. These evaluation metrics would be rigorous, but designed to be supportive of, rather than oppositional to, the cause of excellent schools. And they might be particularly important to educators of a more progressive, anti-testing bent.

This plan would allow schools to focus on building knowledge instead of artificially boosting scores with drills on comprehension and test-taking strategies. Schools could commit to a strong curriculum, then measure progress in reading comprehension through curriculum-based tests of students’ growing knowledge of literature, science, history, geography, music, and the arts. Such tests could involve reading, writing, and speaking to ensure that students are progressing in all aspects of language as they develop broad knowledge of the world.

Developing and teaching a content-rich, coherent curriculum is hard to do, but it’s also the only approach that works.

 

Happy 85th Birthday E. D. Hirsch, Part 4: Passing the Test

by Lisa Hansel
March 22nd, 2013

So far this week E. D. Hirsch has taught us that higher-order thinking depends on knowledge, that highly mobile students suffer acutely from our national refusal to establish a core of common content, and that there is an identifiable body of specific knowledge that facilitates communication. Now, on Hirsch’s birthday, we examine his game-changing policy prescription: curriculum-based reading tests.

Turning to pages 153–162 of Hirsch’s most recent book, The Making of Americans: Democracy and Our Schools, we learn “How to Ace a Reading Test.”*

Reading tests are attacked for cultural bias and other faults, but such complaints are unfounded. The tests are fast and accurate indexes of real-world reading ability. They correlate extremely well with one another and with actual capacity to learn and communicate. They consist, after all, of written passages, which students are to read and then answer questions on; that is, students are asked to exercise the very skill at issue…. The much more reasonable complaint is that an emphasis on testing has caused schools to devote too much time to drills and test preparation, with a consequent narrowing of the curriculum….

Yet the fault lies not with the tests but with the school administrators who have been persuaded that it is possible to drill for a reading test—on the mistaken assumption that reading is a skill like typing and that once you know the right techniques you can read any text addressed to a general audience. The bulk of time in early language-arts programs today is spent practicing these abstract strategies on an incoherent array of uninformative fictions. The opportunity costs have been enormous. Schools are wasting hours upon hours practicing drills that are supposed to improve reading but that are actually depriving students of knowledge that could enhance their reading comprehension….

Here is the beginning of an actual passage from a New York State reading test for fourth grade:

There is a path that starts in Maine and ends in Georgia, 2,167 miles later. This path is called the Appalachian Trail. If you want, you can walk the whole way, although only some people who try to do this actually make it, because it is so far, and they get tired. The idea for the trail came from a man named Benton MacKaye. In 1921 he wrote an article about how people needed a nearby place where they could enjoy nature and take a break from work. He thought the Appalachian Mountains would be perfect for this.

The passage goes on for a while, and then come the questions. The first question, as usual, concerns the main idea:

This article is mostly about

1. how the Appalachian Trail came to exist.

2. when people can visit the Appalachian Trail.

3. who hikes the most on the Appalachian Trail.

4. why people work together on the Appalachian Trail.

Many educators see this question as probing the general skill of “finding the main idea.” It does not. Try to put yourself in the position of a disadvantaged fourth grader who knows nothing of hiking, does not know the difference between an Appalachian-type mountain and a Himalayan-type mountain, does not know where Maine and Georgia are, and does not grasp what it means to “enjoy nature.” Such a child, though much trained in comprehension strategies, might answer the question incorrectly. The student’s more advantaged counterpart, not innately smarter, just happens to be familiar with hiking in the Appalachians, has been to Maine and Georgia, and has had a lot of experience “enjoying nature.” The second student easily answers the various questions correctly. But not because he or she practiced comprehension strategies; this student has the background knowledge to comprehend what the passage is saying….

It has been shown decisively that subject-matter knowledge trumps formal skill in reading and that proficiency in one reading-comprehension task does not necessarily predict skill in another. Test makers implicitly acknowledge this by offering, in a typical reading test, as many as ten passages on varied topics. (If reading were a knowledge-independent skill, a single passage would suffice.)… Contrary to appearances and educators’ beliefs, these reading tests do not test comprehension strategies. There usually are questions like “What is the main idea of this passage?” but such a question probes ad hoc comprehension, not some general technique of finding the main idea. Reading comprehension is not a universal, repeatable skill like sounding out words or throwing a ball through a hoop. “Reading skill” is rather an overgeneralized abstraction that obscures what reading really is: an array of separate, content-constituted skills such as the ability to read about the Appalachian Mountains or the ability to read about the Civil War….

A reading test is inherently a knowledge test. Scoring well requires familiarity with the subjects of the test passages. Hence the tests are unfair to students who, through no fault of their own, have little general knowledge. Their homes have not provided it, and neither have the schools. This difference in knowledge, not any difference in ability, is the fundamental reason for the reading gap between white and minority students. We go to school for many years partly because it takes so long to build up the vast general knowledge and vocabulary we need to become mature readers.

Because this knowledge-gaining process is slow and cumulative, the type of general reading test now in use could be fair to all groups only above fifth or sixth grade, and only after good, coherent, content-based schooling in the previous grades. I therefore propose a policy change that would at one stroke raise reading scores and narrow the fairness gap. (As a side benefit, it would induce elementary schools to impart the general knowledge children need.) Let us institute curriculum-based reading tests in first, second, third, and fourth grades—that is to say, reading tests containing passages based on knowledge that children will have received directly from their schooling. In the early grades, when children are still gaining this knowledge slowly and in piecemeal fashion, it is impossible to give a fair test of any other sort….

We now have an answer to our question of how to enable all children to ace a reading test. We need to impart systematically—starting in the very earliest grades by reading aloud to students, then later in sequenced self-reading—the general knowledge that is taken for granted in writing addressed to a broad audience. If reading tests in early grades are based on a universe of pre-announced topics, general knowledge will assuredly be built up. By the later grades, when the reading tests become the standard non-curriculum-based ones, such as the NAEP tests, reading prowess will have risen dramatically.

Policy makers say they want to raise reading scores and narrow the fairness gap. But it seems doubtful that any state can now resist the anti-curriculum outcry that would result from actually putting curriculum-based testing into effect. Nonetheless, any state or district that courageously instituted knowledge- and curriculum-based early reading tests would see a very significant rise in students’ reading scores in later grades.

States would also see impressive results right away on the curriculum-based tests, since the passages would be about content that all students had actually been taught. Just imagine: With curriculum-based tests, “test prep” would consist of studying literature, history, science, and the arts. Bringing that imaginary world to life relies on our leaders working together. So, this birthday retrospective ends with a call to the left and right, drawn from pages 186–187 of The Making of Americans.

One of the gravest disappointments I have felt in the twenty-five years that I have been actively engaged in educational reform is the frustration of being warmly welcomed by conservatives but shunned by fellow liberals. The connection of the anti-curriculum movement with the Democratic Party is an accident of history, not a logical necessity. All the logic runs the other way. A dominant liberal aim is social justice, and a definite core curriculum in early grades is necessary to achieve it. Why should conservatives alone favor solid content while my fellow liberals buy into the rhetoric of the anti-curriculum theology that works against the liberal aims of community and equality? Practical improvement of our public education will require intellectual clarity and a depolarization of the issue. Left and right must get together on the principle of common content.

 

* For the endnotes, please refer to the book.

 

Do you have a birthday message for E. D. Hirsch or favorite quote from him? Please share it with all of us in the comments.

 

You may also be interested in other posts in this birthday retrospective:

Part 1: The Secret to Lifelong Learning

Part 2: Avoidable Injustice

Part 3: Breaking Free from the Siren Song

 

Blame the Tests

by E. D. Hirsch, Jr.
January 15th, 2013

In Praise of Samuel Messick 1931–1998, Part III

The chief practical impact of NCLB has been its principle of accountability. Adequate yearly progress, the law stated, must be determined by test scores in reading and math—not just for the school as a whole, but for key groups of students.

Now, a decade later, the result of the law, as many have complained, has been a narrowing of the school curriculum. In far too many schools, the arts and humanities, and even science and civics, have been neglected—sacrificed on the altar of tests without any substantial progress nationwide on the tests themselves. It is hard to decide whether to call NCLB a disaster or a catastrophe.

But I disagree with those who blame this failure on the accountability principle of NCLB. The law did not specify what tests in reading and math the schools were to use. If the states had responded with valid tests—defined by Messick as tests that are both accurate and have a productive effect on practice—the past decade would have seen much more progress.

Since NCLB, NAEP’s long-term trend assessment shows substantial increases in reading among the lowest-performing 9-year-olds—but nothing comparable in later grades. It also shows moderate increases in math among 9- and 13-year-olds.

So, it seems that a chief educational defect of the NCLB era lay in the later-grades reading tests; they simply do not have the same educational validity as the tests in early-grades reading and in early- and middle-grades math.

 ****

It’s not very hard to make a verbal test that predicts how well a person will be able to read. One accurate method used by the military is the two-part verbal section of the multiple-choice Armed Forces Qualification Test (AFQT), which is known for its success in accurately predicting real-world competence. One section of the AFQT Verbal consists of 15 items based on short paragraphs on different subjects and in different styles, to be completed in 13 minutes. The other section is a vocabulary test with 35 items, to be completed in 11 minutes. This 24-minute test predicts as well as any verbal test the range of your verbal abilities, your probable job competence, and your future income level. It is a short, cheap, and technically valid test. Some version of it could even serve as a school-leaving test.
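For concreteness, here is a small sketch of the structure just described, with the pacing arithmetic spelled out. The section names are paraphrases of the description above, not official AFQT labels.

    # The two verbal sections as described above, with the timing arithmetic.
    sections = [
        {"name": "paragraph comprehension", "items": 15, "minutes": 13},
        {"name": "vocabulary", "items": 35, "minutes": 11},
    ]
    for s in sections:
        print(f'{s["name"]}: {s["items"]} items in {s["minutes"]} minutes')
    total_items = sum(s["items"] for s in sections)      # 50 items
    total_minutes = sum(s["minutes"] for s in sections)  # 24 minutes
    print(f"total: {total_items} items in {total_minutes} minutes")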

Educators would certainly protest if that were done—if only because such a test would give very little guidance for classroom practice or curriculum. And this is the nub of the defects in the reading tests used during the era of NCLB: They did not adequately support curriculum and classroom practice. The tests in early-grades reading and in early- and middle-grades math did a better job of inducing productive classroom practice, and their results show it.

Early-grades reading tests, as Joseph Torgesen and his colleagues showed, probe chiefly phonics and fluency, not comprehension. Schools are now aware that students will be tested on phonics and fluency in early grades. In fact, these crucial early reading skills are among the few topics for which recent (pre-Common Core) state standards had begun to be highly specific. These more successful early reading tests were thus different from later ones in a critical respect:  They actually tested what students were supposed to be taught.

Hence in early reading, to its credit, NCLB induced a much greater correlation than before between standards, curriculum, teaching and tests. The tests became more valid in practice because they induced teachers to teach to a test based on a highly specific subject matter—phonics and fluency. Educators and policymakers recognized that teaching swift decoding was essential in the early grades, tests assessed swift decoding, and—mirabile dictu—there was an uptick in scores on those tests.

Since the improvements were impressive, let’s take a look at what has happened over the past decade among the lowest-performing 9-year-olds on NAEP’s long-term trend assessment in reading.

Note that there is little to no growth among higher-performing 9-year-olds, presumably because they had already mastered phonics and fluency.

Similarly, early- and middle-grades math tests probed substantive grade-by-grade math knowledge, as the state standards had become ever more specific in math. You can see where I’m going: Early reading and math improved because teachers typically teach to the tests (especially under NCLB-type accountability pressures), and the subject matter of these tests began to be more and more defined and predictable, causing a collaboration and reinforcement between tests and classroom practice.

In later-grades reading tests, where we have failed to improve, the tests have not been based on any clear, specific subject matter, so it has been impossible to teach to the tests in a productive way. (The lack of alignment between math course taking and the NAEP math assessment for 17-year-olds is similarly problematic.) Of course, there are many reasons why achievement might not rise. But specific subject matter, both taught and tested, is a necessary—if not sufficient—condition for test scores to rise.

In the absence of any specific subject matter for language arts, teachers, textbook makers, and test makers have conceived of reading comprehension as a strategy rather than as a side effect of broad knowledge. This inadequate strategy approach to language arts is reflected in the tests themselves. I have read many of them.  An inevitable question is something like this: “The main idea of this passage is….” And the theory behind such a question is that what is being tested is the ability of the student to strategize the meaning by “questioning the author” and performing other puzzle-solving techniques to get the right answer. But, as readers of this blog know, that is not what is being tested. The subject matter of the passage is.

This mistaken strategy-focused structure has made these tests not only valueless educationally, but worse—positively harmful. Such tests send out the misleading message that reading comprehension is chiefly strategizing. That idea has dominated language arts instruction in the past decade, which means that a great deal of time has been misspent on fruitless test-taking activities. Tragically, that time could have been spent on science, humanities and the arts—subjects that would have actually increased reading abilities (and been far more interesting).

The only way that later-grades reading tests can be made educationally valid is by adopting the more successful structure followed in early reading and math. An educationally valid test must be based on the specific substance that is taught at the grade level being tested (possibly with some sampling of specifics from previous and later grades for remediation and acceleration purposes). Testing what has been taught is the only way to foster collaboration and reinforcement between tests and classroom practice. An educationally valid reading test requires a specific curriculum—a subject of further conversations, no doubt.

The Work of a Great Test Scientist Helps Explain the Failure of No Child Left Behind

by E. D. Hirsch, Jr.
January 10th, 2013

In Praise of Samuel Messick 1931–1998, Part II

In a prior post I described Messick’s unified theory of test validity, which judged a test not to be valid if its practical effects were null or deleterious. His epoch-making insight was that the validity of a test must be judged both internally for accuracy and externally for ethical and social effects. That combined judgment, he argued, is the only proper and adequate way of grading a test.

In the era of the No Child Left Behind law (2001), the looming specter of tests has been the chief determiner of classroom practice. This led me to the following chain of inferences: Since 2001, tests have been the chief determiners of educational practices. But these tests have failed to induce practices that have worked. Hence, according to the Messick principle, the tests that we have been using must not be valid. Might it be that a new, more Messick-infused approach to testing would yield far better results?

First, some details about the failure of NCLB. Despite its name and admirable impulses it has continued to leave many children behind:

 

NCLB has also failed to raise verbal scores. The average verbal level of school leavers stood at 288 when the law went into effect, dropped to 283 in 2004, and stood at 286 in 2008.

Yet this graph shows an interesting exception to this pattern of failure, and it will prove to be highly informative under Messick’s principle. Among 4th graders (age 9) the test-regimen of NCLB did have a positive impact.

Moreover, NCLB also had positive effects in math:

This contrast between the NCLB effects in math and reading is even more striking if we look at the SAT, where the test takers are trying their best:

So let’s recap the argument. Under NCLB, testing in both math and reading has guided school practices. Those practices were more successful in math and in early reading than in later reading. According to the Messick principle, therefore, reading tests after grade 4 had deleterious effects and cannot have been valid tests. How can we make these reading tests more valid?

A good answer to that question will help determine the future progress of American education. Tune in.

If He’s So Important, Why Haven’t We Heard of Him?

by E. D. Hirsch, Jr.
January 9th, 2013

In Praise of Samuel Messick 1931–1998

Everyone who is anyone in the field of testing actually has heard of Samuel Messick. The American Psychological Association has instituted a prestigious annual scientific award in his name, honoring his important work in the theory of test validity. I want to devote this, my first-ever blog post, to one of his seminal insights about testing. It’s arguable that his insight is critical for the future effectiveness of American education.

My logic goes this way: Every knowledgeable teacher and policy maker knows that tests, not standards, have the greater influence on what principals and teachers do in the classroom. My colleagues in Massachusetts—the state that has the most effective tests and standards—assure me that it’s the demanding, content-rich MCAS tests that determine what is taught in the schools. How could it be otherwise? The tests determine whether a student graduates or whether a school gets a high ranking. The standards do vaguely guide the contents of the tests, but the tests are the de facto standards.

It has been and will continue to be a lively blog topic to argue the pros and cons of the new Common Core State Standards in English Language Arts. But so far these arguments are more theological than empirical, since any number of future curricula—some good, some less so—can fulfill the requirements of the standards. I’m sure the debates over these not-yet-existent curricula will continue, so it won’t spoil anyone’s fun if I observe that these heated debates bear a resemblance to what was called in the Middle Ages the Odium Theologicum over unseen and unknown entities. Ultimately these arguments will need to get tied down to tests. Tests will decide the actual educational effects of the Common Core Standards.

But Samuel Messick has enunciated some key principles that will need to be heeded by everyone involved in testing if our schools are to improve in quality and equity—not only in the forty-plus states that have agreed to use the Common Core standards, but also in those states that have not. In all fifty states, tests will continue to determine classroom practice and hence the future effectiveness of American education.

In this post, I’ll sketch out one of Messick’s insights about test validity. In a second post, I’ll show how ignoring those insights has had deleterious effects in the era of NCLB. And in a third, and last on this topic, I’ll suggest policy principles to avoid ignoring the scientific acumen and practical wisdom of Samuel Messick in the era of the Common Core Standards.

 ******

Messick’s most distinctive observation shook up the testing world, and still does. He said that it was not a sufficient validation of a test to show that it exhibits “construct validity.” This term of art means that the test really does accurately estimate what it claims to estimate. No, said Messick, that is a purely technical criterion. Accurate estimates are not the only or chief function of tests in a society. In fact, accurate estimates can have unintended negative effects. In the world of work they can unfairly exclude people from jobs that they are well suited to perform. In the schools, “valid” tests may actually cause a decline in the achievement being tested—a paradoxical outcome that I will stress in the three posts devoted to Messick.

Messick called this real-world attribute of tests “consequential validity.” He proposed that test validity be conceived as a unitary quality comprising both construct validity and consequential validity—both the technical and the ethical-social dimension. What shall it profit a test if it reaches an accurate conclusion yet injures the social goal it was trying to serve?

Many years ago I experienced the force of Messick’s observation before I knew that he was the source of it. It was in the early 1980s, and I had published a book on the valid testing of student writing (The Philosophy of Composition). At the time, Messick was the chief scientist at the Educational Testing Service, and under him a definitive study had been conducted to determine the most valid way to measure a person’s writing ability. Actual scoring of writing samples was notoriously inconsistent, and hence unfair. Even when samples were graded by specially socialized groups of readers (the current system), there was a good deal of variance in the scoring.

ETS devised a test that probed writing ability less directly and far more reliably. It consisted of a few multiple-choice items concerned with general vocabulary and editorial acumen. This test proved to be not only far shorter and cheaper but also more reliable and valid. That is, it better predicted elaborately determined expert judgment of writing ability than did the writing samples.

There was just one trouble with this newly devised test. As it was used over time, student writing ability began to decline. The most plausible explanation was that although the test had construct validity, it lacked consequential validity. It accurately predicted writing skill, but it encouraged classroom activity that diminished writing skill—a perfect illustration of Messick’s insight.

Under his intellectual influence there is now, again, an actual writing sample to be found on the verbal SAT. The purely indirect test that dispensed with the writing sample had had the unfortunate consequence of reducing the amount of student writing assigned in the schools, and hence reducing the writing abilities of students. A shame: the earlier test was not just more accurately predictive as an estimate; it was fairer, shorter, and cheaper. But ETS made the right decision in valuing consequential validity above accuracy and elegance.

Next time: Consequential Validity and the Era of No Child Left Behind

The PIRLS Reading Result—Better than You May Realize

by Dan Willingham
December 17th, 2012

This was written by cognitive scientist Daniel Willingham, professor of psychology at the University of Virginia and author of  “When Can You Trust The Experts? How to tell good science from bad in education.” This appeared on his Science and Education blog.

The PIRLS results are better than you may realize.

Last week, the results of the 2011 Progress in International Reading Literacy Study (PIRLS) were published. This test compared reading ability in 4th grade children.

U.S. fourth-graders ranked 6th among 45 participating countries. Even better, U.S. kids scored significantly better than the last time the test was administered, in 2006.

There’s a small but decisive factor that is often forgotten in these discussions: differences in orthography across languages.

Lots of factors go into learning to read. The most obvious is learning to decode—learning the relationship between letters and (in most languages) sounds. Decode is an apt term. The correspondence of letters and sound is a code that must be cracked.

In some languages the correspondence is relatively straightforward, meaning that a given letter or combination of letters reliably corresponds to a given sound. Such languages are said to have a shallow orthography. Examples include Finnish, Italian, and Spanish.

In other languages, the correspondence is less consistent. English is one such language. Consider the letter sequence “ough.” How should that be pronounced? It depends on whether it’s part of the word “cough,” “through,” “although,” or “plough.” In these languages, there are more multi-letter sound units, more context-dependent rules, and more out-and-out quirks.

Another factor is syllabic structure. Syllables in languages with simple structures typically (or exclusively) have the form CV (i.e., a consonant, then a vowel, as in “ba”) or VC (as in “ab”). Slightly more complex forms include CVC (“bat”) and CCV (“pla”). As the number of permissible combinations of vowels and consonants that may form a single syllable increases, so does the complexity. In English, it’s not uncommon to see forms like CCCVCC (e.g., “splint”).
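A minimal sketch of the notation being used here: reduce a word’s spelling to its consonant/vowel skeleton. This is a rough, letters-not-sounds approximation (true syllabic structure concerns phonemes, and letters like “y” complicate matters), but it illustrates the contrast in complexity:

    # Reduce a word to its consonant/vowel (C/V) skeleton. A rough orthographic
    # approximation, just to illustrate the complexity contrast.
    def cv_pattern(word):
        return "".join("V" if ch in "aeiou" else "C" for ch in word.lower())

    for word in ["ba", "ab", "bat", "pla", "splint"]:
        print(word, "->", cv_pattern(word))
    # ba -> CV, ab -> VC, bat -> CVC, pla -> CCV, splint -> CCCVCC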

Here’s a figure (Seymour et al., 2003) showing the relative orthographic depth of 13 languages, as well as the complexity of their syllabic structure.

From Seymour et al. (2003)

Orthographic depth correlates with the incidence of dyslexia (e.g., Wolf et al., 1994) and with word and nonword reading in typically developing children (Seymour et al., 2003). Syllabic complexity correlates with word decoding (Seymour et al., 2003).

This highlights two points, in my mind.

First, when people trumpet the fact that Finland doesn’t begin reading instruction until age 7 we should bear in mind that the task confronting Finnish children is easier than that confronting English-speaking children. The late start might be just fine for Finnish children; it’s not obvious it would work well for English-speakers.

Of course, a shallow orthography doesn’t guarantee excellent reading performance, at least as measured by the PIRLS. Children in Greece, Italy, and Spain had mediocre scores, on average. Good instruction is obviously still important.

But good instruction is more difficult in languages with deep orthography, and that’s the second point. The conclusion from the PIRLS should not just be “Early elementary teachers in the US are doing a good job with reading.” It should be “Early elementary teachers in the US are doing a good job with reading despite teaching reading in a language that is difficult to learn.”

References

Seymour, P. H. K., Aro, M., & Erskine, J. M. (2003). Foundation literacy acquisition in European orthographies. British Journal of Psychology, 94, 143-174.

Wolf, M., Pfeil, C., Lotz, R., & Biddle, K. (1994). Toward a more universal understanding of the developmental dyslexias: The contribution of orthographic factors. In V. W. Berninger (Ed.), The varieties of orthographic knowledge, 1: Theoretical and developmental issues (Neuropsychology and cognition, Vol. 8, pp. 137-171). New York, NY: Kluwer.

Words Get in the Way

by Robert Pondiscio
November 30th, 2012

This blog has long kvetched about the tendency to use terms like standards (what proficiencies kids should be able to demonstrate) and curriculum (the material that gets taught in class) interchangeably. Michael Goldstein, founder of Boston’s MATCH school, observes that education lacks a common vocabulary, which makes life harder for teachers. “They get bombarded all the time with new products, websites, software that all claim they can get students to ‘deeper learning.’ But without a common understanding of what actually qualifies, it’s hard to know if X even purports to get your kids where you want them to go,” he writes.

Goldstein compares education to medicine, where there is broad agreement, for example, on the five stages of cancer—and that makes it easier for medical professionals and patients to work together. “When scientists come up with treatments,” he notes, “they often find them to be effective for cancers only in certain stages. So when they tell doctors: ‘treatment only effective for X cancer in stage two,’ everybody knows what that means.”

In education, no such common vocabulary exists.

“Our sector talks a lot of “Deeper Learning.” Or “Higher-Order Skills.”

“But what does that mean? There’s not a commonly-accepted terminology or taxonomy. Instead, there are tons of competing terms and ladders.

“In math, for example, here’s language that the US Gov’t uses for the NAEP test. Low, middle, and high complexity. I suppose they might characterize the “high” as “deeper learning.”

“Here’s Costa’s approach, a different 3 levels. Text explicit, text implicit, and activate prior knowledge. Again, perhaps the last is “deeper learning.”

“Here’s another take, more general than math-specific, from Hewlett.

“A software like MathScore has its own complexity ratings.

“And so on. You could find 10 more in 10 minutes of Googling.

Goldstein posts a question from Massachusetts’ MCAS tests, a perimeter question that shows four different rectangles and asks, “Which of these has a perimeter of 12 feet?”

“First you need to know what perimeter means. Second you need to know that you need to fill in the “missing sides.” Third you need to know what to fill in, because you understand “rectangle.” Finally you need to add those 4 numbers. If you only understand 3 of the 4 ideas, you’ll get the question wrong.

“Does this question probe “deeper learning” for a 3rd grader? Who the heck knows?

If this strikes you as mere semantics, think again. The lack of an agreed vocabulary—what is a “basic skill”? what is “higher-order thinking”?—is not merely irritating; it can lead to bad practice and misplaced priorities. A third-grade teacher looking to remediate a lack of basic skills might seek help from a software product, but she would have “no real idea on how ‘deep’ they go, or how ‘shallow’ they start,” Goldstein notes. “No common language for ‘Depth’ or ‘Complexity.’”

I would add that the problem is more fundamental than that. If a teacher is told “teach higher-order thinking,” she might incorrectly assume that time spent on basic knowledge, math skills, or fluency is wasted. Or, in the worst-case scenario, she might assume that reading comprehension or higher-order thinking can be directly taught.

In reality, without basic skills and knowledge firmly in place, there’s no such thing as higher-order anything, and there never will be. Yet terms like “higher-order thinking” and “complexity” are held up as the gold standard we should be teaching toward. Basic knowledge and prerequisite skills are cast as the unlovely companions of “drill and kill” rather than, say, “fluency” or “automaticity.” Mischief and misplaced priorities are the inevitable result.

A common vocabulary of diagnosis and treatment would help. 


Second Thoughts on Pineapplegate

by Robert Pondiscio
May 4th, 2012

Writing in his TIME Magazine column, Andy “Eduwonk” Rotherham offers up a largely exculpatory take on Pineapplegate.  The media jumped all over a bowdlerized version of the test passage, he notes.  New York state officials should have been clearer in explaining that nothing makes its way onto standardized tests by accident.  And in the end, Andy writes, what is needed is “a more substantive conversation rather than a firestorm” over testing.

Very well, let’s have one.

In the unlikely event you haven’t heard, a minor media frenzy was ignited a few weeks back when the New York Daily News got hold of a surreal fable, loosely modeled on the familiar tale of the Tortoise and the Hare, which appeared on the just-administered New York State 8th grade reading test.  In the test passage, a talking pineapple challenges a hare to a foot race in front of a group of woodland creatures, loses the race (the pineapple’s lack of legs proving to be a fatal competitive disadvantage)  and gets eaten by the other animals.

Rotherham points out that the passage picked up by the paper was not the actual test passage, but a second-hand version plucked from an anti-testing website. “The passage the paper ran was so poorly written that it would indeed have been inexcusable,” he wrote.  Perhaps, but the correct passage wasn’t exactly a model of clarity and coherence either.  Indeed, the fable’s author mocked the decision by the testing company, Pearson, to create multiple choice questions about his story on a state test.  “As far as I am able to ascertain from my own work, there isn’t necessarily a specifically assigned meaning in anything,” Daniel Pinkwater told the Wall Street Journal. “That really is why it’s hilarious on the face of it that anybody creating a test would use a passage of mine, because I’m an advocate of nonsense. I believe that things mean things but they don’t have assigned meanings.”

Ultimately the real version of the test passage was released by the state to quiet the controversy.  But it did little to reverse the impression that this was a questionable measure of students’ ability.  Rotherham’s big “get” in Time is a memo from Pearson to New York State officials detailing the question’s review process as well as its use on other states’ tests as far back as 2004.  The message:  nothing to see here, folks.  Show’s over.  Go on back to your schools, sharpen those No. 2 pencils and get ready for more tests.

“Standardized tests are neither as bad as their critics make them out to be nor as good as they should be,” Rotherham concludes. Perhaps, but they’re bad enough. The principal problem, which Pineapplegate underscores vividly, is that we continue to draw conclusions about students’ reading ability from a random, incoherent collection of largely meaningless passages concocted by test-makers utterly disconnected from what kids actually learn in school all day. This actively incentivizes a form of educational malpractice, since such reading tests reinforce the mistaken notion that reading comprehension is a transferable skill, disconnected from subject matter. But we know this is not the case, as E. D. Hirsch and Dan Willingham have pointed out time and again, and as we have discussed repeatedly on this blog.

So this is not a simple case of an uproar based on bad information and sloppy damage control. What Rotherham misses in a somewhat strident defense of standardized tests and testing is that we are suffering generally from a case of test fatigue. The entire edifice of reform rests on testing, and while the principle of accountability remains sound, the effects of testing on schools have proven to be deleterious, to put it charitably. Thus the conditions were ripe for people to overreact to perceived absurdity in the tests. And that’s exactly what happened here.

Was the story blown out of proportion by some people playing fast and loose with the facts? Perhaps. But the facts, once they became clear, were more than bad enough.

Did You Hear the One About the Talking Pineapple…

by Robert Pondiscio
April 20th, 2012

“It’s clearly an allegory. The pineapple is the Department of Education. The hare is the student who is eagerly taking the test,” said E.D. Hirsch. “The joke is supposed to be on the hare, because the questions are post-modern unanswerable,” he said. “But in fact the joke is on the pineapple, because the New York Daily News is going to eat it up.”

I’d explain what he’s talking about, but some things are beyond explanation….

Update:  At EdWeek Teacher, Anthony Cody asks the question that needs to be asked:  Would YOU want to be judged based on an 8th grader’s ability to make sense of this bizarre little story?