Mere Facts, Mere Knowledge, Mere College Readiness

by E. D. Hirsch, Jr.
February 25th, 2013

Is teaching many domains in English language arts more important to college and career readiness than teaching many words?

Research on teaching vocabulary has identified better and worse ways of conducting explicit instruction. Word lists and isolated definitions, though they may seem efficient, are among the least effective methods, while explicit explanations of words in context are the most effective. Ideally, according to one distinguished researcher, students can learn up to 400 new words in a school year by explicit methods (2+ words a day for 180 days under ideal circumstances). Others offer a more modest estimate, around 200 words per school year.

Yet the minimal count of words you need to be college and career ready is estimated to be 12,000 to 30,000, depending on the mode of counting. The explicit method of instruction at its best yields 5,200 words between kindergarten and 12th grade. Yet even marginal high school students need to know twice that many—meaning that most of their word learning must occur incidentally in the course of understanding the gist of spoken and written language.
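The arithmetic behind these figures can be checked with a short sketch. The per-year rates and the 12,000 to 30,000 target range come from the text above; the 13-year span (kindergarten plus grades 1 through 12) and the function names are assumptions made for illustration, not anything the cited researchers published.

```python
# Figures quoted in the post; the 13-year span is an assumption
# (kindergarten plus grades 1-12).
WORDS_PER_YEAR_BEST = 400     # best-case explicit instruction
WORDS_PER_YEAR_MODEST = 200   # more modest estimate
SCHOOL_YEARS = 13
TARGET_LOW, TARGET_HIGH = 12_000, 30_000


def explicit_total(words_per_year, years=SCHOOL_YEARS):
    """Words learned through explicit instruction over all school years."""
    return words_per_year * years


def incidental_share(target, explicit):
    """Fraction of the target vocabulary that must be learned incidentally."""
    return (target - explicit) / target


best = explicit_total(WORDS_PER_YEAR_BEST)
modest = explicit_total(WORDS_PER_YEAR_MODEST)

print(f"Explicit total, best case: {best}")
print(f"Explicit total, modest case: {modest}")
for target in (TARGET_LOW, TARGET_HIGH):
    share = incidental_share(target, best)
    print(f"Target {target}: {share:.0%} must be learned incidentally")
```

Even with the best-case rate, more than half of the lowest target has to come from incidental learning, and the share grows as the target rises.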

Nonetheless, I would agree with advocates of explicit word study that, done strategically as an integrated and not-very-time-consuming part of a lesson, explicit instruction can help unlock enough of the gist of a passage to speed up the incidental learning of words. But then the question arises: what sort of words should we pause over in order to make the best use of class time and help the student make the fastest progress?

Experts in explicit word study have identified three main categories of words called Tier 1, Tier 2, and Tier 3, in order of frequency of occurrence in written English. The current expert view is that teachers should focus on Tier 2 words. Tier 1 words are so common that students are likely to learn them on their own. Tier 3 words, on the other hand, are so rare that focusing on them does not offer much advancement for general reading ability. So under current thinking, the following sorts of Tier 2 words are the ones teachers should spend most class time on: reputation, disruption, hovers, stifling, obstacle, descendants, maximum, standards, barren, desolate. These words are moderately frequent because they are used in multiple written contexts. That’s not true of domain-specific Tier 3 words like valence, bildungsroman, Renaissance, metabolism, Gettysburg, photosynthesis, stochastic, ionic, simile, dew point, polygon, Madison, monotheism, kinetic, Dalton, Fourier, Magna Carta, Impressionism, helium, fiscal, TR, and Shiite.

But I’m not persuaded by this rationale. Although Tier 2 words are to be found in multiple contexts, they do not constitute a big percentage of the totality of different words in the English vocabulary. That distinction belongs to the words of Tier 3, which are domain specific. If you want to reach the magic number of 25,000 or so, it’s best to spend your time learning domain-specific Tier 3 words. After all, there’s a bit of inconsistency in the expert advice to teachers to spend most of students’ explicit-word-study time on Tier 2 words after having said that Tier 1 words can be ignored on the grounds that they are used so frequently that most people have learned them incidentally. That sensible principle recedes when it comes to their doctrine about Tier 2 words, which we are advised to focus on precisely because they are relatively frequent and are used in multiple written contexts. Some serious research needs to be undertaken to determine whether, in a good, coherent knowledge-based curriculum, most Tier 2 words aren’t also learned incidentally as a matter of course, just like most Tier 1 words, as the overall math would suggest. (This research has not been conducted, despite the confident advice about studying domain-general Tier 2 words. Indeed there is some counterevidence in the studies by John Guthrie indicating the superiority of domain-specific instruction in ELA.)

To support the emphasis on Tier 2 words many educators assume that there exists such a thing as general “reading skill,” which will be the key to college and career readiness. But cognitive scientists instruct us that it’s an oversimplification to suppose that there is such a thing as a domain-general reading skill that can be fostered by the explicit study of domain-general, Tier 2 words. On the contrary, the latest cognitive science tells us that reading skills, like most skills, are “domain specific.” Granted, there are important domain-general aspects of reading that include automatic, unconscious procedures like decoding skill, eye movements, strategic meaning searches, and knowledge of domain-general words. It is reasonable, indeed essential, to ensure that students gain such domain-general knowledge. But few experts advise that students be explicitly trained in eye-movement patterns, at least not very extensively. For most students that skill develops unconsciously without continuous instruction. The same is true of most domain-general word learning, which occurs unconsciously, bit by bit, through multiple exposures to a word in different contexts. Domain-general skills like decoding, once mastered, are continually practiced and unconsciously improved precisely because, being domain general, they occur frequently.

There’s a clear analogy with skill in sports. Most sports demand domain-general athletic abilities like hand-eye coordination. Nonetheless, being skilled specifically in golf does not directly transfer to being skilled in tennis or even in croquet. Each sport has domain-specific skills that must be explicitly mastered. Similarly, being skilled in reading about golf does not readily transfer to being skilled in reading about tennis. The golf passages will of course contain domain-general words like but, however, pretty, and willing, but the critical words will be birdie, bogey, and par, and knowing them won’t help you read a tennis story with set point, fault, and ace.

Why do you suppose school reading tests typically offer ten or so passages? If reading were a domain-general skill, one passage would suffice. (If I want to know if you can ride a bike, I won’t bring ten bikes for you to ride. One will suffice.) But reading tests always contain several passages because a reliable reading test has to sample your ability to read in several different domains. Reading tests are essentially tests of how many different domains you have knowledge of and vocabulary for. To be a literate adult—one who could read a newspaper front to back—you must have knowledge in a very broad range of domains.

If we wish our students to perform well on a reading test, we ought to abandon the disparagement of “mere facts.”  Nothing contributes more to a student’s reading abilities than wide knowledge of multiple domains, automatically accompanied by knowledge of many domain-specific, Tier 3 words. In sum, nothing contributes more to college and career readiness than broad general knowledge over multiple domains.

The best way to teach “English language arts” then is systematically to teach substantive domains of knowledge along with their inherently related vocabularies. In fact the whole issue needs to be broadened by a return to real classes in history, science, and the arts in elementary grades, as the best way to gain proficiency in reading. This larger principle transcends the currently debated topic of fictional vs. non-fictional genres. Much good fiction is a repository of domain knowledge—not just of human nature and ethical principles, but also of historical and factual knowledge, including such things as Mississippi river-boating in Huckleberry Finn, and whaling in Moby Dick, as well as the forms and techniques of literature, like simile and metaphor, prefixes and suffixes, which are just as “informational” as chemical valences. What is needed for college and career readiness is extensive general knowledge over multiple domains, coherently delivered—with lots of Tier 3 words.

When this is done well, with gradually increasing sophistication grade by grade, Tiers 1 and 2 will mostly take care of themselves.

The Skills Stranglehold

by E. D. Hirsch, Jr.
February 21st, 2013

It’s not like it wasn’t obvious already, but today’s MetLife Survey of the American Teacher confirms that the nation’s teachers are demoralized. How could it be otherwise, with pressure to build the Common Core plane while flying it and also facing new evaluation and accountability requirements?

I don’t want to brush off any of these very real problems, but I do want to suggest that they are not the heart of the matter. Fundamentally, the problem educators face is freeing themselves from the skills stranglehold. It is preventing them from understanding the Common Core standards, preventing them from meeting their own goals as professionals, and preventing them from closing achievement gaps between poor and privileged students.

We see evidence of it everywhere, especially in the MetLife survey. Nine in ten teachers and principals say they are knowledgeable about the Common Core standards, and a majority of teachers say they are already using them a great deal. At the same time, teachers, especially in later grades, are not all that confident about the effect the Common Core will have. The report states (p. 65):

Middle school and high school principals and teachers are less likely than their elementary school counterparts to be very confident or confident that the Common Core will improve student achievement (principals: 73% vs. 85%; teachers: 61% vs. 76%). Middle school and high school teachers are less likely than elementary school teachers to be very confident or confident that the Common Core will better prepare students for college and the workforce (63% vs. 78%); principals’ views on this do not differ significantly by school level.

At all levels, just “two in 10 principals or teachers indicate that they are very confident that the Common Core will have these effects.” How can this be? Teachers could be feeling too downtrodden to have great confidence in anything, but I think the real answer is hidden in the report itself. There’s a hint in the report’s “From the Experts” box (p. 58):

The public education thought leaders interviewed as part of the survey development process … are concerned that some teachers and principals may be underestimating how large a shift in curriculum, teaching, and assessment may be required to implement the new standards fully.

  • “In all but a handful of states around the country, there are new academic standards that are being implemented that will demand very fundamental changes in teaching and learning; very fundamental changes in the instructional practices that teachers use in the classroom. Teachers say they’re aware of the standards and they like the standards; they’re not much different than what they’re doing now, which is generally not the case.”
  • “The rigor is simply much harder or much more demanding than most states have had in the past, so dealing with the real benchmark of where you are as a teacher and your performance and your mastery of these standards and how well your students are going to do is kind of a… I don’t know whether the word is culture shock, when you start seeing the true benchmark as opposed to where you thought you were.”

The fact that so many teachers (62%) say the teachers in their school are already using the Common Core standards a great deal shows that these “thought leaders” are correct: most educators remain unaware of the massive changes that fully implementing the new standards will require. But everyone has been talking about these changes for more than a year. Clearly, the message is not getting through.

It can’t get through: The barrier erected by the skills stranglehold is far stronger than anyone realizes. Consider this, from the very beginning of the report’s section on the Common Core (p. 53):

Middle and high school teachers indicate that the critical components of being college- and career-ready focus more on higher-order thinking and performance skills—such as problem-solving skills, critical-thinking skills and the ability to write clearly and persuasively—than on knowledge of challenging content.

Here we see the skills stranglehold in its purest form. Skills can’t be more important than knowledge for college and career because without knowledge, there are no “higher-order thinking and performance skills.” Skills depend on knowledge. If I don’t know any physics, I can’t think critically about physics. And, the more I know about physics, the more successful I will be in solving physics problems.

Lest you think I’m making too much of this one sentence about middle and high school teachers, let me take you back to the 2010 MetLife survey. On page 21, you’ll see this:

And on page 22, you’ll see this handy summary:

Teachers share remarkably similar views on the importance of these skills, abilities and knowledge areas regardless of grade level taught, years of experience, school characteristics or even subject area. English teachers are most likely to say the ability to write clearly and persuasively is absolutely essential or very important (99%), and 92% of math teachers also rate this ability as highly. While less than half (45%) of English teachers say that knowledge and ability in higher-level mathematics, such as trigonometry and calculus is absolutely essential or very important, math teachers themselves do not rate the necessity of higher-level mathematics much more highly (50%).

I’ll let the executives off the hook for not knowing that the problem-solving and critical-thinking skills they are after depend on the knowledge that they (largely) dismiss. The teachers ought to know better. That just 11% think knowledge of higher-level science and math is essential for college and career readiness is appalling.

But I can’t really blame them. Teachers have themselves been taught that skills are transferable, independent of particular knowledge or mere facts. The skills stranglehold has been tightening its grip for nearly 100 years. Recently, educators’ focus on skills (particularly so-called 21st century skills) and disparagement of knowledge got so bad that the National Research Council took up the issue, clarifying that skills and knowledge can’t be separated, and then exploring how deepening content knowledge could lead to better skills:

In contrast to a view of 21st century skills as general skills that can be applied to a range of different tasks in various academic, civic, workplace, or family contexts, the committee views 21st century skills as dimensions of expertise that are specific to—and intertwined with—knowledge within a particular domain of content and performance. (p. 3)

Over a century of research on transfer has yielded little evidence that teaching can develop general cognitive competencies that are transferable to any new discipline, problem, or context, in or out of school. Nevertheless, it has identified features of instruction that are likely to substantially support deeper learning and development of 21st century competencies within a topic area or discipline. For example, we now know that transfer [within a discipline] is supported when learners understand the general principles underlying their original learning and the transfer situation or problem involves the same general principles—a finding reflected in the new Common Core State Standards…. (p. 8)

The necessary merger of deep content knowledge and higher-order skills is indeed reflected in the Common Core standards. But sadly, we have a long way to go for it to be reflected in most of our classrooms.

Stumping STEM Growth

by Linda Bevilacqua
February 8th, 2013

Like many others, I’ve had high hopes for the Next Generation Science Standards. Right now I’m struggling to keep my spirits up. Having just finished reading the review of the second draft (NGSS 2.0) prepared for the Fordham Institute by nine impressive scientists and mathematicians (who, collectively, have teaching experience at all grade levels), I see more problems than can be fixed between now and March—the arbitrary deadline set for releasing the final draft of these standards.

For a quick take on the many serious problems, see the review’s Foreword by Chester E. Finn, Jr. and Kathleen Porter-Magee. Or, for an even faster look at the main issues, see Finn and Porter-Magee’s recent blog post. In both, they raise eight “critical problems.” While I agree that all eight are truly critical, I’d like to draw attention to three (the following are quotes from the blog post):

  • In an effort to draft “fewer and clearer” standards to guide curriculum and instruction, NGSS 2.0 (like NGSS 1.0) omits quite a lot of essential content. Among the most egregious omissions are most of chemistry; thermodynamics; electrical circuits; physiology; minerals and rocks; the layered Earth; the essentials of biological chemistry and biochemical genetics; and at least the descriptive elements of developmental biology.
  • As in version 1.0, some content that is never explicitly stated for the earlier grades seems to be taken for granted in the standards for later grades—where it won’t likely be found in students’ heads if the early-grade teachers aren’t prompted by the standards to teach it.
  • A number of key scientific terms (e.g., “model” and “design”) are ill defined and/or inconsistently used.

As E. D. Hirsch, Jr., and the Core Knowledge Foundation have been arguing for the past three decades, students have to build an enormous store of broad background knowledge and vocabulary in order to become literate adults—adults capable of reading about and voting on science-based issues like nuclear power, genetic research, land use, etc. The amount of knowledge to be acquired is so extensive that it must be efficiently and coherently packaged, grade-by-grade, if we are to have any hope of sending young adults into the world ready to make sense of, and dive deeper into, the many issues they will face.

As worrisome as Finn and Porter-Magee’s summative statements are, the review itself may give me nightmares. Take, for example, these quotes from pages 17–19:

Using the assertion that it is not a curriculum, the NGSS authors omit most of the chemistry content traditionally found in K–12 classrooms. Missing are topics like gas-law relationships, the chemistry of carbon and its compounds, the mole concept, empirical and molecular formulas, solution preparation, concentration, and dilution, and acid/base neutralization reactions and the pH scale, to mention just a few. When topics are included, they often are somewhat advanced, like bond energy or chemical equilibrium. However, their inclusion is problematic because of insufficient background preparation in lower grade standards, use of low-level vocabulary, or content limits specified in the Assessment Boundaries. And unfortunately, if a topic is not required by the NGSS, it is not likely to be taught.

Numerous concepts that will be developed more thoroughly in high school should first be introduced in middle school. “Ion,” for example, is used in HS PS1-c without explanation, but the testing of “polyatomic ions” was excluded. Then why is the polyatomic “ammonium” ion used in “ammonium chloride” as a recommended reactant in MS PS1-g?

Another example of weak preparation from page 1 of DCI PS.4.B:

Some materials allow light to pass through them, others allow only some light through and others block all the light and create a dark shadow on any surface beyond them (i.e., on the other side from the light source), where the light cannot reach. (1-PS4-d)

Here is a typical missed opportunity to use the appropriate vocabulary: transparent, translucent, opaque.

And here are a couple of examples from Appendix A of the review, which covers individual standards (see page 45):

PS3.C: Faster speeds during a collision can cause a bigger change in shape of the colliding objects. (secondary to 2-PS2-a)

“Faster speeds” … is a barbarism. When an object goes faster, we say that it has a higher speed…. In science standards, using scientifically appropriate language is critical.

Similarly, standard (3-PS2-a) indicates: “A system can appear to be unchanging when processes within the system are going on at opposite but equal rates.”

Why not use the proper technical terms, dynamical equilibrium or steady-state equilibrium?

The second draft of the NGSS was anything but slim. Why has so much content and vocabulary been left out? It appears to have been crowded out by a fixation on “practices.” Here’s how Finn and Porter-Magee summed up this critical problem in their blog post: “Real science invariably blends content knowledge with core ideas, ‘crosscutting’ concepts, and various practices, activities, or applications. Well and good. But NGSS 2.0 imposes so rigid a format on its standards that the recommended ‘practices’ dominate them. The authors have forced practices on every expectation, even when they confuse more than clarify.” Here is an example from the review (see page 20):

In the life sciences, … and as elsewhere in NGSS, the central problem resides in the language employed, and it follows from the standards’ preoccupation with “Practices”…. Every standard to focus upon performance expectations that are behaviors (or activities) as opposed to demonstrations of knowledge. Behaviors and activities are legitimate performance expectations; but when all the expectations take that form, a system of standards, which is in principle about knowledge as well as skills, becomes ostentatiously one-sided. The resulting standards statements may not relate in a compelling way to the knowledge that is supposed to be the directing content dimension.

Knowledge, vocabulary, and skills are all necessary, but this draft of the NGSS emphasizes skills to the detriment of knowledge and vocabulary. Ultimately, this constant pushing on “practices” seems to be an effort to force teachers to take an extremely hands-on, project-focused approach to science instruction. While no one would believe that a science classroom without labs, experiments, observations, etc. is offering a strong science education, no one should believe that a science classroom in which activities crowd out content is strong either.

Heeding two of the review’s recommendations (see page 33) would allow for knowledge, vocabulary, and skills to all be pursued together, without any one detracting from the others:

Ban the use of the term “model,” except in familiar scientific contexts such as molecular models or Copernican model or computer modeling (better identified as simulation).

Reduce the insistent “Practices” language in the standards. Science practices certainly need to be taught and learned, but there is no justification for converting all expected science performances to “practices,” and making their substrate, scientific knowledge (including substantive, mathematical, analytical, and vocabulary knowledge) secondary.

One of the great strengths of the Common Core State Standards is that they are goal statements as to what students need to know and be able to do, not dictates as to how teachers should teach. The NGSS should follow that lead by focusing on the science content and vocabulary, and integrating related skills as needed. In effect, this would require stripping away the “practices” language that has more to do with current fads in pedagogy than with developing students’ ability to comprehend science and/or become scientists or engineers.

In their Foreword, Finn and Porter-Magee concluded that “if draft 2.0 were to become the final version of NGSS, only states with exceptionally weak science standards of their own would likely benefit from replacing them with these ‘next-generation’ standards.” I hope that the organizations developing the NGSS will drop their March deadline and heed the many cautions raised so that, like the Common Core State Standards, the NGSS can be strongly recommended to all states.

Blame the Tests

by E. D. Hirsch, Jr.
January 15th, 2013

In Praise of Samuel Messick 1931–1998, Part III

The chief practical impact of NCLB has been its principle of accountability. Adequate yearly progress, the law stated, must be determined by test scores in reading and math—not just for the school as a whole, but for key groups of students.

Now, a decade later, the result of the law, as many have complained, has been a narrowing of the school curriculum. In far too many schools,  the arts and humanities, and even science and civics, have been neglected—sacrificed on the altar of tests  without any substantial progress nationwide on the tests themselves. It is hard to decide whether to call NCLB a disaster or a catastrophe.

But I disagree with those who blame this failure on the accountability principle of NCLB. The law did not specify what tests in reading and math the schools were to use. If the states had responded with valid tests—defined by Messick as tests that are both accurate and have a productive effect on practice—the past decade would have seen much more progress.

Since NCLB, NAEP’s long-term trend assessment shows substantial increases in reading among the lowest-performing 9-year-olds—but nothing comparable in later grades. It also shows moderate increases in math among 9- and 13-year-olds.

So, it seems that a chief educational defect of the NCLB era lay in the later-grades reading tests; they simply do not have the same educational validity as the tests in early-grades reading and in early- and middle-grades math.

 ****

It’s not very hard to make a verbal test that predicts how well a person will be able to read. One accurate method used by the military is the two-part verbal section of the multiple-choice Armed Forces Qualification Test (AFQT), which is known for its success in accurately predicting real-world competence. One section of the AFQT Verbal consists of 15 items based on short paragraphs on different subjects and in different styles to be completed in 13 minutes.  The other section of the AFQT Verbal is a vocabulary test with 35 items to be completed in 11 minutes. This 24-minute test predicts as well as any verbal test the range of your verbal abilities, your probable job competence and your future income level. It is a short, cheap and technically valid test. Some version of it could even serve as a school-leaving test.

Educators would certainly protest if that were done—if only because such a test would give very little guidance for classroom practice or curriculum. And this is the nub of the defects in the reading tests used during the era of NCLB: They did not adequately support curriculum and classroom practice. The tests in early-grades reading and in early- and middle-grades math did a better job of inducing productive classroom practice, and their results show it.

Early-grades reading tests, as Joseph Torgesen and his colleagues showed, probe chiefly phonics and fluency, not comprehension. Schools are now aware that students will be tested on phonics and fluency in early grades. In fact, these crucial early reading skills are among the few topics for which recent (pre-Common Core) state standards had begun to be highly specific. These more successful early reading tests were thus different from later ones in a critical respect:  They actually tested what students were supposed to be taught.

Hence in early reading, to its credit, NCLB induced a much greater correlation than before between standards, curriculum, teaching and tests. The tests became more valid in practice because they induced teachers to teach to a test based on a highly specific subject matter—phonics and fluency. Educators and policymakers recognized that teaching swift decoding was essential in the early grades, tests assessed swift decoding, and—mirabile dictu—there was an uptick in scores on those tests.

Since the improvements were impressive, let’s take a look at what has happened over the past decade among the lowest-performing 9-year-olds on NAEP’s long-term trend assessment in reading.

Note that there is little to no growth among higher-performing 9-year-olds, presumably because they had already mastered phonics and fluency.

Similarly, early- and middle-grades math tests probed substantive grade-by-grade math knowledge, as the state standards had become ever more specific in math. You can see where I’m going: Early reading and math improved because teachers typically teach to the tests (especially under NCLB-type accountability pressures), and the subject matter of these tests began to be more and more defined and predictable, causing a collaboration and reinforcement between tests and classroom practice.

In later-grades reading tests, where we have failed to improve, the tests have not been based on any clear, specific subject matter, so it has been impossible to teach to the tests in a productive way. (The lack of alignment between math course taking and the NAEP math assessment for 17-year-olds is similarly problematic.) Of course, there are many reasons why achievement might not rise. But specific subject matter, both taught and tested, is a necessary—if not sufficient—condition for test scores to rise.

In the absence of any specific subject matter for language arts, teachers, textbook makers, and test makers have conceived of reading comprehension as a strategy rather than as a side effect of broad knowledge. This inadequate strategy approach to language arts is reflected in the tests themselves. I have read many of them.  An inevitable question is something like this: “The main idea of this passage is….” And the theory behind such a question is that what is being tested is the ability of the student to strategize the meaning by “questioning the author” and performing other puzzle-solving techniques to get the right answer. But, as readers of this blog know, that is not what is being tested. The subject matter of the passage is.

This mistaken strategy-focused structure has made these tests not only valueless educationally, but worse—positively harmful. Such tests send out the misleading message that reading comprehension is chiefly strategizing. That idea has dominated language arts instruction in the past decade, which means that a great deal of time has been misspent on fruitless test-taking activities. Tragically, that time could have been spent on science, humanities and the arts—subjects that would have actually increased reading abilities (and been far more interesting).

The only way that later-grades reading tests can be made educationally valid is by adopting the more successful structure followed in early reading and math. An educationally valid test must be based on the specific substance that is taught at the grade level being tested (possibly with some sampling of specifics from previous and later grades for remediation and acceleration purposes). Testing what has been taught is the only way to foster collaboration and reinforcement between tests and classroom practice. An educationally valid reading test requires a specific curriculum—a subject of further conversations, no doubt.

The Work of a Great Test Scientist Helps Explain the Failure of No Child Left Behind

by E. D. Hirsch, Jr.
January 10th, 2013

In Praise of Samuel Messick 1931–1998, Part II

In a prior post I described Messick’s unified theory of test validity, which judged a test not to be valid if its practical effects were null or deleterious. His epoch-making insight was that the validity of a test must be judged both internally for accuracy and externally for ethical and social effects. That combined judgment, he argued, is the only proper and adequate way of grading a test.

In the era of the No Child Left Behind law (2001), the looming specter of tests has been the chief determiner of classroom practice. This led me to the following chain of inferences: Since 2001, tests have been the chief determiners of educational practices. But these tests have failed to induce practices that have worked. Hence, according to the Messick principle, the tests that we have been using must not be valid. Might it be that a new, more Messick-infused approach to testing would yield far better results?

First, some details about the failure of NCLB. Despite its name and admirable impulses it has continued to leave many children behind:

 

NCLB has also failed to raise verbal scores. The average verbal level of school leavers stood at 288 when the law went into effect, dropped to 283 in 2004, and stood at 286 in 2008.

Yet this graph shows an interesting exception to this pattern of failure, and it will prove to be highly informative under Messick’s principle. Among 4th graders (age 9) the test-regimen of NCLB did have a positive impact.

Moreover, NCLB also had positive effects in math:

This contrast between the NCLB effects in math and reading is even more striking if we look at the SAT, where the test takers are trying their best:

So let’s recap the argument. Under NCLB, testing in both math and reading has guided school practices. Those practices were more successful in math and in early reading than in later reading. According to the Messick principle, therefore, reading tests after grade 4 had deleterious effects and cannot have been valid tests. How can we make these reading tests more valid?

A good answer to that question will help determine the future progress of American education. Tune in.

If He’s So Important, Why Haven’t We Heard of Him?

by E. D. Hirsch, Jr.
January 9th, 2013

In Praise of Samuel Messick 1931–1998

Everyone who is anyone in the field of testing actually has heard of Samuel Messick. The American Psychological Association has instituted a prestigious annual scientific award in his name, honoring his important work in the theory of test validity. I want to devote this, my first-ever blog post, to one of his seminal insights about testing. It’s arguable that his insight is critical for the future effectiveness of American education.

My logic goes this way:   Every knowledgeable teacher and policy maker knows that tests, not standards, have the greater influence on what principals and teachers do in the classroom.   My colleagues in Massachusetts—the state that has the most effective tests and standards—assure me that it’s the demanding, content-rich MCAS tests that determine what is taught in the schools.  How could it be otherwise?  The tests determine whether a student graduates or whether a school gets a high ranking.  The standards do vaguely guide the contents of the tests, but the tests are the de facto standards.

It has been and will continue to be a lively blog topic to argue the pros and cons of the new Common Core State Standards in English Language Arts.    But so far these arguments are more theological than empirical, since any number of future curricula—some good, some less so—can fulfill the requirements of the standards.   I’m sure the debates over these not-yet-existent curricula will continue; so it won’t be spoiling anyone’s fun, if I observe that these heated debates bear a resemblance to what was called in the Middle Ages the Odium Theologicum over unseen and unknown entities.   Ultimately these arguments will need to get tied down to tests.   Tests will decide the actual educational effects of the Common Core Standards.

But Samuel Messick has enunciated some key principles that will need to be heeded by everyone involved in these tests if our schools are to improve in quality and equity, not only in the forty-plus states that have agreed to use the Common Core Standards but also in those states that have not. In all fifty states, tests will continue to determine classroom practice and hence the future effectiveness of American education.

In this post, I’ll sketch out one of Messick’s insights about test validity.   In a second post, I’ll show how ignoring those insights has had deleterious effects in the era of NCLB.  And in a third, and last on this topic, I’ll suggest policy principles to avoid ignoring the scientific acumen and practical wisdom of Samuel Messick in the era of the Common Core Standards.

 ******

Messick’s most distinctive observation shook up the testing world, and still does. He said that it was not a sufficient validation of a test to show that it exhibits “construct validity.” This term of art means that the test really does accurately estimate what it claims to estimate. No, said Messick, that is a purely technical criterion. Accurate estimates are not the only or chief function of tests in a society. In fact, accurate estimates can have unintended negative effects. In the world of work they can unfairly exclude people from jobs that they are well suited to perform. In the schools “valid” tests may actually cause a decline in the achievement being tested, a paradoxical outcome that I will stress in the three posts devoted to Messick.

Messick called this real-world attribute of tests “consequential validity.”    He proposed that test validity be conceived as a unitary quality comprising both construct validity and consequential validity—both the technical and the ethical-social dimension.   What shall it profit a test if it reaches an accurate conclusion yet injures the social goal it was trying to serve?

Many years ago I experienced the force of Messick’s observation before I knew that he was the source of it. It was in the early 1980s, and I had published a book on the valid testing of student writing (The Philosophy of Composition). At the time, Messick was the chief scientist at the Educational Testing Service, and under him a definitive study had been conducted to determine the most valid way to measure a person’s writing ability. Actual scoring of writing samples was notoriously inconsistent, and hence unfair. Even when graded by specially socialized groups of readers (the current system) there was a good deal of variance in the scoring.

ETS devised a test that probed writing ability less directly and far more reliably.   It consisted of a few multiple-choice items concerned with general vocabulary and editorial acumen.     This test proved to be not only far shorter and cheaper, it was also more reliable and valid.    That is, it better predicted elaborately determined expert judgment of writing ability than did the writing samples.

There was just one trouble with this newly devised test. As it was used over time, student writing ability began to decline. The most plausible explanation was that although the test had construct validity it lacked consequential validity. It accurately predicted writing skill, but it encouraged classroom activity which diminished writing skill, a perfect illustration of Messick’s insight.

Under his intellectual influence there is now, again, an actual writing sample to be found on the verbal SAT.   The purely indirect test which dispensed with that writing sample had had the unfortunate consequence of reducing the amount of student writing assigned in the schools, and hence reducing the writing abilities of students.  A shame: the earlier test was not just more accurately predictive as an estimate, it was fairer, shorter, and cheaper.  But ETS has made the right decision to value consequential validity above accuracy and elegance.

Next time: Consequential Validity and the Era of No Child Left Behind

It’s Not You, It’s Me. I Think You’re an Idiot.

by Robert Pondiscio
October 2nd, 2012

Anyone who has grown overtired of dead-end education debates needs to read Dan Willingham’s latest blog post, which points out that when debate devolves into mere taunting and questioning the motives of your intellectual opponents, the audience you’re most trying to reach (the unpersuaded and undecided) tunes out. “In education policy, some of us have gone too far,” Dan writes. “People who disagree with us are depicted as not merely wrong, but evil.”

“People who advocate reforms such as merit pay, the use of value added models of teacher evaluation, charter schools, and vouchers are not merely labeled misguided because these reforms won’t work. They are depicted as bad people who are unsympathetic to the difficulty of teaching and who are in the pockets of the rich.

“Likewise, those who see value in teacher’s unions, who are leery of current methods of teacher evaluation, who think that vouchers threaten the neighborhood character of schools are not merely wrong: they are accused of looking out for the welfare of lousy teachers.

“And of course both sides are accused of ‘not caring about kids.’”

Willingham, as is his wont, cites studies bolstering what you might intuit by watching (or, heavens forfend, participating in) these dispiriting wars of words: partisans tend to believe they know what people on the other side of an issue are thinking and how they would behave. The bottom line: “We think that people who agree with us are moral, and people who disagree with us, less so. Further, we think that we know how other people will interpret complicated situations: they will be driven more by ideology than by facts,” Dan writes.

He concludes with a call for “fewer blog postings that, implicitly or explicitly,  denigrate the other person’s motives, or that offer a knowing nod with the claim ‘we all know what those people think.’” That call is all but certain to be more honored in the breach than the observance.

We can’t help ourselves. And besides, the other guys really ARE evil. Me? I just want what’s best for kids.

Reading is Believing (And That’s a Problem)

by Robert Pondiscio
August 30th, 2012

When planning class read-alouds as a teacher, I was an unabashed fan of historical fiction. Christopher Paul Curtis’ Depression-era novel, Bud, Not Buddy; Lois Lowry’s Number the Stars, set in Nazi-occupied Denmark; and the 19th century frontier novel Sarah, Plain and Tall were among the books that allowed me to weave history and geography, sorely needed by my inner-city 5th graders, into the literacy block.

With Common Core State Standards calling for more non-fiction in literacy instruction, mixing more academic content into ELA instruction is becoming standard practice. But not everyone is eager to see fiction and literature loosen their grip on language arts. Dan Willingham’s science and education blog asks, can’t kids learn about the world through fiction?

They can and do.

“The advantage of fiction is that the narrative can engage students, transport them into the story. The fear is that readers will assume that information in fiction is true, whereas fiction may well contain inaccuracies. We don’t expect fiction to be vetted for accuracy the way a non-fiction source would be. (Certainly Hollywood movies are notorious for playing fast-and-loose with the truth.)”

Research shows inaccuracies in fiction can indeed later be remembered by students as true.  Willingham describes an experiment designed to test whether exposure to accurate or inaccurate information in a fictional story influenced how students responded to a later test about that information.  Exposure to correct information “makes it more likely you’ll get the answer correct on the test,” Willingham writes. “Reading the misleading information makes it less likely you’ll get it correct and more likely you’ll get it wrong.”

Sounds obvious, but there’s more.  “Prior knowledge is not protective. In other words, the misleading information has an impact even for stuff that most of the students knew before the experiment started,” (emphasis added) Willingham observes.   Encountering inaccuracies in fiction, in other words, can override what students knew before they read it.  But all is not lost: alerting students to the specific inaccuracies or misinformation in a story, Dan notes, “is very effective in preventing subjects from absorbing the inaccuracy.”

The takeaway for teachers? Use fiction to engage and bring history, science and other subjects to life. But you’ve got to know your stuff so you can flag instances of literary license to your kids.

Report: U.S. Needs More “Exam Schools”

by Robert Pondiscio
July 31st, 2012

Selective-admissions high schools, such as New York City’s Stuyvesant High School, Boston Latin, and Thomas Jefferson in Fairfax County, Virginia, are “hothouses for incubating a disproportionate share of tomorrow’s leaders in science, technology, entrepreneurship, and other sectors that bear on society’s long-term prosperity and well-being,” say Checker Finn and Jessica Hockett in a report in Education Next. “We’d be better off as a country if we had more of them.”

Such schools, the pair say, are a “unique and little-understood sector of the education landscape.” As a group, the schools are “more racially diverse than is widely believed.” Most of such schools’ teachers belong to unions and are paid accordingly. Not surprisingly, nearly all of the 165 selective schools identified and surveyed by Finn and Hockett offer Advanced Placement (AP) courses or the International Baccalaureate (IB) program. “With rare exceptions (mainly in Louisiana), however, the schools are not charters,” they report. “Although they’re ‘schools of choice,’ they are operated in more top-down fashion by districts, states, or sometimes universities rather than as freestanding and self-propelled institutions under their states’ charter laws.”

Yes, but are they any good?  By admitting high achieving students, exam schools are front loading high performance. Finn and Hockett are clear-eyed:  “Much like private schools, which are more apt to trade on their reputations and college-placement records than on hard evidence of what students learn in their classrooms, the schools on our list generally don’t know—in any rigorous, formal sense—how much their students learn or how much difference the school itself makes,” they write. “As one puzzled principal put it, ‘Do the kids do well because of us or in spite of us? We’re not sure.’”

The research base on selective schools’ performance is surprisingly thin (the report cites two studies).  Finn and Hockett note “the burden is shifting to the schools and their supporters to measure and make public whatever academic benefit they do bestow on their students versus what similar young people learn in other settings.”  But the “marketplace signals” are clear. “Far more youngsters want to attend these schools than they can accommodate,” the pair report.  Moreover, selective schools provide an essential and largely overlooked function:

“It’s evident from multiple studies that our K–12 education system overall is doing a mediocre job of serving its ‘gifted and talented’ youngsters and is paying too little attention to creating appealing and viable opportunities for advanced learning. What policymakers have seen as more urgent needs (for basic literacy, adequate teachers, sufficient skills to earn a living, for example) have generally prevailed. The argument for across-the-board talent development has been trumped by ‘closing the achievement gap’ and focusing on test scores at the low end.”

“A major push to strengthen the cultivation of future leaders is overdue, and any such push should include careful attention to the ‘whole school’ model,” conclude Finn and Hockett.

Constructivizing STEM

by Robert Pondiscio
February 22nd, 2012

The following guest post is by Katharine Beals, who blogs about education at Out in Left Field, where this post also appears.  — rp.

It’s hard not to detect a certain worry among those who write STEM articles for Education Week that the drive to educate students for careers in Science, Technology, Engineering, and Mathematics might include a drive to increase core scientific and mathematical content at the expense of things that Constructivists hold dear. Things, for example, like “model building,” “data analysis,” and “communicating findings.”

These are what Jean Moon and Susan Rundell Singer, in their backpage Edweek Commentary on Bringing STEM into Focus, want to be sure schools are focusing on:

Re-visioning school science around science and engineering practices, such as model-building, data analysis, and evidence-based reasoning, is a transformative step, a step found in the NRC report, which is critical to STEM learners and teachers, both K-12 and postsecondary. It puts forward the message that knowledge-building practices found under the STEM umbrella are practices frequently held in common by STEM professionals across the disciplines as they investigate, model, communicate, and explain the natural and designed world.

Not that this is all that Moon and Singer care about. They also care about big ideas, which they divide into two categories: “crosscutting concepts (major ideas that cut across disciplines)” and “disciplinary core ideas (ideas with major explanatory power across science and engineering disciplines).” The former include “scale, proportion, and quantity” or “the use of patterns”; the authors don’t cite any examples of the latter.

Besides “practices” and “ideas,” the authors mention “strategies” and “tools” (again, without specific examples). What they don’t mention is underlying content, except to say:

Lest some believe this is setting up another false dichotomy in science or mathematics education between content and process, let us quickly add a strong evidentiary note: Epistemic practices and the learning and knowledge produced through such practices as building models, arguing from evidence, and communicating findings increase the likelihood that students will learn the ideas of science or engineering and mathematics at a deeper, more enduring level than otherwise would be the case. Research evidence consistently supports this assertion.

I’m curious what “research evidence” means, but I gather that it doesn’t include the research evidence that cognitive scientist Dan Willingham cites in support of the idea that students aren’t little scientists and need a foundation of years of core knowledge before being ready to function as actual scientists.

In promoting their ideas as “transformative,” the authors are overlooking the fact that the kinds of constructivist practices they desire are already standard in many schools (particularly those held up as models for others). If they want to promote something truly transformative for STEM, they should instead be advocating a reinstatement of the years of solid, content-based instruction in math and science that many of our K-12 schools used to offer (and that one still finds in schools in most developed countries around the world).

Katharine Beals, PhD is the author of Raising a Left-Brain Child in a Right-Brain World: Strategies for Helping Bright, Quirky, Socially Awkward Children to Thrive at Home and at School. She teaches at the University of Pennsylvania Graduate School of Education and at the Drexel University School of Education, specializing in the education of children on the autistic spectrum. She blogs about education at Kitchen Table Math and on her own blog, Out in Left Field.