Valid, Reliable, and Unfair

by Lisa Hansel
August 4th, 2015

As schools across the country anxiously await the results of their new Common Core–aligned assessments, there’s one thing I wish all policy makers understood: The reading comprehension tests are valid, reliable, and unfair.

Standards-based assessments mean very different things in reading and math. The math standards include mathematics content—they clearly specify what math knowledge and skills students are supposed to master in each grade. That is not true in reading. The English language arts and literacy standards only specify the skills students are to master. They implore schools to build broad knowledge, but other than a few foundational texts in high school, they don’t indicate what knowledge students need to learn.

In brief, reading comprehension tests primarily assess decoding, fluency, vocabulary, and knowledge. A child who answers a question wrong might be struggling with decoding, or might be a fluent reader who lacks knowledge of the topic in the passage.

Reading comprehension is widely misunderstood as a skill that depends on applying strategies like finding the main idea (assuming fluent decoding). Cognitive science (and common sense) has established that comprehension actually depends on knowledge and vocabulary. If you know about dinosaurs, you can read about them. If you don’t know about a topic and haven’t learned vocabulary related to that topic, you will have to learn about it before comprehending a text on it (e.g., “Chirality plays a fundamental part in the activity of biological molecules and broad classes of chemical reactions, but detecting and quantifying it remains challenging. The spectroscopic methods of choice are usually circular dichroism…”).

If the standards specified what topics children should read about in each grade—and thus what topics may appear in the passages on the reading comprehension tests—then aligned assessments would be better measures of both how the students are progressing and the quality of the instruction they received. Because the standards offer no indication of which topics ought to be studied and thus no indication of which topics might be tested, the assessments are very blunt measures of students’ progress and teachers’ abilities. They are valid and reliable—they do indicate students’ general reading comprehension ability—but they conflate what’s been learned inside and outside school. They’re unfair.

That’s why reading comprehension scores are so strongly correlated with socioeconomic status and so difficult to improve. Comprehension depends on knowledge and vocabulary, but the topics on the test are unpredictable. So, the only way to be well prepared is to have very broad knowledge and a massive vocabulary. From birth, some children are in vocabulary- and knowledge-rich homes, while others are not. Making matters worse, only some children have access to high-quality early childhood education programs and K–12 schools.

Life is unfair, but these tests need not be. States could specify what topics are to be taught across subject areas in each grade and they could mandate that the passages on the reading comprehension assessments draw from those specified topics. In short, states could work toward knowledge equality.


Teach broad knowledge and test what’s been taught. Is that really too much to ask? (Image courtesy of Shutterstock.)

No Progress on Accountability, No Hope for Equity

by Lisa Hansel
April 7th, 2015

I try not to give in to despair, but in reading recent recommendations for the reauthorization of ESEA, I see America wasting another 50 years on unproductive reforms.

James S. Coleman said schools matter a great deal for poor kids, but we focus on the factors outside of school mattering more. A Nation At Risk warned of rigor’s disappearance, but we continue to pursue content-light strategies instead of content-heavy subjects. High-performing nations demonstrate that a national core curriculum (that specifies knowledge, not mere skills) enables improvement in everything from teacher preparation to student learning and assessment, but we refuse to do the hard work of selecting a core of knowledge for all our students. Our last decade under No Child Left Behind has shown that reading tests without a definite curriculum are counterproductive, but here we go again.

It was with high hopes that I began reading “Accountability and the Federal Role: A Third Way on ESEA.” A consensus document by Linda Darling-Hammond of the Stanford Center for Opportunity Policy in Education and Paul T. Hill of the Center on Reinventing Public Education, this third way makes important points about the need for assessment and accountability to stay focused on closing the achievement gap—and the need for flexibility in demonstrating student and school progress.

In particular, there are two points of agreement that I find very heartening:

Parents and the public need to know whether children are learning what they need to graduate high school, enter and complete four-year college, or get a rewarding, career-ladder job….

Because a student’s level and pace of learning in any one year depend in part on what was learned previously and on the efforts of many professionals working together, the consequences of high and low performance should attach to whole schools, rather than to individual educators.

Here we have two essential points: there are specific things that children need to know and these specific things build year to year. I actually became hopeful that this consensus document would take the next logical step and call for a content-specific, grade-by-grade, well-rounded curriculum. That’s the only thing that would make it clear if “children are learning what they need” and that would enable professionals to work together to build knowledge across grades.

But my hopes were short lived. The consensus document retreated to politically safe, educationally useless ground: “Because what children need to know evolves with knowledge, technology, and economic demands, an accountability system must encourage high performance and continuous improvement.” Later they actually call for “rich subject matter assessments,” but then undermine the idea by ignoring curriculum and, once again, retreating: “Because science, technology, and the economy are constantly shifting, the measures and standards used to assess schools must be continuously updated to reflect new content and valued skills.”

I hear all the time that information is growing at a shocking rate, and that today’s knowledge will be out of date before students graduate. Obviously, students don’t need knowledge, they need to learn how to find knowledge.

Please, people! “Information” is only growing with lightning speed if you count the cat videos being loaded onto YouTube. There is amazing research being done—but very little of it affects elementary and secondary education, or college, career, and citizenship. In a terrific new book, Urban Myths about Learning and Education, Pedro De Bruyckere, Paul A. Kirschner, and Casper D. Hulshof tackle this silliness:

To name just a few things that we learned when we were children: the Pythagorean theorem still holds true…, as does the gravitational constant and the acceleration of a falling body on Earth…, there are still seven continents…, the Norman conquest of England took place in 1066, and a limerick has five lines and a sonnet fourteen. The fact is that much or most of what has passed for knowledge in previous generations is still valid and useful.

According to Urban Myths, a former Google executive said, “Between the birth of the world and 2003, there were five exabytes … of information created. We [now] create five exabytes every two days.” (Informational image courtesy of Shutterstock.)

What Darling-Hammond and Hill should have written is this: Because cognitive science shows that broad knowledge is essential to meet technology, economic, and citizenship demands, an accountability system must encourage a content-specific, well-rounded curriculum that inspires high performance and continuous improvement by testing what has been taught and thus providing data that teachers can actually use to inform instruction.

Darling-Hammond and Hill are thought leaders in the education arena. They know that skills depend on knowledge, and they know that there is a body of knowledge—from the Constitution to the Pythagorean theorem—that could form a core curriculum for the United States. In their third way, they are being politically realistic. And I am falling into despair.

Our kids don’t need more political pragmatism. They need excellence and equity. They need leaders to ensure that all children get an equal opportunity to learn “what they need to graduate high school, enter and complete four-year college, or get a rewarding, career-ladder job.”

For yet more evidence that political pragmatism isn’t working, check out the latest NAEP report, which shows almost no meaningful growth in vocabulary. Vocabulary is a proxy for knowledge and critical to comprehension. As E. D. Hirsch has explained, vocabulary is the key to upward mobility. Cognitive science and common sense have given us a clear path forward: build knowledge and skills together with a content-specific, grade-by-grade, well-rounded curriculum. Let’s not waste another 50 years. It will be incredibly hard for Americans to agree on a core curriculum. But nothing else will work.

Raising Readers—Not Test Takers

by Lisa Hansel
March 18th, 2015

In recent months, Teach Plus had over 1,000 teachers review sample items from PARCC, one of the two testing consortia trying to create assessments aligned to the Common Core standards.

I say “trying” because in reading, the task is pretty much impossible. The standards specify things students should be able to do, but they contain almost no content. Thankfully, they do call for content-rich curriculum and explain that comprehension depends on broad knowledge, but they don’t provide the content-specificity needed to guide instruction or assessment.

Thousands of different curricula and assessments could be aligned to the standards, which would be fine if teachers were trusted to develop both. But teachers are not allowed to create the assessments—at least the ones that count. So it is entirely possible for a teacher to develop an “aligned” curriculum that does not prepare students for the content that shows up on the “aligned” assessment.

The result is an unfair assessment.

Test developers acknowledge as much, creating guidelines for item development that minimize knowledge as a source of “bias.”

Well, the 1,000 teachers who just reviewed PARCC think the stripping of knowledge did not go far enough:

Nearly all participants found that the PARCC passages were better quality than the passages in state tests, as they are previously published pieces (indicating that they are complex and demonstrate expertise in nonfiction). However, there was some concern students did not have “background knowledge, nor the vocabulary to understand” vocabulary within the texts. Their comments suggest that to assess students as accurately as possible, some portions may need to be edited for diverse learners, or those with limited background knowledge of certain content areas.

I understand why teachers would call for reducing the prior knowledge demands of the test—they are stuck in this crazy world of being measured with content that no one told them to teach. But let’s be honest: reducing the knowledge demand makes the test a little fairer; it does not make the education students are getting any better.

The knowledge bias can’t be avoided with tests that are not explicitly aligned to the curriculum. Without a curriculum that specifies what has been taught—and therefore what it is fair to expect students to know—test writers are reduced to a narrow band of banal topics (but even “Jenny goes to the market” demands some prior, unequally distributed knowledge).

The less the knowledge bias, the less the test reflects real-world comprehension. Outside testlandia, comprehension is not isolated from knowledge. An adult who can’t comprehend a newspaper is not considered literate. Broad knowledge is inherent in literacy. If we care about reading, as opposed to testing, we shouldn’t be creating tests that minimize knowledge demands. We should be developing a coherent instruction, assessment, and accountability system that builds broad knowledge and is fair because it tests what is taught.

Clearly, our nation’s policymakers need a crash course in reading. Once they understand that there is no such thing as general comprehension ability, maybe they’ll stop trying to hold schools accountable for developing it.

Fortunately, a great crash course is now available: Daniel Willingham’s latest book, Raising Kids Who Read: What Parents and Teachers Can Do. If policymakers read between the lines, they’ll see an awful lot they can do too.

As with Willingham’s previous books, this one is engaging, easy to read, and super informative. Here’s just a taste:

Most parents want their children to be solid general readers. They aren’t worried about their kids reading professional journals for butterfly collectors, but they expect their kids to be able to read the New York Times, National Geographic, or other materials written for the thoughtful layperson. A writer for the New York Times will not assume deep knowledge about postage stamps, or African geography, or Elizabethan playwrights—but she will assume some knowledge about each. To be a good general reader, your child needs knowledge of the world that’s a million miles wide and an inch deep—wide enough to recognize the titles The Jew of Malta and The Merchant of Venice, for example, but not that the former may have inspired the latter. Enough to know that rare stamps can be very valuable, but not the going price of the rare Inverted Jenny stamp of 1918.

If being a “good reader” actually means “knowing a little bit about a lot of stuff,” then reading tests don’t work quite the way most people think they do. Reading tests purport to measure a student’s ability to read, and “ability to read” sounds like a general skill. Once I know your ability to read, I ought to be able (roughly) to predict your comprehension of any text I hand you. But I’ve just said that reading comprehension depends heavily on how much you happen to know about the topic of the text, because that determines your ability to make up for the information the writer felt free to omit. Perhaps, then, reading comprehension tests are really knowledge tests in disguise.

There is reason to think that’s true. In one study, researchers measured the reading ability of eleventh graders with a standard reading test and also administered tests of what they called “cultural literacy”—students’ knowledge of mainstream culture. There were tests of the names of artists, entertainers, military leaders, musicians, philosophers, and scientists, as well as separate tests of factual knowledge of science, history, and literature. The researchers found robust correlations between scores on the reading test and scores on the various cultural literacy tests—correlations between 0.55 and 0.90.

If we are to increase reading ability, policymakers will have to accept that it takes many years to develop the breadth of knowledge needed for tests that are not based on a specific curriculum. We shouldn’t be stripping the knowledge demands out of our tests; we should be stripping the unreasonable mandates from our accountability policies. If we all focused on raising readers, we would spend far less time on testing and far more on building broad knowledge.


Young reader, building knowledge and comprehension, courtesy of Shutterstock.

Killing Three Birds with One Stone

by Lisa Hansel
November 4th, 2014

The Fordham Institute’s Aaron Churchill has an interesting new post weighing the merits of state-mandated testing in science and social studies. He notes the cons—like the minimal added information on school quality given the high correlations between scores on science and reading tests—and the pros—like reversing the narrowing of the curriculum driven by the high-stakes emphasis on reading and math. Then he sets forth four options (and ultimately recommends his third option):

1.) Keep the status quo. This would ensure that social studies and science are tested, but in non-consecutive years (e.g., science in grades 5 and 8). Yet the status quo still does not compel schools to treat these subjects as equal partners with ELA and math.

2.) Eliminate testing in social studies and science. This approach would reduce the cost of testing in these areas, which gives us little new information about student achievement for school-quality purposes. However, this option would likely encourage even more focus on ELA and math and would require a waiver from federal statute which presently requires science testing at least once in elementary, middle, and high school.

3.) Increase testing in social studies and science to the same frequency as math and ELA (i.e., test these subjects annually in grades 3-8). This would balance schools’ incentives to treat each subject equally, but at the cost of more time and money. From an information perspective, although little additional information is yielded in terms of student proficiency, annual testing could help analysts construct growth (i.e., “value-added”) measures for these subjects.

4.) Decrease testing in math and ELA to non-consecutive grades to match the frequency of social studies and science (e.g., test math and ELA in grades 4 and 6, not consecutively in grades 3-8). This would also balance schools’ incentives to treat subjects equally, but at the cost of less information and accountability. It would also require federal action to grant Ohio relief from consecutive-year-testing mandates in math and ELA in grades 3–8, or more likely, a rewritten federal law that governs state accountability (No Child Left Behind).

I’d like to offer a fifth option that assesses science and social studies yet has fewer tests: Draw the topics for the reading comprehension tests from the science and social studies standards. This blog recently explored the many drawbacks of current reading comprehension tests. In short, they contain a random smattering of “common” topics and topics that ought to be taught in school, but since they are not tied to any specific content that we can be certain has been taught, they inevitably privilege students who have acquired broad knowledge (usually at home).

The only way to construct truly fair reading comprehension tests is to ensure that the passages are on topics that have been taught in school. Since states’ English language arts standards usually do not specify which books, poems, short stories, etc. to teach in each grade, ELA standards are a poor guide for test developers concerned with equity. But states’ science and social studies standards usually do specify some core content to be taught in each grade. The obvious path forward is to construct reading comprehension tests that assess language arts skills using the science and social studies content specified in the standards. After all, skills depend on relevant prior knowledge, so such tests would give a more accurate picture of schools’ impact on students’ language abilities than our current random-content tests. And for the cost and time of just one test, we would have a decent gauge of three subjects.

Even better would be to draw the topics for passages on reading comprehension tests from science, social studies, art, music, geography, and civics standards. Such tests would (1) induce schools to develop a broad, content-rich curriculum and support teacher collaboration, (2) reduce the impact of the home on students’ scores, (3) build the knowledge and vocabulary that are essential to literacy, and (4) be the foundation for an accountability system that requires fewer tests yet still ensures that standards are being met.


Reading tests with science and social studies content that had been taught would be more equitable and more interesting. Image courtesy of Shutterstock.

Smarter Balanced Confuses Fairness and Validity

by Lisa Hansel
October 15th, 2014

Over the past two weeks, we’ve looked at the ETS guidelines for fair assessments that PARCC adopted, as well as a sample item from PARCC. Now let’s turn to the “Bias and Sensitivity Guidelines” ETS developed for Smarter Balanced. While I can’t say that ETS’s guidelines for Smarter Balanced contradict those adopted by PARCC, they are different.

In the introduction, validity and fairness are equated: “if an item were intended to measure the ability to comprehend a reading passage in English, score differences between groups based on real differences in comprehension of English would be valid and, therefore, fair…. Fairness does not require that all groups have the same average scores. Fairness requires any existing differences in scores to be valid” (p. 6).

By this logic, since youth from higher-income homes, on average, have more academic and common knowledge than youth from lower-income homes, the test that conflates reading comprehension ability with opportunity to learn is perfectly fair. Valid I can agree with. Fair I cannot.

A couple pages later, further explanation is offered (p. 8):

Exposure to information

Stimuli for English language arts items have to be about some topic…. Which topics and contexts are fair to include in the Smarter Balanced assessments? One fairness concern is that students differ in exposure to information through their life experiences outside of school. For example, some students experience snow every winter, and some have never experienced snow. Some students swim in the ocean every summer, and some have never seen an ocean. Some students live in houses, some live in apartments, some live in mobile homes, and some are homeless.

Even though curricula differ, the concepts to which students are exposed in school tend to be much more similar than are their life experiences outside of school. If students have become familiar with concepts through exposure to them in the classroom, the use of those concepts as topics and contexts in test materials is fair, even if some students have not been exposed to the concepts through their life experiences. For example, a student in grade 4 should know what an ocean is through classroom exposure to the concept, even if he or she has never actually seen an ocean. A student does not have to live in a house to know what a house is, if there has been classroom exposure to the term. Similarly, a student does not have to be able to run in a race to know what a race is. Mention of snow does not make an item unacceptable for students living in warmer parts of the country if they have been exposed to the concept of snow in school.

Let’s pause here: “Even though curricula differ, the concepts to which students are exposed in school tend to be much more similar than are their life experiences outside of school.” Maybe. Maybe not.

It might be the case that all elementary schools teach snow, oceans, houses, races, and deserts. But does Smarter Balanced really test such banal topics? No. As far as I can tell from its sample items, practice tests, and activities for grades three to five, Smarter Balanced (like PARCC) tests a mix of common and not-so-common knowledge. Passages include Babe Ruth, recycling water in space, how gravity strengthens muscles, papermaking, the Tuskegee Airmen, tree frogs, murals, and much more.

The sample items strike me as comprehensible for third to fifth graders with broad knowledge, but I am highly skeptical that we can safely assume that children are acquiring such broad knowledge in their elementary schools.

As Ruth Wattenberg explained in “Complex Texts Require Complex Knowledge” (which was published in Fordham’s Knowledge at the Core: Don Hirsch, Core Knowledge, and the Future of the Common Core), students in the elementary grades have minimal opportunities to acquire knowledge in history and science. Reviews of basal readers in 1983 and 2003 revealed that they contained very little content. This would be a lost opportunity, not a serious problem, but for the fact that elementary schools tend to devote substantial amounts of time to ELA instruction, and very little to social studies and science instruction. Wattenberg’s table (p. 35) should be shocking:

Grade and subject      1977   2000   2012
K–3 social studies       21     21     16
4–6 social studies       34     33     21
K–3 science              17     23     19
4–6 science              28     31     24
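
To make the drop in the last column concrete, here is a small Python sketch that computes the change from 2000 to 2012 for each row. The figures are copied from the table above, in whatever units the original table uses; the names `TIME_BY_YEAR` and `change_2000_to_2012` are illustrative and not from Wattenberg.

```python
# Figures copied from the table above; each tuple is (1977, 2000, 2012).
TIME_BY_YEAR = {
    "K-3 social studies": (21, 21, 16),
    "4-6 social studies": (34, 33, 21),
    "K-3 science": (17, 23, 19),
    "4-6 science": (28, 31, 24),
}


def change_2000_to_2012(row):
    """Return (absolute change, percent change) between the 2000 and 2012 columns."""
    _, value_2000, value_2012 = row
    absolute = value_2012 - value_2000
    percent = round(100 * absolute / value_2000, 1)
    return absolute, percent


if __name__ == "__main__":
    for label, row in TIME_BY_YEAR.items():
        absolute, percent = change_2000_to_2012(row)
        print(f"{label}: {absolute:+d} ({percent:+.1f}%)")
```

Every row comes out negative, with the largest drop in social studies for grades 4–6.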

Even worse, Wattenberg found that “When elementary teachers were asked during what time period struggling students received extra instruction in ELA or math, 60 percent said that they were pulled from social studies class; 55 percent said from science class.”

In their home environments, the schools they attend, and the curriculum to which they are exposed, lower-income children do not have an equal opportunity to learn. As Smarter Balanced guidelines state, the assessment is fair “if students have become familiar with concepts through exposure to them in the classroom.” That’s a big if.

Making matters worse, Smarter Balanced (like PARCC) asserts that it’s just fine for some kids to have to learn during the test. Returning to the “Bias and Sensitivity Guidelines” (p. 8):

Information in the stimulus

A major purpose of reading is to learn about new things. Therefore, it is fair to include material that may be unfamiliar to students if the information necessary to answer the items is included in the tested material. For example, it is fair to test the ability of a student who has never been in a desert to comprehend an appropriate reading passage about a desert, as long as the information about deserts needed to respond to the items is found in the passage.

Last week, we explored how difficult it is to learn from one passage and how greatly such test items advantage students who already know the content that the passage is purportedly teaching. Smarter Balanced clearly disagrees with me. Here’s the introduction to its fourth-grade Animal World activity:

The Classroom Activity introduces students to the context of a performance task, so they are not disadvantaged in demonstrating the skills the task intends to assess. Contextual elements include: an understanding of the setting or situation in which the task is placed, potentially unfamiliar concepts that are associated with the scenario; and key terms or vocabulary students will need to understand in order to meaningfully engage with and complete the performance task.

Please take a look at the activity—it assumes an enormous amount of knowledge. Even if it did not, the notion of learning and immediately demonstrating ability flies in the face of well-established research on humans’ limited working memory capacity. There’s no getting around it: the students with relevant prior knowledge have a huge advantage.

One (sort of) positive note: I am cautiously optimistic that Smarter Balanced’s computer adaptive testing will help—a little. Here’s how it’s described:

Based on student responses, the computer program adjusts the difficulty of questions throughout the assessment. For example, a student who answers a question correctly will receive a more challenging item, while an incorrect answer generates an easier question. By adapting to the student as the assessment is taking place, these assessments present an individually tailored set of questions to each student and can quickly identify which skills students have mastered…. providing more accurate scores for all students across the full range of the achievement continuum.
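
The adjustment rule in that description can be sketched in a few lines of Python. This is only an illustration of the rule as quoted (a correct answer leads to a harder item, an incorrect answer to an easier one), not Smarter Balanced’s actual algorithm. The five difficulty levels, the starting level, and the function names are assumptions made for the example.

```python
def next_difficulty(current, answered_correctly, lowest=1, highest=5):
    """Step difficulty up after a correct answer, down after an incorrect one.

    The 1-to-5 scale is a hypothetical choice for this sketch.
    """
    step = 1 if answered_correctly else -1
    # Clamp so the level never leaves the assumed scale.
    return max(lowest, min(highest, current + step))


def run_adaptive_session(responses, start=3):
    """Return the difficulty level presented for each question.

    `responses` is a list of booleans (True = correct), one per question.
    """
    level = start
    presented = []
    for correct in responses:
        presented.append(level)
        level = next_difficulty(level, correct)
    return presented


if __name__ == "__main__":
    # A student who gets two right, one wrong, then one right.
    print(run_adaptive_session([True, True, False, True]))
```

Note that the sketch assumes every item can be placed on a single difficulty scale. That assumption is easy to accept for math and much harder to accept for reading passages.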

In a hierarchical subject like math, the benefits of this adaptation are obvious. In reading, adaptation might help, but it might be misleading. Once a student has mastered decoding, what makes one passage “easier” to comprehend than another is driven primarily by the topic. If the student knows a lot about the topic, then factors like rare vocabulary (which isn’t rare to the reader with the relevant knowledge) and complex sentence structure are of little import. If a student does not know about the topic, then making the vocabulary and sentence structure easier will only help a little. The main way in which adaptive testing might be helpful is in varying the topics; “easier” passages would consist of more common topics, while more “challenging” passages would consist of less common, more academic topics. Then, if we examined the results carefully, we might see that a child lacks essential—teachable—academic knowledge.

Yet, I am only cautiously optimistic because the knowledge that drives reading comprehension is accumulated more haphazardly than hierarchically. One can have some academic knowledge while missing some common knowledge. A student whose grandparents lived most of their lives in Greece may know a great deal about ancient and modern Greece and be ready for a highly sophisticated passage comparing and contrasting ancient and modern Greece. That same student may have no knowledge of China, gravity, Harlem’s Jazz Age, or other topics that might appear on the test. Without assessing topics that have been taught, I see no way to truly gauge a student’s comprehension ability (or what the teacher or school has added).

To reinforce the most important message—that comprehension depends on knowledge, and thus schools must systematically build knowledge—the tests need to be tied to the content taught or the high stakes need to be removed so schools will no longer take time out of regular instruction for test preparation.

PARCC Demonstrates the Benefits of Broad Knowledge

by Lisa Hansel
October 8th, 2014

Last week I explored the “ETS Guidelines for Fairness Review of Assessments.” These guidelines were adopted by PARCC, so I decided to take a look at PARCC’s sample items for English language arts. (PARCC is one of the two consortia of states with massive federal grants to create assessments aligned with the Common Core State Standards. Smarter Balanced is the other consortium; ETS developed somewhat different guidelines for it—I’ll take a look at those next week.)

The knowledge demands in PARCC’s sample items are very broad, from cougars to Amelia Earhart to DNA testing. While I am happy to see some substantive questions—and hopeful that such test items will reinforce the standards’ call for systematically building knowledge with content-rich curriculum—I worry about the fairness of these assessments given how they are being used.

As I mentioned last week, it would be perfectly fair to have test passages on topics that had been taught. But, since the test developers do not know which topics are taught in each grade, they have to assess “common” knowledge. Due to well-documented differences in opportunities to learn at home and at school, some children know a good bit more common knowledge than others.

Let’s take a look at one of PARCC’s sample items for third grade. Three questions are asked based on the 631-word passage “How Animals Live.” There’s a typical main-idea question paired with a supporting-evidence question, and then a narrower question that assesses the “skills of rereading carefully to find specific information and of applying the understanding of a text.” Here’s the first section of the passage:

What All Animals Need

Almost all animals need water, food, oxygen, and shelter to live.

Animals get water from drinking or eating food. They get food by eating plants or other animals.

Animals get oxygen from air or water. Many land animals breathe with lungs. Many water animals breathe with gills.

Animals need shelter. Some animals find or build shelter. Other animals grow hard shells to protect themselves.

Many words here are undefined: oxygen, shelter, lungs, and gills. Are these words common to all third graders? Probably not, but much of the content is likely familiar to the vast majority of third graders—and perhaps enough content is familiar for most third graders to grasp the section (if not every word). Nonetheless, children who have learned about oxygen, shelter, lungs, and gills start out with a big advantage. They are reading and comprehending more quickly (which is extremely important in a timed test), and they are comfortable as they move into the more difficult content in the rest of the passage.

Here is the second section, and the beginning of the third:

Ways Of Grouping Animals

Animals can be grouped by their traits. A trait is the way an animal looks or acts. Animals get traits from their parents. Traits can be used to group animals.

Animals with Backbones

Animals with backbones belong to one group. A vertebrate is an animal with a backbone. Vertebrates’ backbones grow as they get older. Fish, snakes, and cats are all vertebrates. Vertebrates can look very different.

Let’s ignore the stiff, unengaging style. What really concerns me is the delusion that it is fair for content to be learned and applied during a high-stakes assessment. (As I noted last week, I do not dispute that the assessment is valid and reliable, so my concerns are with accountability policies, not really with this type of assessment.)

Since a definition of trait is given, it’s clear that some significant portion of third graders is not expected to know that word. Now imagine this is the first time you’ve encountered trait and examine the text:

A trait is the way an animal looks or acts…. Traits can be used to group animals…. Vertebrates can look very different.

What is a third grader to make of this? Clearly, vertebrates are not grouped by how they look.

A trait is the way an animal looks or acts…. Fish, snakes, and cats are all vertebrates.

Clearly, vertebrates are not grouped by how they act. How is the backbone (which is not defined) a trait, since it does not seem at all related to how all these animals look or act?

Making matters worse, understanding trait is essential to correctly answering the main-idea and supporting-evidence questions.

Do these vertebrates look or act alike?
(Image courtesy of Shutterstock.)

The fact is, vocabulary is not learned by being given a definition. Definitions can be helpful, but they are always incomplete. Words are learned through multiple exposures in multiple contexts. Even with simple words, multiple contexts are necessary: What, exactly, makes hoagies and gyros and PB&Js all sandwiches? I can’t even attempt a concise answer—I just know a sandwich when I see one.

Third graders who have had a unit on vertebrates and invertebrates will breeze through this passage; its inadequate definition of trait won’t matter. But students relying on this definition will surely be at least a little confused, possibly totally lost. The assessment will accurately tell us that children without knowledge of traits have limited comprehension of this passage—but it will not accurately tell us anything about their teachers or schools, for no one alerted the educators that the test would measure knowledge of traits.

Reading Test Developers Call Knowledge a Source of Bias

by Lisa Hansel
October 1st, 2014

You might expect to see a headline like this in the Onion, but you won’t. The Onion can’t run it because it isn’t just ironic—it’s 100% true.

A few years ago, a researcher at one of the big testing companies told me that when developing a reading comprehension test, knowledge is a source of bias. He did not mean the obvious stuff like knowledge of a yacht’s anemometer. He meant typical K–12 subject matter.

Since reading comprehension depends chiefly on knowledge of the topic (including the vocabulary) in the passage, the student with that knowledge has a large advantage over the student without it. And since there have always been great educational inequities in the United States, students’ knowledge—acquired both at home and at school—is very strongly correlated with socioeconomic status.

A logical solution would be to test reading comprehension using only those topics that students have been taught. Teachers can do this, but testing companies can’t—how would they have any idea what topics have been taught in each grade? It’s rare for districts, much less states, to indicate what or when specific books, people, ideas, and events should be taught.

Without a curriculum on which to base their assessments, testing companies have devised their own logic—which is sound given the bind they’re in. They distinguish between common and specialized knowledge, and then they select or write test passages that only have common knowledge. In essence, they’ve defined “reading comprehension skill” as including broad common knowledge. This is perfectly reasonable. When educators, parents, etc. think about reading comprehension ability, they do not think of the ability to read about trains or dolphins or lightning. They expect the ability to read about pretty much anything one encounters in daily life (including the news).

I already had this basic understanding, but still I found the “ETS Guidelines for Fairness Review of Assessments” eye opening. Guideline 1 is to “avoid cognitive sources of construct-irrelevant variance…. If construct-irrelevant knowledge or skill is required to answer an item and the knowledge or skill is not equally distributed across groups, then the fairness of the item is diminished” (p. 8). It continues, growing murkier:

Avoid unnecessarily difficult language. Use the most accessible level of language that is consistent with valid measurement…. Difficult words and language structures may be used if they are important for validity. For example, difficult words may be appropriate if the purpose of the test is to measure depth of general vocabulary or specialized terminology within a subject-matter area. It may be appropriate to use a difficult word if the word is defined in the test or its meaning is made clear by context. Complicated language structures may be appropriate if the purpose of the test is to measure the ability to read challenging material.

Avoid unnecessarily specialized vocabulary unless such vocabulary is important to the construct being assessed. What is considered unnecessarily specialized requires judgment. Take into account the maturity and educational level of the test takers in deciding which words are too specialized.

On page 10, it offers this handy table that “provides examples of common words that are generally acceptable and examples of specialized words that should be avoided…. The words are within several content areas known to be likely sources of construct-irrelevant knowledge”:

ETS table 1

Since having good reading comprehension means being able to read about a wide variety of common topics, table 1 seems just fine. But testing companies’ silence about what their reading comprehension tests actually measure is not. They say they are measuring “reading comprehension skill,” but their guidelines show that they are measuring a vaguely defined body of “common knowledge.”

Common words are not common to all. Even “common” knowledge is knowledge that must be taught, and right now—at home and at school—far too many children from low-income homes don’t have an opportunity to learn that knowledge (which is common to youth from middle-class and wealthy homes). That’s why reading comprehension scores are so strongly and stubbornly correlated with socioeconomic status.

These tests of “common” knowledge are accurate assessments and predictors of reading comprehension ability; but they are not fair or productive tests for holding children (and their teachers) accountable before an opportunity to learn has been provided.

If all testing companies would clearly explain that their reading comprehension tests are tests of knowledge, and if they would explain—as the ACT’s Chrys Dougherty does—that the only way to prepare for them is to build broad knowledge, then we could begin to create a fair and productive assessment and accountability system. Before the end of high school, all students should have broad enough knowledge to perform well on a reading comprehension test. But what about in third, fourth, or even seventh grade? In the early and middle grades, is a test drawn only from topics that have been taught in school the only fair way to test reading comprehension? How many years of systematically teaching “common” knowledge are needed before a reading comprehension test that is not tied to the curriculum is fair, especially for a student whose opportunities to learn outside of school are minimal?

The answer depends not so much on the test as on what is done with the scores. If we accepted the fact that reading comprehension depends on broad knowledge, we would radically alter our accountability policies. Scores on “common knowledge” reading comprehension tests would be recognized as useful indicators of where students are in their journey toward broad knowledge—they would not be mistaken for indicators of teaching quality or children’s capacity. Instead of holding schools accountable for scores on tests with content that is not tied to the curriculum, we would hold them accountable for creating a content-rich, comprehensive, well-sequenced curriculum and delivering it in a manner that ensures equal opportunity to learn. To narrow the inevitable gaps caused by differences in out-of-school experiences, we would dramatically increase free weekend and summer enrichment opportunities (for toddlers to teenagers) in lower-income neighborhoods. (We would also address a range of health-related disparities, but that’s a topic for another day.)

In sum, reading comprehension really does rely on having a great deal of common knowledge, so our current reading comprehension tests really are valid and reliable. To make them fair and productive, children from lower-income families must be given an equal opportunity to learn the knowledge that is “common” to children from higher-income homes.

Reading is always a test of knowledge (image courtesy of Shutterstock).

Testing: From the Mouths of Babes

by Lisa Hansel
May 8th, 2014

“No one learns from state tests. It’s testing what you know. You’re not learning anything from it.”

 —12th grader

“I like math or spelling tests better [than state accountability tests] because you can study for them. For the [state accountability tests], I wonder what will be on them this time.”

 —5th grader

“I like pre- and post-tests because you get to see the progress you’ve made.”

—4th grader

Is it just me, or do these kids know a whole lot more about assessment and increasing educational achievement than most state and national policymakers? Far too many policymakers seem to have lost sight of the most important goal of assessment and accountability: increasing learning. They seem stuck on accountability for the sake of accountability, unwilling to ask whether assessment dollars could be used more effectively.

I’m not against accountability—and I think assessment is necessary—but I am for allocating time and money in the most effective ways. So I find these students’ thoughts, and the new study in which they appear, pretty compelling. The study is Make Assessment Matter, by the Northwest Evaluation Association in cooperation with Grunwald Associates LLC. It explores students’ (4th – 12th graders), teachers’, and administrators’ views on all sorts of testing—from classroom quizzes to state accountability tests. Conclusion: “There is an urgency felt on the part of students, teachers and district administrators to emphasize assessment for learning rather than for accountability. The overwhelming preference for all parties is that assessment results be used to inform learning.” Sadly, today’s state tests not only don’t inform learning, they seem to be impeding it: “teachers (70 percent) and district administrators (55 percent) … [say] that the focus on state accountability tests takes too much time away from learning.”

Think about the weeks that are lost to state accountability tests each year as you absorb these key findings:

On the one hand, the vast majority of students, boys and girls, say they try hard on most tests and care about doing well on tests, among other findings that indicate how seriously they take tests and learning. On the other hand, some boys (46 percent) and girls (39 percent) say that tests are a waste of time.

It’s clear that students feel that certain kinds of tests are not very relevant to their learning, and so it’s not surprising to hear some students identify tests as a waste of time. In tandem with other findings, the message is clear: students want high-quality, engaging assessments that are tightly connected to learning….

Like students, teachers and district administrators would prefer to focus on tests that inform student learning. Most teachers (54 percent), and the vast majority of district administrators (89 percent), say that the ideal focus of assessments should be frequently tracking student performance and providing daily or weekly feedback in the classroom. This sentiment tracks with students’ attitudes about tests. Students express overwhelming agreement that tests are important for helping them and their teachers know if they are making progress in their learning and for understanding what they are learning.

Teachers say that teacher-developed classroom tests, performance tasks and formative assessment practice work best for supporting student learning in their classrooms, while state accountability tests are the least effective.

For an assessment to matter, it has to be directly tied to what is being studied in the classroom. For students to care about it, they need to be able to study for it and use the results in meaningful ways.

Image courtesy of Shutterstock.

That sounds perfectly reasonable to me. So, what are the logical implications for states? I see two options. One is to use Advanced Placement as a model: create detailed, content-specific courses and develop tests that only assess material in the course. I know it’s unheard of in state accountability testing, but I am actually being so crazy as to say that states should test students on the topics, books, people, ideas, events, etc. that they have been taught.

If state policymakers can’t stomach the idea of specifying the content to teach and test—if they can’t honor students’ desire to be tested in ways that inform learning—then they must honor students’ desire to not have their time wasted: make the tests zero stakes with zero test prep (like NAEP). Any test that is not tied to the specific content being studied in the classroom is a test of general knowledge and skills. Such a test can provide an informative snapshot of students’ and schools’ relative performance (and thus which schools and communities are in need of added supports). It can’t, however, indicate how any one student acquired her knowledge and skills (could be the teacher, the tutor that mom hired in October, the soccer coach who demands higher grades, the new librarian in town, finally being given eyeglasses, etc.). And therefore it can’t offer any precise indication of either teacher quality or how the student could improve. If a state wants to give a test that measures general abilities and provides nothing more than a snapshot and a trend line, that’s fine—provided the stakes and the prep time are minimized.

My preference, obviously, is for option one, especially if states would have the good sense to involve hundreds of educators in developing the specific content to be taught and assessed. Not only would the state-controlled, culminating test be useful for learning, but in preparing for it teachers could also use effective practices like frequent quizzing on essential content.

 

With Knowledge, You’re Never Lost

by Lisa Hansel
April 30th, 2014

Like many in the education world, I spend much of April and May wondering about U.S. testing and accountability policies. When you add up the time spent on benchmark testing, test-prep (especially the part devoted to item format, as opposed to reviewing essential knowledge and skills), testing, and grading, is that really a good use of our school days? Data to inform instruction and decision making are necessary, but do teachers, administrators, or even policymakers get what they need from the tests currently being given (or developed)? Is there a better way to improve educational outcomes?

Pretty much every article on testing this season has sent my mind spinning down these paths. I mean, really, who thought NCLB would still exist in 2014? We knew from the get-go that 100% proficient was a fairy tale. Can we please start thinking about education policy realistically? Can we be brave enough to offer clarity as to what we are trying to accomplish? With that clarity, can we find the fortitude to proceed patiently, rationally, and supportively so that (like more reasonable nations) we can attain our goals?

Probably not. But I’m not ready to give up.

Core Knowledge has a specific answer as to what it is trying to accomplish. So do all schools that have taken the time to create—and that make ongoing investments in enhancing—a detailed curriculum. Such schools should also be able to devise tests that support teaching and learning, as well as produce evidence of their educational accomplishments.

Logically, this is where I should explain the evidence that a Core Knowledge-based curriculum is effective. You can follow the link if you want. I’d rather offer an example. Call it an anecdote. Dismiss it. In a different frame of mind I might do the same. But in testing season, this example resonates with me. It’s a reminder that knowledge is inherently useful and valuable.

So, how do I know Core Knowledge works? Because it worked for an engaging lady from South Carolina who took the time to handwrite a three-page letter to E. D. Hirsch:

I am ninety-four years old and have wondered all my life about the answers to a few questions…. One being, if I am alone and lost in the woods and there is no sun and I have no compass, how do I know where North is?

Image courtesy of Shutterstock.

When she asked her librarian, she received a clear answer in What Your Second Grader Needs to Know, which is in the preschool through sixth-grade series Hirsch wrote for parents.

I was so impressed that I read the whole book, then asked the librarian for Book One. When finished I asked for Book Three, then Four, then Five and finished with Book Six. I was not so fortunate as the children of today, having been brought up during the Depression when I had to quit the Sophomore year of High School and go to work….

I have learned so much more from your books that my grade school didn’t teach me. I wish that all elementary schools were required to use Core Knowledge Series. Every subject was so enlightening, interesting, and helpful for me. Thank you.

How would this ninety-four-year-old student do on a standardized test, even one for fifth or sixth graders? I don’t know (she’d probably need several days of drilling on educationally irrelevant test-taking strategies). But I know she’ll never be lost in the woods. And—more importantly—I know that because she read the first through sixth grade books, she has the broad knowledge she needs to learn more each day. To grasp the scientific, political, economic, and arts topics covered in her chosen news source. To have lively debates with her neighbors. To be a well-informed voter. To show her great-grandchildren that knowledge is enlightening, interesting, and helpful.

Will The SAT Overhaul Help Achieve Equity?

by Guest Blogger
April 24th, 2014

By Burnie Bond

Burnie Bond is the director of programs at the Albert Shanker Institute. This post originally appeared on the Shanker Blog on April 22, 2014.

The College Board, the organization behind the SAT, acknowledges that historically its tests have been biased in favor of the children of wealthy, well-educated elites—those who live in the best zip codes, are surrounded by books, go to the best-regarded schools (both public and private), enjoy summer enrichment programs, and can avail themselves of as much tutoring and SAT test-prep coaching as they need. That’s why, early last month, College Board president David Coleman announced that the SAT would undergo significant changes, with the aim of making it more fair and equitable for disadvantaged students.

Among the key changes, which are expected to take effect in 2016, are: the democratization of access to test-prep courses (by trying to make them less necessary and entering into an agreement with the Khan Academy to offer free, online practice problems*); ensuring that every exam includes a reading passage from one of the nation’s “founding documents,” such as the Declaration of Independence or the Bill of Rights, or from one of the important discussions of such texts, such as the Rev. Dr. Martin Luther King Jr.’s “Letter From Birmingham Jail”; and replacing “arcane ‘SAT words’ (‘depreciatory,’ ‘membranous’),” with words that are more “commonly used in college courses, such as ‘synthesis’ and ‘empirical.’” (See here.)

Will this help? Well, maybe, but the SAT’s long-held but always elusive mission to help identify and reward merit, rather than just privilege, will only be met insofar as its creators can be sure that all students have had an equal opportunity to learn these particular vocabulary words and have read these particular “founding documents” and texts. That is, it comes down to a question of curriculum.

Curriculum and Equity

The connection between curriculum and equity first occurred to me when I was eight years old (though obviously not in those exact terms). For some reason, my school decided that all third graders needed to have an IQ test. I was sick that day, so one school holiday I found myself filling in bubbles alone in a classroom with Mrs. Beagles, the school’s assistant principal.

All was well until I got to one particular question. Since the test designers couldn’t be sure we could read well, many of the questions were in picture form. This one included a series of line drawings. As I recall, the first was a drawing of a boy in a ski jacket standing on a beach; the second showed a boy in swim trunks and a beach ball standing in the snow near a snowman; the third had the same swim-trunked, beach ball kid standing in sand near a big cactus; and the fourth had the ski jacket boy standing near the snowman. The question was: Which one didn’t belong?

Although I knew the “right” answer, I found myself wondering how they could just assume that I should. Having never left the tropical island of St. Croix, I had not yet been in winter or seen snow or a snowman. And, although some cactus varieties could be found out on the island’s East End, we had no real desert either. Our textbooks had not yet covered the relevant units on physical geography, and my book-loving father had only allowed a television into our house about six months beforehand. I then started wondering how many of my classmates might have thought that the ski jacket was some elaborate water flotation outfit, and how many would have been confused because we all regularly swam at East End beaches with cacti in plain sight. For that matter, what about kids on the mainland who grew up in cities or in the Midwest and who had never been to a beach or seen a desert?

Irate over the unfairness of it all, I complained to Mrs. Beagles, who replied, “Just do the best that you can,” and returned to grading papers.

I found myself thinking about this episode as I read a very interesting 2012 paper by Santelices and Wilson, whose research gave credence to an earlier paper by Freedle (also here)—the upshot of which is that the SAT Verbal continues to be biased against poor and minority students in a very particular way. That is, test takers who are African American, Hispanic-American, Asian American, or White from low-income households tend to do disproportionately well on the “hard” questions and disproportionately poorly on the “easy” ones.

In his 2003 Harvard Educational Review article, Freedle explains:

A culturally based interpretation helps explain why African American examinees (and other minorities) often do better on many hard verbal items but do worse than matched-ability Whites on many easy items. To begin with, easy analogy items tend to contain high-frequency vocabulary words while hard analogy items tend to contain low-frequency vocabulary words (Freedle & Kostin, 1997). For example, words such as “horse,” “snake,” “canoe,” and “golf” have appeared in several easy analogy items. These are words used frequently in everyday conversations. By contrast, words such as “vehemence,” “anathema,” “sycophant,” and “intractable” are words that have appeared in hard analogy items, and do not appear in everyday conversation (Berger, 1977). However, they are likely to occur in school-related contexts or in textbooks.

In other words, kids who are somewhat outside of the cultural mainstream do less well on items built around assumptions about common knowledge—the words and ideas that are “used frequently in everyday conversations.”  But what if your language or culture or social standing diminishes the chances that you actually engage in everyday conversations about golfing or canoes? In that case, it makes perfect sense to expect that you would do better on the “harder”—even the “arcane”—school-related items that are built around the words, ideas, and texts that you have actually been taught.

Photo courtesy of Shutterstock.

Or, put another way: Assessments of student learning are neither fair nor valid unless they measure only the content and skills that students have actually been given the opportunity to learn. And the only way to do that, of course, is to know what they have been taught—that is, in the presence of a defined curriculum.

The Problem with Curriculum

There are some very good reasons why the United States, unlike most of the world’s highest-performing nations, has avoided adopting a national curriculum for all of these years. As David K. Cohen has noted:

For school systems around the world, the infrastructure commonly includes student curricula or curriculum frameworks, exams to assess students’ learning of the curricula, instruction that centers on teaching that curriculum, and teacher education that aims to help prospective teachers learn how to teach the curricula. The U.S. has had no such common and unifying infrastructure for schools, owing in part to fragmented government (including local control) and traditions of weak state guidance about curriculum and teacher education.

Another huge issue is that “curriculum” has become a catch-all that describes everything from general performance standards all the way to student texts with scripted daily lesson plans. Thus, in any given discussion about the role of curriculum in a well-functioning school system, it is very likely that the discussants are actually talking past each other. This has led to many unintentionally amusing statements, responses, and counter-responses, as each “side” tries to clarify what it and others are actually trying to promote and/or oppose.

In terms of equity concerns, I think that E. D. Hirsch has it exactly right. That is, we need to make sure that every American student, regardless of economic, geographic, racial or ethnic background, is provided with a “coherent, cumulative, and content-specific core curriculum” (see here, but also here, here, here and here).

As Hirsch uses the term, the “curriculum” should provide enough guidance to teachers to ensure that what is taught will prepare students for the learning that comes next, while remaining flexible enough for teachers (or schools or districts) to decide for themselves which specific materials and instructional approaches best meet the needs of any particular set of students. He uses the term “core” to mean both that which is most important, which should be taught in common to all students, as well as that which is foundational to the more personalized courses of study that students may choose for themselves during their high-school years. Thus, Hirsch’s Core Knowledge Sequence, which covers pre-K to 8th grade, could also be described as a curriculum framework or syllabus—a coherent “outline of the subjects in a course of study.”

It is no accident that Hirsch’s theory of action also squares with a great deal of national and international research suggesting that schools with greater curricular and instructional coherence achieve greater improvement in student performance (here, here and here).

So what might this look like in practice? In a 2003 Educational Researcher article, Lisa Delpit has given a rationale for why schools need to provide all students with access to “the culture of power”:

In my work in dozens of successful classrooms, effective teachers of low-income students of color take every opportunity to introduce children to complex material. While children are learning to “decode,” teachers read complex information to children above their reading level and engage in discussions about the information and the advanced vocabulary they encounter. Students are involved in activities that use the information and vocabulary in both creative and analytical ways, and teachers help them create metaphors for the new knowledge that connects it to their real lives. Students memorize and dramatize material that involves advanced vocabulary and linguistic forms. Students are engaged in thematic units that are ongoing and repeat important domain knowledge and develop vocabulary through repeated oral use. Students are asked to explain what they have learned to others, thus solidifying new knowledge. Not only do the teachers and schools who are successful with low-income children practice these strategies, but some other researchers (Beck et al., 2002; Hirsch, 2003; Stahl, 1991; Sternberg, 1987, to name but a few) have documented the efficacies of the strategies as well. Successful instruction is constant, rigorous, integrated across disciplines, connected to students’ lived cultures, connected to their intellectual legacies, engaging, and designed for critical thinking and problem solving that is useful beyond the classroom.

Will the new SAT—or, for that matter, the new Common Core State Standards, which David Coleman also had a large hand in crafting—lead us toward this vision of educational opportunity? That is yet to be seen, but I would have much more confidence in the outcome if each state department of education had begun with a focus on teaching to the new standards, rather than just testing them. Where are the rich curriculum resources and professional development opportunities that would allow this vision to take hold? And, failing this, what exactly is it that we propose to measure?

__________

* Paradoxically, although the data confirm the expected class-based differences in the use of test prep courses, it should be noted that “blacks and Hispanics are more likely than whites from comparable backgrounds to utilize test preparation. The black-white gap is especially pronounced in the use of high school courses, private courses and private tutors.” See here for more on this.