What Does Research Say About Assessment?

R.J. Dietel, J.L. Herman, and R.A. Knuth
NCREL, Oak Brook, 1991


Assessment may be defined as "any method used to better understand the current knowledge that a student possesses." This implies that assessment can be as simple as a teacher's subjective judgment based on a single observation of student performance, or as complex as a five-hour standardized test. The idea of current knowledge implies that what a student knows is always changing and that we can make judgments about student achievement through comparisons over a period of time. Assessment may affect decisions about grades, advancement, placement, instructional needs, and curriculum.

Purposes of Assessment

The reasons for assessing vary considerably among the groups that make up the educational community.


Who Needs To Assess and Purposes of Assessment

Policymakers use assessment to:

  • Set standards
  • Focus on goals
  • Monitor the quality of education
  • Reward/sanction various practices
  • Formulate policies
  • Direct resources including personnel and money
  • Determine effects of tests

Administrators and school planners use assessment to:

  • Monitor program effectiveness
  • Identify program strengths and weaknesses
  • Designate program priorities
  • Assess alternatives
  • Plan and improve programs

Teachers and administrators use assessment to:

  • Make grouping decisions
  • Perform individual diagnosis and prescription
  • Monitor student progress
  • Carry out curriculum evaluation and refinement
  • Provide mastery/promotion/grading and other feedback
  • Motivate students
  • Determine grades

Parents and students use assessment to:

  • Gauge student progress
  • Assess student strengths and weaknesses
  • Determine school accountability
  • Make informed educational and career decisions



Effects of Traditional Tests

Billions of dollars are spent each year on education, yet there is widespread dissatisfaction with our educational system among educators, parents, policymakers, and the business community. Efforts to reform and restructure schools have focused attention on the role of assessment in school improvement. After years of increases in the quantity of formalized testing and the consequences of poor test scores, many educators have begun to strongly criticize the measures used to monitor student performance and evaluate programs. They claim that traditional measures fail to assess significant learning outcomes and thereby undermine curriculum, instruction, and policy decisions.

The higher the stakes, the greater the pressure on teachers and administrators to devote more and more time to preparing students to do well on the tests. As a consequence, narrowly focused tests that emphasize recall have led to a similar narrowing of the curriculum and an emphasis on rote memorization of facts, with little opportunity to practice higher-order thinking skills. The timed nature of the tests and their one-right-answer format have led teachers to give students practice in responding to artificially short texts and selecting the best answer rather than inventing their own questions or answers. When teachers teach to traditional tests by providing daily skill instruction in formats that closely resemble tests, their instructional practices are both ineffective and potentially detrimental because they rely on outmoded theories of learning and instruction.

Characteristics of Good Assessment

Good assessment information provides accurate estimates of student performance and enables teachers or other decisionmakers to make appropriate decisions. The concept of test validity captures these essential characteristics: the extent to which an assessment actually measures what it is intended to measure and permits appropriate generalizations about students' skills and abilities. For example, a ten-item addition/subtraction test might be administered to a student who answers nine items correctly. If the test is valid, we can safely generalize that the student will likely do as well on similar items not included on the test. The results of a good test or assessment, in short, represent something beyond how students perform on a certain task or a particular set of items; they represent how a student performs on the objective which those items were intended to assess.

Measurement experts agree that test validity is tied to the purposes for which an assessment is used. Thus, a test might be valid for one purpose but inappropriate for other purposes. For example, our mathematics test might be appropriate for assessing students' mastery of addition and subtraction facts but inappropriate for identifying students who are gifted in mathematics. Evidence of validity needs to be gathered for each purpose for which an assessment is used.

A second important characteristic of good assessment information is its consistency, or reliability. Will the assessment results for this person or class be similar if they are gathered at some other time or under different circumstances or if they are scored by different raters? For example, if you ask someone what his/her age is on three separate occasions and in three different locations and the answer is the same each time, then that information is considered reliable. In the context of performance-based and open-ended assessment, inter-rater reliability also is essential; it requires that independent raters give the same scores to a given student response.
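One way to put a number on inter-rater reliability is sketched below. It is an illustration only, not a procedure taken from the research summarized here: the function names and the rubric scores are hypothetical, and the two statistics shown (simple percent agreement and Cohen's kappa, which discounts the agreement two raters would reach by chance) are common choices rather than the only ones.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses that both raters scored identically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same
    # category if each scored independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1 to 4) from two raters for ten essays.
scores_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
scores_b = [3, 2, 4, 2, 1, 2, 3, 4, 3, 3]

agreement = percent_agreement(scores_a, scores_b)
kappa = cohens_kappa(scores_a, scores_b)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
# prints "agreement=0.80, kappa=0.71"
```

In this made-up example the raters agree on eight of ten essays, but because some of that agreement would occur by chance, kappa is lower than the raw agreement rate. What counts as acceptable agreement depends on the stakes attached to the scores.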

Other characteristics of good assessment for classroom purposes:

  • The content of the tests (the knowledge and skills assessed) should match the teacher's educational objectives and instructional emphases.
  • The test items should represent the full range of knowledge and skills that are the primary targets of instruction.
  • Expectations for student performance should be clear.
  • The assessment should be free of extraneous factors which unnecessarily confuse or inadvertently cue student responses. (For example, unclear directions and contorted questions may confuse a student and confound his/her ability to demonstrate the skills which are intended for assessment. A math item that requires reading skill will inhibit the performance of students who lack adequate skills for comprehension.)

Researchers at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) are developing an expanded set of validity criteria for performance-based, large-scale assessments. Assessment researchers Bob Linn, Eva Baker, and Steve Dunbar have identified eight criteria that performance-based assessments should meet in order to be considered valid.


Criteria for Valid Performance-Based Assessments

Criteria, and What To Ask Yourself

Consequences
Does using an assessment lead to intended consequences or does it produce unintended consequences, such as teaching to the test? For example, minimum competency testing was intended to improve instruction and the quality of learning for students; however, its actual effects too often were otherwise (a shallow drill-and-kill curriculum for remedial students).

Fairness
Does the assessment enable students from all cultural backgrounds to demonstrate their skills, or does it unfairly disadvantage some students?

Transfer and generalizability
Do the results of the assessment generalize to other problems and other situations? Do they adequately represent students' performance in a given domain?

Cognitive complexity
Do the assessments adequately assess higher levels of understanding and complex thinking? We cannot assume that performance-based assessments will test a higher level of student understanding because they appear to do so. Such assumptions require empirical evidence.

Content quality
Are the tasks selected to measure a given content area worth the time and effort of students and raters?

Content coverage
Do the assessments enable adequate content coverage?

Meaningfulness
Are the assessment tasks meaningful to students and do they motivate them to perform their best?

Cost and efficiency
Has attention been given to the efficiency of the data collection designs and scoring procedures? (Performance-based assessments are by nature labor-intensive.)

Findings from Cognitive Psychology
Analysis of Traditional Views

Methods of assessment are determined by our beliefs about learning. According to early theories of learning, complex higher-order skills had to be acquired bit by bit by breaking learning down into a series of prerequisite skills, a building-blocks-of-knowledge approach. It was assumed incorrectly that after basic skills had been learned by rote, they could be assembled into complex understandings and insight. However, evidence from contemporary cognitive psychology indicates that all learning requires that the learner think and actively construct evolving mental models.

From today's cognitive perspective, meaningful learning is reflective, constructive, and self-regulated. People are seen not as mere recorders of factual information but as creators of their own unique knowledge structures. To know something is not just to have received information but to have interpreted it and related it to other knowledge one already has. In addition, we now recognize the importance of knowing not just how to perform, but also when to perform and how to adapt that performance to new situations. Thus, the presence or absence of discrete bits of information, which is typically the focus of traditional multiple-choice tests, is not of primary importance in the assessment of meaningful learning. Rather, what is important is how and whether students organize, structure, and use that information in context to solve complex problems.

Cognitive Psychology

Contrary to past views of learning, cognitive psychology suggests that learning is not linear, but that it proceeds in many directions at once and at an uneven pace. Conceptual learning is not something to be delayed until a particular age or until all the basic facts have been mastered. People of all ages and ability levels constantly use and refine concepts. Furthermore, there is tremendous variety in the modes and speed with which people acquire knowledge, in the attention and memory capabilities they can apply to knowledge acquisition and performance, and in the ways in which they can demonstrate the personal meaning they have created.

Current evidence about the nature of learning makes it apparent that instruction which strongly emphasizes structured drill and practice on discrete, factual knowledge does students a major disservice. Learning isolated facts and skills is more difficult without meaningful ways to organize the information and make it easy to remember. Also, applying those skills later to solve real-world problems becomes a separate and more difficult task. Because some students have had such trouble mastering decontextualized "basics," they are rarely given the opportunity to use and develop higher-order thinking skills.

Recent studies of the integration of learning and motivation also have highlighted the importance of affective and metacognitive skills in learning. For example, recent research suggests that poor thinkers and problem solvers differ from good ones not so much in the particular skills they possess as in their failure to use them in certain tasks. Acquisition of knowledge and skills is not sufficient to make one into a competent thinker or problem solver. People also need to acquire the disposition to use the skills and strategies, as well as the knowledge of when and how to apply them. These are appropriate targets of assessment.

The role of the social context of learning in shaping higher-order cognitive abilities and dispositions has also received attention over the past several years. It has been noted that real-life problems often require people to work together as a group in problem-solving situations, yet most traditional instruction and assessment have involved independent rather than small group work. Now, however, it is postulated that groups facilitate learning in several ways: modeling effective thinking strategies, scaffolding complicated performances, providing mutual constructive feedback, and valuing the elements of critical thought. Group assessments, thus, can be important.



Important Trends in Assessment

Since the influence of testing on curriculum and instruction is now widely acknowledged, educators, policymakers, and others are turning to alternative assessment methods as a tool for educational reform. The movement away from traditional, multiple-choice tests to alternative assessments (variously called authentic assessment or performance assessment) has included a wide variety of strategies such as open-ended questions, exhibits, demonstrations, hands-on execution of experiments, computer simulations, writing in many disciplines, and portfolios of student work over time. These terms and assessment strategies have led the quest for more meaningful assessments which better capture the significant outcomes we want students to achieve and better match the kinds of tasks which they will need to accomplish in order to ensure their future success.

Trends Stemming from the Behavioral to Cognitive Shift

Emphasis of Assessment: Behavioral Views and Cognitive Views

View of learner
  Behavioral: Passive, responding to environment
  Cognitive: Active, constructing knowledge

Scope of assessment
  Behavioral: Discrete, isolated skills
  Cognitive: Integrated and cross-disciplinary

Beliefs about knowing and being skilled
  Behavioral: Accumulation of isolated facts and skills
  Cognitive: Application and use of knowledge

Emphasis of instruction and assessment
  Behavioral: Delivering maximally effective materials
  Cognitive: Attention to metacognition, motivation, self-determination

Characteristics of assessment
  Behavioral: Paper-pencil, objective, multiple-choice, short answer
  Cognitive: Authentic assessments on contextualized problems that are relevant and meaningful, emphasize higher-level thinking, do not have a single correct answer, have public standards known in advance, and are not speeded

Frequency of assessment
  Behavioral: Single occasion
  Cognitive: Samples over time (portfolios) which provide basis for assessment by teacher, students, and parents

Who is assessed
  Behavioral: Individual assessment
  Cognitive: Assessment of group process skills on collaborative tasks which focus on distributions over averages

Use of technology for administration and scoring
  Behavioral: Machine-scored sheets
  Cognitive: High-tech applications such as computer-adaptive testing, expert systems, and simulated environments

What is assessed
  Behavioral: Single attribute of learner
  Cognitive: Multidimensional assessment that recognizes the variety of human abilities and talents, malleability of student ability, and that IQ is not fixed

There is continuing faith in the value of assessment for stimulating and supporting school improvement and instructional reform at national, state, and local levels. In summary, while the terms may be diverse, several common threads link alternative assessments:

  • Students are involved in setting goals and criteria for assessment.
  • Students perform, create, produce, or do something.
  • Tasks require students to use higher-level thinking and/or problem solving skills.
  • Tasks often provide measures of metacognitive skills and attitudes, collaborative skills and intrapersonal skills as well as the more usual intellectual products.
  • Assessment tasks measure meaningful instructional activities.
  • Tasks often are contextualized in real-world applications.
  • Student responses are scored according to specified criteria, known in advance, which define standards for good performance.

Issues of Equity and Assessment

While assessment has the potential to improve learning for all students, historically it has acted as a barrier rather than a bridge to educational opportunity. Assessments have been used to label students and put them in dead-end tracks. Traditional tests have been soundly criticized as biased and unfair to minority students. The assessment of language-minority students has been particularly problematic.

A key point regarding equity as applied to performance-based assessment is made by Yale Professor Emeritus Edmund Gordon: "We begin with the conviction that it is desirable that attention be given to questions of equity early in the development of an assessment process rather than as an add-on near the end of such work.... The task then is to find assessment probes (test items) which measure the same criterion from contexts and perspectives which reflect the life space and values of the learner."

Robert Linn says, "The criterion of equity needs to be applied to any assessment. It is a mistake to assume that shifting from standardized tests to performance-based assessments will eliminate concerns about biases against racial/ethnic minorities or that such a shift will necessarily lead to equality of performance.

"Although many at-risk students come to school deficient in prior knowledge that is important to school achievement, teachers and schools can make a substantial difference through the construction of assessments that take into account the vast diversity of today's student populations. Gaps in performance among groups exist because of differences in familiarity, exposure, and motivation of the subjects being assessed. Substantial changes in instructional strategy and resource allocation are required to give students adequate preparation for complex, time-consuming, open-ended assessments. Providing training and support for teachers to move in these directions is essential.

"Questions of fairness arise not only in the selection of performance tasks but in the scoring of responses. As Stiggins has stated, it is critical that the scoring procedures are designed to assure that 'performance ratings reflect the examinee's true capabilities and are not a function of the perceptions and biases of the persons evaluating the performance.' The same could be said regarding the perceptions and biases of the persons creating the test. The training and calibrating of raters is critical in this regard."


top

Social Organization and Assessment

What we know about performance-based assessment is limited and there are many issues yet to be resolved. We do know that approaches which encourage new assessment methods need the broad-based support of the community and school administration. Like any change in schools, changes in assessment practices will require:

  • Strong leadership support
  • Staff development and training
  • Teacher ownership
  • Continuing follow-up and support for change through coaching and mentoring
  • Environments that support experimentation and risk-taking

As schools move toward more performance-based assessment, they also will need to come to some resolution on a number of issues, among them that performance-based assessments:

  • Require more time to develop
  • Cost more
  • May limit content coverage
  • Require a shift in teaching practices
  • Require substantial time for administration
  • Lack a network of colleagues for sharing and developing
  • Require new methods of aggregating and reporting data
  • Require new viewpoints about how to use them for comparative purposes