What Does Research Say About Assessment?
R.J. Dietel, J.L. Herman, and R.A. Knuth
NCREL, Oak Brook, 1991
Assessment may
be defined as "any method used to better understand the current knowledge
that a student possesses." This implies that assessment can be as simple
as a teacher's subjective judgment based on a single observation of student
performance, or as complex as a five-hour standardized test. The idea of
current knowledge implies that what a student knows is always changing and
that we can make judgments about student achievement through comparisons
over a period of time. Assessment may affect decisions about grades, advancement,
placement, instructional needs, and curriculum.
Purposes of Assessment
The reasons why we assess
vary considerably across many groups of people within the educational
community.
Who Needs To Assess? Purposes of Assessment
Policymakers use assessment to:
* Set standards
* Focus on goals
* Monitor the quality of education
* Reward/sanction various practices
* Formulate policies
* Direct resources including personnel and money
* Determine effects of tests
Administrators and school planners use assessment to:
* Monitor program effectiveness
* Identify program strengths and weaknesses
* Designate program priorities
* Assess alternatives
* Plan and improve programs
Teachers and administrators use assessment to:
* Make grouping decisions
* Perform individual diagnosis and prescription
* Monitor student progress
* Carry out curriculum evaluation and refinement
* Provide mastery/promotion/grading and other feedback
* Motivate students
* Determine grades
Parents and students use assessment to:
* Gauge student progress
* Assess student strengths and weaknesses
* Determine school accountability
* Make informed educational and career decisions
Effects of Traditional Tests
Billions of dollars
are spent each year on education, yet there is widespread dissatisfaction
with our educational system among educators, parents, policymakers, and
the business community. Efforts to reform and restructure schools have
focused attention on the role of assessment in school improvement. After
years of increases in the quantity of formalized testing and the consequences
of poor test scores, many educators have begun to strongly criticize the
measures used to monitor student performance and evaluate programs. They
claim that traditional measures fail to assess significant learning outcomes
and thereby undermine curriculum, instruction, and policy decisions.
The higher the stakes,
the greater the pressure that is placed on teachers and administrators
to devote more and more time to prepare students to do well on the tests.
As a consequence, narrowly focused tests that emphasize recall have led
to a similar narrowing of the curriculum and emphasis on rote memorization
of facts with little opportunity to practice higher-order thinking skills.
The timed nature of the tests and their format of one right answer have
led teachers to give students practice in responding to artificially short
texts and selecting the best answer rather than inventing their own questions
or answers. When teachers teach to traditional tests by providing daily
skill instruction in formats that closely resemble tests, their instructional
practices are both ineffective and potentially detrimental due to their
reliance on outmoded theories of learning and instruction.
Characteristics of Good Assessment
Good assessment information
provides accurate estimates of student performance and enables teachers
or other decisionmakers to make appropriate decisions. The concept of
test validity captures these essential characteristics: it is the extent
to which an assessment actually measures what it is intended to measure and
permits appropriate generalizations about students' skills and abilities.
For example, a ten-item addition/subtraction test might be administered
to a student who answers nine items correctly. If the test is valid, we
can safely generalize that the student will likely do as well on similar
items not included on the test. The results of a good test or assessment,
in short, represent something beyond how students perform on a certain
task or a particular set of items; they represent how a student performs
on the objective which those items were intended to assess.
Measurement experts
agree that test validity is tied to the purposes for which an assessment
is used. Thus, a test might be valid for one purpose but inappropriate
for other purposes. For example, our mathematics test might be appropriate
for assessing students' mastery of addition and subtraction facts but
inappropriate for identifying students who are gifted in mathematics.
Evidence of validity needs to be gathered for each purpose for which an
assessment is used.
A second important
characteristic of good assessment information is its consistency, or reliability.
Will the assessment results for this person or class be similar if they
are gathered at some other time or under different circumstances or if
they are scored by different raters? For example, if you ask someone what
his/her age is on three separate occasions and in three different locations
and the answer is the same each time, then that information is considered
reliable. In the context of performance-based and open-ended assessment,
inter-rater reliability also is essential; it requires that independent
raters give the same scores to a given student response.
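As a rough illustration of inter-rater reliability, the sketch below compares the scores two raters gave to the same set of student responses. It is not drawn from the research summarized here: the rubric scores are made-up sample data, the function names are hypothetical, and Cohen's kappa is used as one common chance-corrected agreement statistic rather than a measure the authors prescribe.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses on which both raters gave the identical score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of responses.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score category
    # if each assigned scores independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: two raters score ten essays on a 1-4 rubric.
scores_rater_a = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
scores_rater_b = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]

agreement = percent_agreement(scores_rater_a, scores_rater_b)
kappa = cohens_kappa(scores_rater_a, scores_rater_b)
print(f"Exact agreement: {agreement:.2f}")
print(f"Cohen's kappa:   {kappa:.2f}")
```

Exact agreement is the easiest figure to explain to teachers and parents, while kappa discounts the matches that would occur even if raters scored at random. Low values on either would suggest that the scoring criteria or rater training need attention before the results are used.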
Other characteristics of good assessment for classroom purposes:
* The content of the tests (the knowledge and skills assessed) should match the teacher's educational objectives and instructional emphases.
* The test items should represent the full range of knowledge and skills that are the primary targets of instruction.
* Expectations for student performance should be clear.
* The assessment should be free of extraneous factors which unnecessarily confuse or inadvertently cue student responses. (For example, unclear directions and contorted questions may confuse a student and confound his/her ability to demonstrate the skills which are intended for assessment. A math item that requires reading skill will inhibit the performance of students who lack adequate skills for comprehension.)
Researchers at the
National Center for Research on Evaluation, Standards, and Student Testing
(CRESST) are developing an expanded set of validity criteria for performance-based,
large-scale assessments. Assessment researchers Bob Linn, Eva Baker, and
Steve Dunbar have identified eight criteria that performance-based assessments
should meet in order to be considered valid.
Criteria for Valid Performance-Based Assessments
Criteria | Ask Yourself
Consequences | Does using an assessment lead to intended consequences or does it produce unintended consequences, such as teaching to the test? For example, minimum competency testing was intended to improve instruction and the quality of learning for students; however, its actual effects too often were otherwise (a shallow drill-and-kill curriculum for remedial students).
Fairness | Does the assessment enable students from all cultural backgrounds to demonstrate their skills, or does it unfairly disadvantage some students?
Transfer and generalizability | Do the results of the assessment generalize to other problems and other situations? Do they adequately represent students' performance in a given domain?
Cognitive complexity | Do the assessments adequately assess higher levels of understanding and complex thinking? We cannot assume that performance-based assessments will test a higher level of student understanding because they appear to do so. Such assumptions require empirical evidence.
Content quality | Are the tasks selected to measure a given content area worth the time and effort of students and raters?
Content coverage | Do the assessments enable adequate content coverage?
Meaningfulness | Are the assessment tasks meaningful to students and do they motivate them to perform their best?
Cost and efficiency | Has attention been given to the efficiency of the data collection designs and scoring procedures? (Performance-based assessments are by nature labor-intensive.)
Findings from Cognitive Psychology
Analysis of Traditional Views
Methods of assessment are
determined by our beliefs about learning. According to early theories
of learning, complex higher-order skills had to be acquired bit-by-bit
by breaking learning down into a series of prerequisite skills, a building-blocks-of-knowledge
approach. It was assumed incorrectly that after basic skills had been
learned by rote, they could be assembled into complex understandings and
insight. However, evidence from contemporary cognitive psychology indicates
that all learning requires that the learner think and actively construct
evolving mental models.
From today's cognitive perspective,
meaningful learning is reflective, constructive, and self-regulated. People
are seen not as mere recorders of factual information but as creators
of their own unique knowledge structures. To know something is not just
to have received information but to have interpreted it and related it
to other knowledge one already has. In addition, we now recognize the
importance of knowing not just how to perform, but also when to perform
and how to adapt that performance to new situations. Thus, the presence
or absence of discrete bits of information, which is typically the focus
of traditional multiple-choice tests, is not of primary importance in the
assessment of meaningful learning. Rather, what is important is how and
whether students organize, structure, and use that information in context
to solve complex problems.
Cognitive Psychology
Contrary to past views of
learning, cognitive psychology suggests that learning is not linear, but
that it proceeds in many directions at once and at an uneven pace. Conceptual
learning is not something to be delayed until a particular age or until
all the basic facts have been mastered. People of all ages and ability
levels constantly use and refine concepts. Furthermore, there is tremendous
variety in the modes and speed with which people acquire knowledge, in
the attention and memory capabilities they can apply to knowledge acquisition
and performance, and in the ways in which they can demonstrate the personal
meaning they have created.
Current evidence about the
nature of learning makes it apparent that instruction which strongly emphasizes
structured drill and practice on discrete, factual knowledge does students
a major disservice. Learning isolated facts and skills is more difficult
without meaningful ways to organize the information and make it easy to
remember. Also, applying those skills later to solve real-world problems
becomes a separate and more difficult task. Because some students have
had such trouble mastering decontextualized "basics," they are
rarely given the opportunity to use and develop higher-order thinking
skills.
Recent studies of the integration
of learning and motivation also have highlighted the importance of affective
and metacognitive skills in learning. For example, recent research suggests
that poor thinkers and problem solvers differ from good ones not so much
in the particular skills they possess as in their failure to use them
in certain tasks. Acquisition of knowledge and skills is not sufficient to
make one into a competent thinker or problem solver. People also need
to acquire the disposition to use the skills and strategies, as well as
the knowledge of when and how to apply them. These are appropriate targets
of assessment.
The role of the social context
of learning in shaping higher-order cognitive abilities and dispositions
has also received attention over the past several years. It has been noted
that real-life problems often require people to work together as a group
in problem-solving situations, yet most traditional instruction and assessment
have involved independent rather than small group work. Now, however,
it is postulated that groups facilitate learning in several ways: modeling
effective thinking strategies, scaffolding complicated performances, providing
mutual constructive feedback, and valuing the elements of critical thought.
Group assessments, thus, can be important.
Important Trends in Assessment
Since the influence of testing
on curriculum and instruction is now widely acknowledged, educators, policymakers,
and others are turning to alternative assessment methods as a tool for
educational reform. The movement away from traditional, multiple-choice
tests to alternative assessments, variously called authentic assessment
or performance assessment, has included a wide variety of strategies such
as open-ended questions, exhibits, demonstrations, hands-on execution
of experiments, computer simulations, writing in many disciplines, and
portfolios of student work over time. These terms and assessment strategies
have led the quest for more meaningful assessments which better capture
the significant outcomes we want students to achieve and better match
the kinds of tasks which they will need to accomplish in order to assure
their future success.
Trends Stemming from the Behavioral to Cognitive Shift
Emphasis of Assessment | Behavioral Views | Cognitive Views
Scope of assessment | Discrete, isolated skills | Integrated and cross-disciplinary
View of learner | Passive, responding to environment | Active, constructing knowledge
Beliefs about knowing and being skilled | Accumulation of isolated facts and skills | Application and use of knowledge
Emphasis of instruction and assessment | Delivering maximally effective materials | Attention to metacognition, motivation, self-determination
Characteristics of assessment | Paper-pencil, objective, multiple-choice, short answer | Authentic assessments on contextualized problems that are relevant and meaningful, emphasize higher-level thinking, do not have a single correct answer, have public standards known in advance, and are not speeded
Frequency of assessment | Single occasion | Samples over time (portfolios) which provide basis for assessment by teacher, students, and parents
Who is assessed | Individual assessment | Assessment of group process skills on collaborative tasks which focus on distributions over averages
Use of technology for administration and scoring | Machine-scored sheets | High-tech applications such as computer-adaptive testing, expert systems, and simulated environments
What is assessed | Single attribute of learner | Multidimensional assessment that recognizes the variety of human abilities and talents, malleability of student ability, and that IQ is not fixed
There is continuing
faith in the value of assessment for stimulating and supporting school improvement
and instructional reform at national, state, and local levels. In summary,
while the terms may be diverse, several common threads link alternative
assessments:
- Students are involved in
setting goals and criteria for assessment.
- Students perform, create,
produce, or do something.
- Tasks require students
to use higher-level thinking and/or problem solving skills.
- Tasks often provide measures
of metacognitive skills and attitudes, collaborative skills and intrapersonal
skills as well as the more usual intellectual products.
- Assessment tasks measure
meaningful instructional activities.
- Tasks often are contextualized
in real-world applications.
- Student responses are scored
according to specified criteria, known in advance, which define standards
for good performance.
Issues of Equity and Assessment
While assessment
has the potential to improve learning for all students, historically it
has acted as a barrier rather than a bridge to educational opportunity.
Assessments have been used to label students and put them in dead-end
tracks. Traditional tests have been soundly criticized as biased and unfair
to minority students, and the assessment of language-minority students
has been particularly problematic.
A key point regarding
equity as applied to performance-based assessment is made by Yale Professor
Emeritus Edmund Gordon. "We begin with the conviction that it is
desirable that attention be given to questions of equity early in the
development of an assessment process rather than as an add-on near the
end of such work....The task then is to find assessment probes (test items)
which measure the same criterion from contexts and perspectives which
reflect the life space and values of the learner."
Robert Linn says,
"The criterion of equity needs to be applied to any assessment. It
is a mistake to assume that shifting from standardized tests to performance-based
assessments will eliminate concerns about biases against racial/ethnic
minorities or that such a shift will necessarily lead to equality of performance.
"Although many
at-risk students come to school deficient in prior knowledge that is important
to school achievement, teachers and schools can make a substantial difference
through the construction of assessments that take into account the vast
diversity of today's student populations. Gaps in performance among groups
exist because of differences in familiarity, exposure, and motivation
of the subjects being assessed. Substantial changes in instructional strategy
and resource allocation are required to give students adequate preparation
for complex, time-consuming, open-ended assessments. Providing training
and support for teachers to move in these directions is essential.
"Questions of fairness
arise not only in the selection of performance tasks but in the scoring
of responses. As Stiggins has stated, it is critical that the scoring
procedures are designed to assure that 'performance ratings reflect the
examinee's true capabilities and are not a function of the perceptions
and biases of the persons evaluating the performance.' The same could
be said regarding the perceptions and biases of the persons creating the
test. The training and calibrating of raters is critical in this regard."
Social Organization and Assessment
What we know about performance-based
assessment is limited and there are many issues yet to be resolved. We
do know that approaches which encourage new assessment methods need the
broad-based support of the community and school administration. Like any
change in schools, changes in assessment practices will require:
- Strong leadership support
- Staff development and
training
- Teacher ownership
- Continuing follow-up and
support for change through coaching and mentoring
- Environments that support
experimentation and risk-taking
As schools move toward more
performance-based assessment, they also will need to come to some resolution
on a number of issues, among them that performance-based assessments:
- Require more time to develop
- Cost more
- May limit content coverage
- Require a shift in teaching
practices
- Require substantial time
for administration
- Lack a network of colleagues
for sharing and developing
- Require new methods of
aggregating and reporting data
- Require new viewpoints
about how to use them for comparative purposes