Report on Oral Testing and Criterion Referencing

Andrew Finch M.A., M.Ed., Deputy Director
Kevin Sampson, BA, Dip. Ed., Programme Coordinator
Mark Miller, M.A., Programme Coordinator
Seoul National University of Technology Language Center
April 2000

1. Introduction
2. Language Testing: History of Research
3. Task-based testing
4. Validity/reliability
5. Authentic assessment
6. Criterion-referenced/norm-referenced testing
7. The Korean situation
8. Self-Assessment
9. Conclusion
References

1. Introduction

This report details current research and theory regarding the testing of second-language performance, with special reference to the Republic of Korea. In line with current educational thinking, it concludes that the bell-curve of norm-referencing is no longer applicable to language testing, and that criterion-referencing is appropriate for "performance" situations, since it allows the appropriate grade to be given to the appropriate student.

Self-assessment has also been found to be a reliable and valid means of assessing students, while at the same time encouraging them to develop the life-long learning skills that will enable them to become responsible learners and citizens. This report therefore suggests that self-assessment in a criterion-referenced setting is a reliable and effective means of grading in the long and short term, and for both internal and external evaluation.

2. Language Testing: History of Research

A systematic testing component is an essential part of every language program and of most language classrooms (despite the fact that many teachers feel intimidated by the terminology and use of statistical concepts, cf. Brown 1995:12). It is used to measure language aptitude, proficiency, placement, diagnosis, progress, and achievement, and it provides feedback for the program evaluator(s), washback information for teachers and students, and motivational washforward implications for all concerned. However, the area of language testing in general, and performance testing in particular, is fraught with problems of theory and practice, so it will be appropriate here to give a brief overview of the current language testing situation before discussing how and why an oral testing component was built into the present language programme at SNUT. For further information and a more detailed analysis of the factors involved in language testing, the reader is referred to Skehan's "state of the art" article of 1988, and to Lee's work on task-based oral testing (1991).

Skehan (1998:153) defines a test as "a systematic method of eliciting performance which is intended to be the basis for some sort of decision making", and draws attention to the tendency of testers to place an emphasis on "care and standardization in assessment in the belief that such methods of examining performance will have more to contribute to reliable measurement than informal assessment by people who may be very familiar with particular language users" (1998:153). Such an attitude can be seen as following from the assumption that "there are knowable best ways of learning and that these can be discovered using a scientific method which has long been discarded by contemporary philosophers (Popper), scientists (Medawar) and physicists (Heisenberg)" (Harri-Augstein & Thomas 1991:7), and has been at the heart of language testing from its "pre-scientific" stage (Spolsky 1975) to its psychometric-structuralist "scientific" stage (when discrete-point testing represented the accepted behaviourist 'truth'). Such an emphasis reflects the view that language can be learned by studying its parts in isolation, that acquisition of these parts can be tested and will successfully predict performance levels, and that the learner can be relied on to reconstruct the parts in meaningful situations when necessary. This view did not disappear in the "psycholinguistic-sociolinguistic" stage of the 1970s, when integrative testing (e.g. cloze tests and dictation) claimed to come from a sounder theoretical base (Oller 1979), but was shown by commentators such as Alderson (1981), Morrow (1979) and Carroll (1981:9) to be still concerned with usage rather than use (Widdowson 1983), and therefore to offer only indirect tests of potential efficiency. Kelly (1978:245-6) also pointed out that it is possible to develop proficiency in the integrative test itself, and that indirect tests cannot diagnose specific areas of difficulty in relation to the authentic task. Such tests can only supply information on a candidate's linguistic competence, and have nothing to offer in terms of performance ability (Weir 1988).

A consensus that "knowledge of the elements of a language in fact counts for nothing unless the user is able to combine them in new and appropriate ways to meet the linguistic demands of the situation in which he wishes to use the language" (Morrow 1979:145), and an acknowledgement that the easily quantifiable, reliable, and efficient data obtained from discrete-point (and cloze) testing imply that proficiency is neatly quantifiable in such a fashion (cf. Oller 1979a:212), led to a perception that it would be better to test the ability to perform in a specified socio-linguistic setting. Based on work by Hymes (1972), Canale & Swain (1980), and Morrow (1979), the emphasis shifted from linguistic accuracy to the ability to function effectively through language in particular contexts of situation (a demonstration of competence and of the ability to use this competence), and communicative testing was adopted as a means of assessing language acquisition (though with some lack of initial agreement or direction, cf. McClean 1995:137; Benson 1991). For Canale & Swain (1980), testing communicative language ability included grammatical competence (knowledge of the rules of grammar), sociolinguistic competence (knowledge of the rules of use and of discourse), and strategic competence (knowledge of verbal and non-verbal communication strategies). Canale (1983) later updated this to a four-dimensional model of linguistic, sociolinguistic, discoursal (cohesion and coherence) and strategic competencies, while Bachman (1990) saw it as consisting of language competence, strategic competence, and psychophysiological mechanisms. However, the relationship between the various competencies, the way they are integrated into overall communicative competence, and the way this is translated into communicative performance are all in need of clarification, and such models are themselves in need of validation (Swain 1985; Skehan 1988). Skehan (1988) articulated the dilemma of communicative language testing at the end of the 1980s:

What we need is a theory which guides and predicts how an underlying communicative competence is manifested in actual performance; how situations are related to one another, how competence can be assessed by examples of performance on actual tests; what components communicative competence actually has; and how these interrelate ... Since such definitive theories do not exist, testers have to do the best they can with such theories as are available. (cited in Weir 1988:7)

Thus Canale & Swain's (1980) framework, though an insightful start, is seen by Skehan (1998:159) as neither practical nor comprehensive (cf. Cziko 1984), possessing no systematic basis, and unable to advance prediction and generalisation in any substantial way, a problem that was addressed in later developments (Bachman 1990; Bachman & Palmer 1996) by the application of categories to real contexts. The Bachman (1990) model still lacks a "rationale founded in psycholinguistic mechanisms and processes (and research findings) which can enable [it to] ... make functional statements about the nature of performance and the way it is grounded in competence" (Skehan 1998:164), but it is: i) more detailed in its specification of component language competences; ii) more precise about the interrelationships between the different component competences; iii) more grounded in contemporary linguistic theory; and iv) more empirically based, allowing a more effective mapping of components of competence on to language use situations, and more principled comparisons of those components.

3. Task-based testing

Bachman's model (1990) uses familiar empirical research methods, in which data is perceived in terms of the underlying structural model, to infer abilities via a static picture of proficiency, based on the assumption that there are competence-oriented underlying abilities made up of different interacting components (cf. Canale & Swain 1980). However, cognitive theory tells us that second language performers, faced with a developing inter-language and with performance pressures such as fluency, accuracy and complexity, do not draw upon "a generalized and stable underlying competence" (Skehan 1998:169), but allocate limited processing attention in appropriate ways, drawing on parallel coding systems for efficiency of real-time communication. Given this redefined view of the competence-performance relationship, Skehan proposes a construct of "ability for use", which would allow a processing competence to operate and to be assessed, and advocates the use of tasks as a central unit within a testing context, with the proviso that we need to know "more about the way tasks themselves influence (and constrain) performance" (1998:169). McNamara (1995, 1996) (following Kenyon 1992) provides a model of such multiple influences on test performance, in which the learner handles the fluctuating communicative demands through a processing competence. In the expanded version of the model, assessment of competences and ability for use involves generalised processing capacities and meaningful language use, with tasks (including task qualities, types, and characteristics) being central to predictions of performance and generalisations across contexts (cf. Weir 1998).

In contrast to methods of evaluating performance which use "reliable" analytic scales (in areas such as grammar, vocabulary, fluency, appropriateness, and pronunciation) but which do not allow for affect and for competing demands on attention, a processing approach in a task-based framework allows generalisations to be made on the three basic language-sampling issues of inferring abilities, predicting performance and generalising across contexts, using the criteria of fluency, breadth/complexity of language used, and accuracy (Skehan 1998:177). These criteria will themselves compete for processing resources in the performer, and the score may be influenced by whichever processing goals he/she emphasised. Tasks also need to be rated (e.g. in terms of planning, time pressure, modality, stakes, opportunity for control, 'manufactured' surprise, and degree of support), since these conditions will also affect the outcome. Skehan sees the conditions under which tasks are performed, and the way conditions interact with performance, as "a fertile area for research" (1998:177).
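
To make the rating criteria and task conditions concrete, the following minimal sketch (in Python, purely illustrative: the band values, weights, and condition labels are hypothetical examples, not Skehan's) shows one way a processing-oriented analytic rating record might be organised:

```python
# Illustrative sketch only. The criteria (fluency, complexity, accuracy) follow
# Skehan (1998); the bands, weights, and task-condition labels are hypothetical.

from dataclasses import dataclass

@dataclass
class TaskConditions:
    """Conditions under which the task was performed (cf. Skehan 1998:177)."""
    planning_time: bool   # was pre-task planning allowed?
    time_pressure: str    # e.g. "low", "medium", "high"
    support: bool         # was visual/textual support provided?

@dataclass
class AnalyticRating:
    """One rater's band scores (hypothetically 1-6) on the three criteria."""
    fluency: int
    complexity: int
    accuracy: int

    def weighted_total(self, weights=(1.0, 1.0, 1.0)) -> float:
        """Combine the criterion bands. Equal weights by default; a tester who
        knows the task emphasised a particular processing goal might weight
        that criterion differently."""
        w_f, w_c, w_a = weights
        return w_f * self.fluency + w_c * self.complexity + w_a * self.accuracy

# Example: a task performed with planning time and low time pressure.
conditions = TaskConditions(planning_time=True, time_pressure="low", support=False)
rating = AnalyticRating(fluency=4, complexity=3, accuracy=5)
print(conditions)
print("combined band total:", rating.weighted_total())
```

The point of keeping the task conditions alongside the scores is that, as noted above, a band awarded under planned, low-pressure conditions is not directly comparable to one awarded under time pressure.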

4. Validity/reliability

Language testing has traditionally been limited by considerations of validity (whether tests actually measure what they are supposed to measure [Thrasher 1984]), reliability (whether they produce similar results on more than one occasion), and efficiency (the logistics of test administration) (Weir 1988:1). Validity is seen by Spolsky (1975) and Messick (1988) as the major problem in foreign language testing, and includes content validity (the test is a representative sample of the language skills and structures it is meant to test), criterion-related validity, construct validity (the extent to which the test matches a theoretical construct) (Bachman 1990), face validity (the test looks reasonable to the test-taker), predictive validity (the predictive force of the test), concurrent validity (the test and the criterion are administered at the same time) (Davies 1990), and educational validity (the relationship between positive test effects and students' study habits) (Thrasher 1984). Nakamura (1995) argues that predictive validity, educational validity, construct validity, concurrent validity, face validity and content validity should all be analysed in tests of speaking ability, and Kohonen (1999:291) also stresses validity in communicative evaluation.

Williams & Burden (1997), however, argue that the energy spent by test constructors on strengthening the reliability and validity of their tests so that they can be standardised is largely misspent, since this assumes that the test is measuring a relatively fixed characteristic, rather than a hypothetical construct (the researcher's best attempt to define what is involved). In fact, individual- and affect-related traits are variable, and often context-specific, such that "a test should be expected to produce different results on different occasions" (1997:90). As Kelly (1955:77) mentions, when a subject fails to meet the experimenter's expectations, all that can be said is that he/she has not conformed to those expectations or to the experimenter's definition of learning. In recognition of this problem, researchers have employed the concept of construct validity to indicate how well the test relates to the construct under investigation, but this still does not mean that the construct actually exists: "The point is that it is extremely difficult to construct a test which is truly valid in that it really measures what it is supposed to measure" (Kelly 1955:77). Weir (1988:7) also points out that the validity of "communicative" tests is dependent on the test-constructor's understanding and definition of the term, and Van Lier (1996) goes deeper still into "accountability": tests must measure, and can only measure, that which is measurable:

It is quite possible that the deepest, most satisfying aspects of achievement, and the most profound effects of education, both in positive and negative terms, are entirely unmeasurable ... What if we held educators accountable for the quality of the memories they gave to their students, rather than for averages on national tests? (Van Lier 1996:120)

5. Authentic assessment

Language testing has thus evolved in a short time from a "physical science" approach (in which language learners are impersonal 'data'), to a "personal science" (in which people explain themselves to themselves), and more recently to a "conversational science" approach, based on the premise that the unique attribute of humans is that they converse. Psychologists and educators still know little about how language learning occurs, or why and how some individuals are more competent than others, so it is inappropriate to define and test discrete symptoms of the process. However, observable factors that appear to be associated with learning include construction of meaning, sharing of experiences, identification of needs and purposes, critical evaluation of performance strategies, and awareness of this process (Harri-Augstein & Thomas 1991:7). Thus Kohonen (1999) extends Skehan's task-based framework and proposes "authentic assessment" as a process-oriented means of evaluating communicative competence, cognitive abilities and affective learning (Kohonen 1999:284; cf. Hart 1994:9; O'Malley & Pierce 1996:x-6), using reflective forms of assessment in instructionally-relevant classroom activities (communicative performance assessment, language portfolios and self-assessment), and focusing on curriculum goals, enhancement of individual competence, and integration of instruction and assessment. In this two-way process, "the essentially interactive nature of learning is extended to the process of assessment" (Williams & Burden 1997:42), examining what learners can do with their language through real-life language use tasks (cf. Weir 1988:9). For the learner, this means developing reflective awareness through self-assessment and peer assessment, learning "how to manage ... learning, rather than just managing to learn" (Williams & Burden 1997:291). For the teacher (whose professional judgement and commitment to enhancing student learning is an important part of this process), authentic assessment means collecting information about learner progress and the social learning environment in the class, which can be labour-intensive and demanding, requiring clear guidelines and continuing supervision of the learners so that they can become skilful in self/peer assessment. For both, there are implications in terms of roles, the teacher becoming a:

... tool-maker and provider, observer and joint interpreter of the evolving conversational experiment in which both subject and [teacher] are full but different participants ... Only the subject/learner can tap his or her personal experience, but the experimenter can observe behaviour and recruit methodological skills to drive the experiment forward. (Harri-Augstein & Thomas 1991:6)

Kohonen (1999) gives a list of 13 ways in which authentic assessment can enhance learning, and summarises how this approach contrasts with standardised testing (table I below):

TABLE I: COMPARISON OF STANDARDISED TESTING AND AUTHENTIC ASSESSMENT (KOHONEN 1999:285).

    Standardised testing | Authentic assessment
1.  Testing and instruction are regarded as separate activities | Assessment is an integral part of instruction
2.  Students are treated in a uniform way | Each learner is treated as a unique person
3.  Decisions are based on single sets of data (test scores) | Provides multiple sources of data, a more informative view
4.  Emphasis on weaknesses/failures: what students cannot do | Emphasis on strengths/progress: what learners can do
5.  One-shot exams | Ongoing assessment
6.  Cultural/socio-economic status bias | More culture-fair
7.  Focus on one 'right answer' | Possibility of several perspectives
8.  Judgement without suggestions for improvement | Useful information for improving/guiding learning
9.  Pressures teachers to narrow teaching to what is tested | Allows teachers to develop meaningful curricula
10. Focus on lower-order knowledge and skills | Emphasis on higher-order learning outcomes and thinking skills
11. Prohibits students from interacting; promotes comparisons between students (norm-referencing) | Encourages collaborative learning; compares learners to their own past performances and the aims
12. Extrinsic learning: for a grade | Intrinsic learning: for its own sake

6. Criterion-referenced/norm-referenced testing

Authentic assessment in a task-based process setting implies a focus on language mastery (criterion-referenced performance) rather than relative performance (norm-referenced performance), a focus which Ames and Archer (1988) found to be highly motivating in the classroom, fostering long-term use of learning strategies and helping students form realistic but challenging goals. When relative performance was the goal, however, learners believed that ability was shown by success with little effort, and they judged their own ability lower. As Darling-Hammond (1994:110) points out, assessment needs to support authentic forms of teaching and learning (cf. North 1992; Kohonen 1996), and task-based process assessment involves a criterion-referenced orientation, with criterion-referenced tests (CRTs) providing direct information "about what the learner can actually do with the target language" (McClean 1995:137). Strengths and weaknesses can be isolated across the whole test population, and specific information can be gained about an individual's performance, in contrast to norm-referenced tests (NRTs), which tend to give information only about learners at either end of the scale (cf. McClean 1995:146; Cartier 1968; Cziko 1982; 1983; Hudson & Lynch 1984; Brown 1988; 1989a; Bachman 1989; 1990).

Brown (1995) points out that program evaluations tend to use tests that are sensitive to the goals and objectives of the program, and that such program-sensitive tests (sometimes called program-fair tests) "are by definition criterion-referenced, not norm-referenced" (Brown 1995:18), being different from NRTs both in test characteristics and in logistics (table II below):

TABLE II: DIFFERENCES BETWEEN NRTS AND CRTS (BROWN 1995:12).

    CRTs | NRTs

Test Characteristics
    Underlying purposes: Foster learning | Classify/group students
    Types of decisions: Diagnosis, progress, achievement | Aptitude, proficiency, placement
    Levels of generality: Classroom specific | Overall, global
    Students' expectations: Know content to expect | Do not know content
    Score interpretations: Percent | Percentile
    Score report strategies: Tests and answers go to students | Only scores go to students

Logistical Dimensions
    Group size: Relatively small group | Large group
    Range of abilities: Relatively homogeneous | Wide range of abilities
    Test length: Relatively few questions | Large number of questions
    Time allocated: Relatively short time | Long (2-4 hours) administration
    Cost: Teacher time & duplication | Test booklets, tapes, proctors
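
The contrast between "percent" and "percentile" score interpretations in table II can be made concrete with a small illustration. The following sketch (in Python; the 70% criterion and the cohort scores are invented for the example, not taken from Brown) shows the two readings of the same raw performance side by side:

```python
# Illustrative sketch of the two score interpretations in Table II.
# The 70% criterion and the cohort scores below are hypothetical examples.

def criterion_referenced_percent(correct: int, total: int) -> float:
    """CRT interpretation: what percent of the tested objectives were mastered."""
    return 100.0 * correct / total

def norm_referenced_percentile(score: float, cohort: list) -> float:
    """NRT interpretation: what percent of the cohort scored at or below this score."""
    at_or_below = sum(1 for s in cohort if s <= score)
    return 100.0 * at_or_below / len(cohort)

cohort = [42.0, 55.0, 61.0, 68.0, 74.0, 80.0, 88.0]  # hypothetical class scores
my_score = criterion_referenced_percent(correct=17, total=25)

print(f"CRT reading: mastered {my_score:.0f}% of the objectives (criterion: 70%)")
print(f"NRT reading: at the {norm_referenced_percentile(my_score, cohort):.0f}th percentile")
```

The same raw performance thus reads as "not yet at the 70% criterion" under the CRT interpretation, but as "around the middle of this class" under the NRT interpretation, which is precisely why the two test types support different kinds of decisions.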

7. The Korean situation

The situation relating to oral testing in Korea is similar to the one described by McClean (1995) for Japan (the Korean education system was established by the Japanese when they colonised the peninsula in the first half of the 20th century). Thus "what little oral assessment has been done in Japanese [Korean] universities so far has been isolated, haphazard, lacking in form, and subjective. Testers are confused about what specifically to focus on when assessing test-takers' oral competence" (McClean 1995:137). Before coming to university, students' oral abilities are measured through reading or writing tests, which encourage students to handle indirect rather than realistic tasks (Hughes 1989; Brindley 1989; Lee 1991; Weir 1998). This is an understandable situation in high schools, which must train students to pass the mostly multiple-choice, discrete-item university entrance tests; but this test system must change if classrooms are to place more emphasis on performance, in line with the national call for spoken English proficiency, which makes norm-referenced testing inappropriate. Meanwhile, teachers at all levels are faced with the dilemma of preparing students for important national NRTs while being asked to incorporate a (non-tested) communicative element into their teaching. Such a situation is reminiscent of Rea's comment on testing in the late 1970s:

Although we would agree that language is a complex behaviour and that we would generally accept a definition of overall language proficiency as the ability to function in a natural language situation, we still insist on, or let others impose on us, testing measures which assess language as an abstract array of discrete items, to be manipulated only in a mechanistic way. Such tests yield artificial, sterile and irrelevant types of items which have no relationship to the use of language in real life situations. (Rea 1978:51, cited in Weir 1998:3).

8. Self-Assessment

Based on work carried out since the late 1970s, various authors and researchers agree that self-assessment is a vital part of learner autonomy (Henner-Stanchina & Holec 1985:98; Dickinson 1987:16; Blanche 1988:75; Harris 1997:12), providing the opportunity for learners to assess their own progress and thus helping them to focus on their own learning. Hunt, Gow & Barnes even claim that without learner self-evaluation and self-assessment "... there can be no real autonomy" (1989:207). Rea (1981) sees self-appraisal as helping the learner become aware of his/her responsibilities in planning, executing and monitoring his/her language learning activities, and Oscarson agrees on such a formative prime aim, adding a more summative secondary aim of enabling the learner "to assess his total achievement at the end of a course or course unit" (1978). Dickinson points out that this does not necessarily devalue or conflict with external evaluation, which still has relevance for supplying official certification of learning (1987:136). Rather, as Dickinson & Carver observe:

A language course can only deal with a small fraction of the foreign language; therefore one objective of language courses should be to teach learners how to carry on learning the language independently. Part of the training learners need for this purpose is training in self-assessment and self-monitoring. (1980:7)

Findings of favourable correlations between self-rating scores and external test scores mostly support the use of self-assessment in second language learning, and Oscarson's "rationale of self-assessment procedures in language learning" (1989:3) serves as a framework for the various justifications of self-assessment that have been proposed:

  1. promotion of learning: "Self-rating requires the student to exercise a variety of learning strategies and higher order thinking skills that not only provide feedback to the student but also provide direction for future learning" (Chamot & O'Malley 1994:119). Assessment leading towards evaluation is an important educational objective in its own right; training learners in this is beneficial to learning (Dickinson 1987:136);
  2. raised level of awareness: "Students need to know what their abilities are, how much progress they are making and what they can (or cannot yet) do with the skills they have acquired" (Blanche 1988:75);
  3. improved goal orientation: "Engaging the learner actively in the evaluation of learning effects will probably lead to greater interest in techniques for continuous assessment, as opposed to terminal or 'end-of-unit' assessment" (Oscarson 1978:2);
  4. expansion of range of assessment: If learners can appraise their own performance accurately enough, "they do not have to depend entirely on the opinion of teachers and at the same time they can make teachers aware of their individual learning needs" (Blanche 1988:75);
  5. shared assessment burden: Self-assessment is one way of alleviating the assessment burden on the teacher (Dickinson 1987:136). "Combining self-assessment with teacher assessment means that the latter can become more effective" (Harris 1997:17);
  6. beneficial postcourse effects: Self-assessment is a necessary part of self-direction (Dickinson 1987:136).

Much of the self-assessment debate focuses on its feasibility and practicality for self-directed individuals, often in self-access study situations. Harris (1997:13), however, also sees it as appropriate in test-driven secondary and tertiary education, claiming that self-assessment can help learners in such environments to become more active, to locate their own strengths and weaknesses, and to realise that they have the ultimate responsibility for their learning. By encouraging individual reflection, "self-assessment can begin to make students see their learning in personal terms [and] can help learners get better marks" (Harris 1997:13). Peer assessment (a form of self-assessment [Tudor 1996:182], justified largely by the same arguments) is especially applicable to the classroom setting, aiming to encourage students to take increased responsibility for their own curricula and to become active participants in the learning process (Hill 1994:214; Miller & Ng 1996:134). Tudor adds that critical reflection on the abilities of other learners with respect to a shared goal is a practical form of learner training which helps individuals to assess their own performance, and which reduces the stress of error correction by identifying errors in others (Tudor 1996:182). Thus Assinder (1991:218-28) reports increased motivation, participation, real communication, in-depth understanding, commitment, confidence, meaningful practice and accuracy when students prepare and deliver learning tasks for each other.

Haughton & Dickinson (1989) (cited in Miller & Ng 1996:135) set out to test nine hypotheses about peer assessment in their study of a collaborative post-writing assessment. Five hypotheses (items 1 to 5, below) dealt with the practicality of peer assessment, and four (6 to 9) with the benefits of the scheme:

  1. Students are sincere and do not use the scheme as a means of obtaining higher grades than they themselves think they deserve;
  2. Students are or become able to assess themselves at about the same level as their tutors, i.e. they can interpret the criteria in the same way;
  3. Students are or become able to negotiate with tutors on the appropriate level of criteria;
  4. Students are or become able to negotiate grades in a meaningful and satisfying manner;
  5. The scheme does not result in a lowering of standards on the course;
  6. Students perceive collaborative assessment as fairer than other (traditional) forms of assessment;
  7. Students benefit in enhanced understanding of and attitude towards assessment;
  8. Students become more self-directed as a result;
  9. The scheme demands more thoroughly worked out criteria of assessment and hence results in fairer assessment.

This study showed "a relatively high level of agreement between the peer assessments and the marks given by the lecturers" (Miller & Ng 1996:139), and similar reliability of results was reported by Bachman & Palmer (1982) for the self-rating of communicative language ability by ESL learners in the USA (aged 17-67). Fok (1981), looking at a group of university students in Hong Kong, also found a high degree of similarity between the students' self-assessments and their past academic records for reading and speaking. Thus Haughton & Dickinson (1989) claim that to a large extent the scheme worked and that the students were able to assess their own work realistically, even though most students felt inexperienced as testers (lack of reliability) and were not comfortable with being tested by classmates (fear of losing face) (Miller & Ng 1996:141). Despite this, Miller & Ng considered that: i) the students were sincere; ii) they demonstrated a similar level of assessment to that of the lecturers; iii) the scheme did not result in a lowering of standards; and iv) the students benefited in their understanding of and attitude towards assessment by taking part in the study, stating that "language students are able to make a realistic assessment of each others' oral language ability" (Miller & Ng 1996:142).
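
The "level of agreement" reported in these studies can be quantified in several ways; one simple approach is a correlation between the two sets of marks. The sketch below (in Python; the paired marks are invented for illustration, and the cited studies used their own data and statistics) computes a Pearson correlation between hypothetical peer and tutor marks:

```python
# Illustrative sketch: quantifying peer-tutor agreement with a Pearson correlation.
# The paired marks below are hypothetical; they are not data from the cited studies.

import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two sets of paired marks."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

peer_marks = [62, 70, 55, 78, 66, 73]    # hypothetical peer-assessment marks
tutor_marks = [60, 72, 58, 75, 64, 70]   # hypothetical tutor marks for the same work
print(f"peer-tutor agreement (Pearson r) = {pearson_r(peer_marks, tutor_marks):.2f}")
```

A high positive r (close to 1) would indicate the kind of agreement Miller & Ng describe, though in practice rater agreement is often also examined with more specialised indices of inter-rater reliability.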

9. Conclusion

If students are to learn in a way that motivates them and is meaningful to them (given that these factors will enhance and promote language acquisition), this will involve consciousness-raising (language learning awareness), reflection (self-assessment), and the development of learning strategies, as part of "actual" language study. Assessment in this context exists to give information to the learner and the teacher about learning strengths and weaknesses, so that future goals can be set and learning plans devised. Testing which concentrates on the "target-like appearance of forms" (Larsen-Freeman 1997:155) ignores the fact that "we have no mechanism for deciding which of the phenomena described or reported to be carried out by the learner are in fact those that lead to language acquisition" (Seliger 1984:37), as well as the fact that the learner's internal grammar is not a steady commodity, and often deteriorates before new content is internalised. Even if we could identify and measure all of the factors in second language acquisition, complexity theory tells us that "we would still be unable to predict the outcome of their combination" (Larsen-Freeman 1997:157).

In the recent shift in educational theory from transmission of knowledge towards transformation of knowledge, and towards integration of knowledge with existing personal constructs and meanings (Kohonen 1999:280), evaluation is taking on new affective goals in which the personal growth of the learner is becoming increasingly important (Ranson 1994:116). Thus it is no longer defensible to use discrete-item testing of dubious constructs, or to sample performance as a means of inferring underlying competence or abilities, if assessment is really concerned with providing information on learning. Instead, the need to understand performance itself, and the processing (and affective) factors which influence it, suggests a task-based process approach and an integration of assessment and instruction. This implies a re-evaluation of the methods used in language testing research, and calls for focused and systematic experimental research "to illuminate all of these unresolved issues" (Weir 1998:9) involved in the specific measurement of communication competencies.

Self-assessment addresses this situation by encouraging the student to become part of the whole process of language learning, and to be aware of his/her progress. Of particular significance for students in Korea, Harris (1997:19) sees self-assessment as a practical tool that should be integrated into everyday classroom activities, and Blanche proposes that self-appraisal "would be particularly helpful in the case of false beginners" (1988:85).

This view of testing reflects the understanding that education is rarely undertaken simply "for itself". In the era of mass education, the (often implicit and unvoiced) driving force determining what was learned by whom has been the need for a society of educated citizens who can contribute to the economic growth of a nation. As industrialism and consumerism show their destructive potential, however, education is being seen by many educators not simply as a means of improving society, but as a means of preventing its collapse. Thus Harri-Augstein & Thomas (1991:7) call for attention to "our capacity for learning" to "provide us with the resource to negotiate change, to prevent man-made catastrophes, to achieve success and to attain new standards of excellence and quality in our various human endeavours."

------------------------------------

References

Alderson, J.C. (1981). Report of the discussion on communicative language testing. In J.C. Alderson & A. Hughes (Eds.). Issues in Language Testing. ELT Documents 111. The British Council.

Ames, C. & Archer, J. (1988). Achievement goals in the classroom: Students' learning strategies and motivation processes. Journal of Educational Psychology, 80, 260-67.

Assinder, W. (1991). Peer teaching, peer learning. ELT Journal, 45/3, 218-28.

Bachman, L.F. (1989). The development and use of criterion-referenced tests of language proficiency in language program evaluation. In K. Johnson (Ed.). Program Design and Evaluation in Language Teaching. Cambridge: Cambridge University Press.

Bachman, L. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Bachman, L. & Palmer, A. (1996). Language Testing in Practice. Oxford: Oxford University Press.

Benson, M. (1991). Attitudes and motivation towards English: A survey of Japanese freshmen. RELC Journal, 22/1, 34-48.

Blanche, P. (1988). Self-assessment of foreign language skills: implications for teachers and researchers. RELC Journal, 19/1, 75-93.

Brindley, G. (1989). Assessing Achievement in the Learner-centred Curriculum. Sydney: National Centre for English Language Teaching and Research.

Brown, J. D. (1988). Understanding Research in Second Language Learning.  New York: Cambridge University Press.

Brown, J. D. (1989a). Criterion-referenced test reliability. University of Hawai'i  Working Papers in ESL, 1, 79-113. 

Brown, J.D. (1989b). Language testing: a practical guide to proficiency, placement, diagnostic and achievement testing. Ms. Honolulu, Hawai'i: Department of ESL, University of Hawai'i at Mānoa.

Brown, J.D. (1995). Differences between norm-referenced and criterion-referenced tests. In J.D. Brown & S.O. Yamashita (Eds.). Language Testing in Japan. Tokyo, Japan: The Japan Association for Language Teaching, pp. 12-19.

Brumfit, C.J. & Johnson, K. (Eds.). (1979). The Communicative Approach to Language Teaching. Oxford: Oxford University Press.

Canale, M. (1983). On some dimensions of language proficiency. In J.W. Oller, Jr. (Ed.). Issues in Language Testing Research. Rowley, Mass.: Newbury House.

Canale, M. & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1-47.

Carroll, B. (1981). Testing Communicative Performance. Oxford: Pergamon.

Cartier, F.A. (1968). Criterion-referenced testing of language skills. TESOL Quarterly, 2/1, 27-32.

Chamot, A.U. & O'Malley, J.M. (1994). The CALLA Handbook: Implementing the Cognitive Academic Language Learning Approach. Reading, MA: Addison Wesley.

Cziko, G.A. (1982). Improving the psychometric, criterion-referenced and practical qualities of integrative language tests. TESOL Quarterly, 16/3, 367-79.

Cziko, G.A. (1983). Psychometric and edumetric approaches to language testing. In J. W. Oller, Jr. (Ed.). Issues in Language Testing Research. Rowley, Mass.: Newbury House.

Cziko, G.A. (1984). Some problems with empirically-based models of communicative competence. Applied Linguistics, 5, 23-38.

Darling-Hammond, L. (1994). Performance-based assessment and educational equity. Harvard Educational Review, 64/1, 5-30.

Davies, A. (1990). Principles of Language Testing. Oxford: Basil Blackwell.

Dickinson, L. (1978). Autonomy, self-directed learning and individualization. In Individualization and Autonomy in Language Learning. ELT Documents 103. Modern English Publications and the British Council. 7-28.

Dickinson, L. (1987). Self-Instruction in Language Learning. Cambridge: Cambridge University Press.

Dickinson, L. & Carver, D.J. (1980). Learning how to learn: steps towards self-direction in foreign language learning in schools. English Language Teaching Journal, 35, 1-7.

Fok, A.C.Y.Y. (1981). Reliability of student self-assessment. Hong Kong: H.K.U. Language Centre.

Harri-Augstein, S. & Thomas, L. (1991). Learning Conversations: The Self-Organised Learning Way to Personal and Organisational Growth. London: Routledge.

Harris, M. (1997). Self-assessment of language learning in formal settings. English Language Teaching Journal, 51/1, 12-20.

Hart, D. (1994). Authentic Assessment: a Handbook for Educators. New York: Addison Wesley.

Haughton, G. & Dickinson, L. (1989). Collaborative assessment by masters' candidates in a tutor based system. Language Testing, 5/2, 233-46.

Henner-Stanchina, C. & Holec, H. (1985). Evaluation in an autonomous learning scheme. In P. Riley (Ed.). (1985). Discourse and Learning. London: Longman.

Hill, B. (1994). Self-managed learning: state of the art. Language Teaching, 27, 213-223.

Hudson, T. & Lynch, B. (1984). A criterion-referenced approach to ESL achievement testing. Language Testing, 1/2, 171-200.

Hughes, A. (1989). Testing for Language Teachers. Cambridge: Cambridge University Press.

Hunt, J., Gow, L. & Barnes, P. (1989). Learner self-evaluation and assessment - a tool for autonomy in the language learning classroom. In V. Bickley (Ed.). Language Teaching and Learning Styles Within and Across Cultures. Hong Kong: Institute of Language in Education, Education Department, 207-17.

Hymes, D. (1972). Models of the interaction of language and social life. In J.J. Gumperz & D. Hymes (Eds.). Directions in Sociolinguistics: The Ethnography of Communication. New York: Holt, Rinehart & Winston.

Kelly, G. (1955). The Psychology of Personal Constructs. New York: Norton.

Kelly, R. (1978). On the construct validity of comprehensive tests: an exercise in applied linguistics. University of Queensland PhD thesis.

Kenyon, D. (1992). An investigation of the validity of the demands of tasks on performance-based tests of oral proficiency. Paper presented at the Language Testing Research Colloquium, Vancouver, Canada.

Kohonen, V. (1996). Learning contents and processes in context: towards coherence in educational outcomes through teacher development. In L. Lösfman, L. Kurki-Suonio, S. Pellinen & J. Lehtonen (Eds.). Effectiveness of Teacher Education: New Challenges and Approaches to Evaluation. Tampere: Reports from the Department of Teacher Education in Tampere University A6, 63-84.

Kohonen, V. (1999). Authentic assessment in affective foreign language education. In J. Arnold, (Ed.). Affect in Language Learning. Cambridge: Cambridge University Press, pp. 279-94.

Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisition. Applied Linguistics, 18/2, 141-165.

Lee, Wan-ki (1991). A task-based approach to oral communication testing of English as a foreign language. Ph.D. thesis, Manchester University. Seoul: Hanshin Publishing Co.

McClean, J.M. (1995). Negotiating a spoken-English scheme with Japanese university students. In J.D. Brown & S.O. Yamashita (Eds.). Language Testing in Japan. Tokyo, Japan: The Japan Association for Language Teaching, pp. 136-147.

McNamara, T. (1995). Modelling performance: opening Pandora's box. Applied Linguistics, 16, 159-79.

McNamara, T. (1996). Measuring Second Language Performance. London: Longman.

Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H.I. Braun (Eds.). Test Validity. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 33-45.

Miller, L. & Ng, R. (1996). Autonomy in the classroom: peer assessment. In R. Pemberton, E.S.L. Li, W.W.F. Or & H.D. Pierson (Eds.). Taking Control: Autonomy in Language Learning. Hong Kong: Hong Kong University Press, pp. 133-146.

Morrow, K. (1979). Communicative language testing: revolution or evolution? In C.J. Brumfit & K. Johnson (Eds.). The Communicative Approach to Language Teaching. Oxford: Oxford University Press, pp. 143-158.

Nakamura, Y. (1995). Making speaking tests valid: practical considerations in a classroom setting. In J.D. Brown & S.O. Yamashita (Eds.). Language Testing in Japan. Tokyo, Japan: The Japan Association for Language Teaching, pp. 126-135.

North, B. (1991). Standardization of continuous assessment grades. In C. Alderson and B. North (Eds.). Language Testing in the 1990's. Modern English Publications & the British Council.

Oller, J.W. (1979). Language Tests at School. London: Longman.

O'Malley, M. & Pierce, L.V. (1996). Authentic Assessment for English Language Learners. New York: Addison Wesley.

Oscarson, M. (1978). Approaches to Self-Assessment in Foreign Language Learning. Strasbourg: Council of Europe, Council for Cultural Co-operation.

Oscarson, M. (1989). Self-assessment of language proficiency: rationale and implications. Language Testing, 6/1, 1-13.

Ranson, S. (1994). Towards the Learning Society. London: Cassell.

Rea, P.R. (1981). Formative assessment of student performance: the role of self-appraisal. Indian Journal of Applied Linguistics, 7, 66-68.

Seliger, H. (1984). Processing universals in second language acquisition. In F. Eckman, L. Bell, and D. Nelson (Eds.) Universals of Second Language Acquisition. Rowley, MA: Newbury House.

Skehan, P. (1988). Language testing, part 1: state of the art article. Language Teaching, 21/4, 211-218.

Skehan, P. (1998). A Cognitive Approach to Language Learning. Oxford: Oxford University Press.

Spolsky, B. (1975). Language testing – the problem of validation. In L. Palmer & B. Spolsky (Eds.). Papers on Language Testing 1967-1974. Washington, D.C.: TESOL, pp. 147-53.

Swain, M. (1985). Communicative competence: some roles of comprehensible input and comprehensible output in its development. In S.M. Gass & C.G. Madden (Eds.). Input in Second Language Acquisition. Rowley, MA: Newbury House.

Thrasher, R.H. (1984). Educational validity. Annual reports, International Christian University, 9, 67-84.

Tudor, I. (1996). Learner-centredness as Language Education. Cambridge: Cambridge University Press.

Van Lier, L. (1996). Interaction in the Language Curriculum: Awareness, Autonomy, and Authenticity. London: Longman.

Widdowson, H.G. (1983). Learning Purpose and Language Use. Oxford: Oxford University Press.

Weir, C.J. (1998). Communicative Language Testing (revised edition). Exeter: University of Exeter Press.

Williams, M. & Burden, R.L. (1997). Psychology for Language Teachers: A Social Constructivist Approach. Cambridge: Cambridge University Press.
