CHAPTER 6: Curriculum design
6.3 The syllabus
A focus on "language-learning as education", with attention to cognition, affect, and socio-cultural aspects of learning, presented in a problem-solving, task-based framework, and derived from the literature reviews in chapter 3, has already been outlined as a guiding principle behind the ANU programme. When considering the syllabus (a framework within which activities can be carried out: a teaching device to facilitate learning [Nunan 1988c:6; section 3.4.1.3]), this focus leads to specific interpretations of syllabus-design issues as described by Breen & Candlin (1980):
- What communicative knowledge - and its affective aspects - does the learner already possess and exploit?
- What communicative abilities - and the skills which manifest them - does the learner already activate and depend upon in using and selecting from his/her established repertoire?
- Can the performance repertoire of the learner's first language be employed?
- Can existing knowledge of and about the target repertoire be used?
- What is the learner's own view of the nature of language?
- What is the learner's view of learning a language?
- How does the learner define his/her own learning needs?
- What is likely to interest the learner both within the target repertoire and the learning process?
- What are the learner's motivations for learning the target repertoire? (Adapted from Breen & Candlin 1980:93-4)
These issues are answered in terms of this study in table A-51. Given that the programme was designed to be sensitive to changing needs and situations, the questions posed by Breen & Candlin (above) were treated in an ongoing manner (learner training, reflection, self-assessment, teacher development, programme feedback, etc.), with the needs and opinions of the students being constantly monitored and appropriate programme adjustments made.
6.3.1 Syllabus goals
Willis (1996) offers five principles for the implementation of a task-based approach. These provide input, use, reflection on the input and use, and some attention to affect:
- There should be exposure to worthwhile and authentic language.
- There should be use of language.
- Tasks should motivate learners to engage in language use.
- There should be a focus on language at some points in a task cycle.
- The focus on language should be more and less prominent at different times. (Adapted from Willis 1996)
Skehan (1998) also proposes five principles for task-based instruction, paying greater attention to affect, but still largely ignoring socio-cultural aspects:
- Choose a range of target structures (learners do not simply learn what teachers teach. It is ineffective to choose a particular structure to be learned).
- Choose tasks which meet the utility criterion (the teacher can only create appropriate conditions and hope the learners will avail themselves of the possibilities).
- Select and sequence tasks to achieve balanced development.
- Maximise the chances of a focus on form through attentional manipulation.
- At initial stages of task use, conditions need to be established to maximise the chances of noticing. (Adapted from Skehan 1998:129-32)
When designing the syllabi in this study, Willis' and Skehan's principles (above) provided a benchmark for the design of the interactive learning materials (TMAI, TMM, NYT, TWA), as demonstrated in section 7.3.
Desired learning outcomes of the syllabi were not specifically knowledge-based, but centred on the two affective/psycho-social/strategic (CMI) triads, (plus communicative competence), which were addressed (mostly implicitly) through the textbooks, themselves embodiments of the syllabi. There were notional, functional and grammatical signposts in the tables of contents of these books, which provided more "familiar" direction for teachers and students, but these were a means to an affective/humanistic/communicatively-competent end, rather than being an attempt to re-cover in scant classroom time linguistic content that had been previously studied in middle school and high school.
6.4 Assessment
6.4.1 Introduction
The topic of language testing in general, and performance testing in particular, is fraught with problems of theory and practice, so that (as with needs assessment and curriculum design) further review of the relevant literature was necessary before designing a systematic assessment component for the ANU language curriculum. The results of this survey, and the conclusions it led to, are presented below in section 6.4.2. The reader is referred to Skehan's "state of the art article" of 1988 for a more detailed survey of factors concerned in language testing, and to Lee's doctoral thesis on task-based oral testing (1991) for a review of the situation in Korea.
6.4.2 Language testing: brief survey of the literature
Skehan (1998:153) defines a test as "a systematic method of eliciting performance which is intended to be the basis for some sort of decision making", and draws attention to the tendency of testers to place an emphasis on "care and standardization in assessment in the belief that such methods of examining performance will have more to contribute to reliable measurement than informal assessment by people who may be very familiar with particular language users" (Skehan 1998:153). Such an attitude can be seen as following from the assumption that "there are knowable best ways of learning and that these can be discovered using a scientific method which has long been discarded by contemporary philosophers (Popper), scientists (Medawar) and physicists (Heisenberg)" (Harri-Augstein & Thomas 1991:7).
This assumption has been at the heart of language testing from its "pre-scientific" stage (Spolsky 1975), to its psychometric-structuralist "scientific stage" (when discrete-point testing represented the accepted behaviourist 'truth'), and reflects the views that language can be learned by studying its parts in isolation, that acquisition of these parts can be tested and will successfully predict performance levels, and that the learner can be relied on to reconstruct the parts in meaningful situations when necessary. These "truths" did not disappear in the 1970s, when integrative testing (e.g. cloze tests and dictation) claimed to come from a sounder theoretical base (Oller 1979a) but were shown by commentators such as Alderson (1981a), Morrow (1979) and Carroll (1981:9) to be still concerned with usage rather than use (Widdowson 1983), therefore being only indirect tests of potential efficiency. Kelly (1978:245-6) also points out that it is possible to develop proficiency in the integrative test itself, and that indirect tests cannot diagnose specific areas of difficulty in relation to the authentic task. Such tests can only supply information on a candidate's linguistic competence, and have nothing to offer in terms of performance ability (Weir 1998).
A consensus that "knowledge of the elements of a language in fact counts for nothing unless the user is able to combine them in new and appropriate ways to meet the linguistic demands of the situation in which he wishes to use the language" (Morrow 1979:145), and an acknowledgement that the easily quantifiable, reliable, and efficient data obtained from discrete (and cloze) testing implies that proficiency is neatly quantifiable in such a fashion (cf. Oller 1979a:212), led to a perception that it would be preferable to test the ability to perform in a specified socio-linguistic setting (Spolsky 1958). Based on work by Hymes (1972), Rea (1978), Morrow (1979) and Canale & Swain (1980), the emphasis thus shifted from linguistic accuracy to the ability to function effectively through language in particular contexts of situation (a demonstration of competence and of the ability to use this competence), and communicative testing was adopted as a means of assessing language acquisition (though with some lack of initial agreement or direction, cf. McClean 1995:137; Benson, 1991). For Canale & Swain (1980), testing communicative language ability included grammatical competence (knowledge of the rules of grammar), sociolinguistic competence (knowledge of the rules of use and of discourse) and strategic competence (knowledge of verbal and non-verbal communication strategies). Canale (1983) later updated this to a four-dimensional model - linguistic, sociolinguistic, discoursal (cohesion and coherence) and strategic competences, while Bachman (1990) saw it as consisting of language competence, strategic competence, and psycho-physiological mechanisms. However, the relationship between the various competences, the way they are integrated into overall communicative competence, and the way this is translated into communicative performance are all in need of clarification, and such models are themselves in need of validation (cf. Swain 1985; Skehan 1988; Brindley 1989). 
Skehan (1988) articulates the dilemma of communicative language testing at the end of the 1980s:
What we need is a theory which guides and predicts how an underlying communicative competence is manifested in actual performance; how situations are related to one another, how competence can be assessed by examples of performance on actual tests; what components communicative competence actually has; and how these interrelate ... Since such definitive theories do not exist, testers have to do the best they can with such theories as are available. (Skehan 1988, cited in Weir 1998:7)
Thus Canale & Swain's (1980) framework, though an insightful start, is seen by Skehan (1998:159) as neither practical nor comprehensive (cf. Cziko 1984), possessing no systematic basis, and unable to advance prediction and generalisation in any substantial way, a problem that was addressed in later developments (Bachman 1990; Bachman & Palmer 1996) by application of categories to real contexts. The Bachman (1990:87) model still lacks a "rationale founded in psycholinguistic mechanisms and processes (and research findings) which can enable [it to] ... make functional statements about the nature of performance and the way it is grounded in competence" (Skehan 1998:154), but it is: i) more detailed in its specification of component language competences; ii) more precise in the interrelationships between the different component competences; iii) more grounded in contemporary linguistic theory; and iv) more empirically based, allowing a more effective mapping of components of competence on to language use situations, and more principled comparisons of those components.
6.4.2.1 Task-based testing
Bachman's model (1990:87) uses familiar empirical research methods in which data is perceived in terms of the underlying structural model, to infer abilities via a static picture of proficiency, based on the assumption that there are competence-oriented underlying abilities made up of different interacting components (cf. Canale & Swain 1980). However, cognitive theory tells us that second language performers, faced with a developing interlanguage, and performance pressures such as fluency, accuracy and complexity, do not draw upon "a generalized and stable underlying competence" (Skehan 1998:169), but allocate limited processing attention in appropriate ways, drawing on parallel coding systems for efficiency of real-time communication. Given this redefined view of the competence-performance relationship, Skehan (1998) proposes a construct of 'ability for use', which would allow a processing competence to operate and to be assessed, and advocates the use of tasks as a central unit within a testing context, with the proviso that we need to know "more about the way tasks themselves influence (and constrain) performance" (1998:169). McNamara (1995; 1996), following Kenyon (1992), provides a model of such multiple influences on test performance, in which the learner handles the fluctuating communicative demands through a processing competence. In Skehan's expanded version of McNamara's model, assessment of competences and ability for use involves generalised processing capacities and meaningful language use, with tasks (including task qualities, types, and characteristics) being central to predictions of performance and generalisations across contexts (cf. Weir 1998). Skehan sees the conditions under which tasks are performed and the way conditions interact with performance as "a fertile area for research" (1998:177).
6.4.2.2 Validity/reliability
Language testing has traditionally been limited by considerations of validity (whether tests actually measure what they are supposed to measure [Thrasher 1984]), reliability (whether they produce similar results on more than one occasion), and efficiency (logistics of test administration) (Weir 1998:1). Validity is seen by Spolsky (1975) and Messick (1988) as the major problem in foreign language testing, and includes content validity (the extent to which the test is a representative sample of the language skills and structures it is meant to test), criterion-related validity, construct validity (the extent to which the test matches a theoretical construct - Bachman 1990), face validity (the extent to which the test looks reasonable to the test-taker), predictive validity (the predictive force of the test), concurrent validity (whether the test and the criterion are administered at the same time - Davies 1990), and educational validity (the relationship between positive test effects and students' study habits - Thrasher 1984). Nakamura (1995) argues that predictive validity, educational validity, construct validity, concurrent validity, face validity and content validity should be analysed in tests of speaking ability, and Kohonen (1999:291) also stresses validity in communicative evaluation.
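Reliability in the sense given above (similar results on more than one occasion) is often operationalised as a test-retest correlation. The following is a minimal sketch of that idea only, not a procedure used in this study: the two score lists are hypothetical, and the helper name `pearson` is an illustrative assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical scores for the same five learners on two occasions
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]

# A value close to 1 indicates that the two sittings rank learners similarly
reliability = pearson(first_sitting, second_sitting)
```

A high coefficient shows only that the instrument is consistent across sittings; it says nothing about whether the instrument measures what it is supposed to measure.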
Williams & Burden (1997) argue that the energy spent by test constructors on strengthening the reliability and validity of their tests so that they can be standardised is largely misspent, since this assumes that the test is measuring a relatively fixed characteristic, rather than a hypothetical construct (the researcher's best attempt to define what is involved). In fact, individual- and affect-related traits are variable, and often context specific, such that "a test should be expected to produce different results on different occasions" (Williams & Burden 1997:90). As Kelly mentions (1955:77), when a subject fails to meet the experimenter's expectations all that can be said is that he/she has not conformed to those expectations or to the experimenter's definition of learning. In recognition of this problem, researchers have employed the concept of construct validity to indicate how well the test relates to the construct under investigation, but this still does not mean that it actually exists: "The point is that it is extremely difficult to construct a test which is truly valid in that it really measures what it is supposed to measure" (Williams & Burden 1997:90). Weir (1998:7) also points out that the validity of "communicative" tests is dependent on the test-constructor's understanding and definition of the term, and Van Lier (1996) goes deeper still into the "accountability" of tests which can only measure that which is measurable:
It is quite possible that the deepest, most satisfying aspects of achievement, and the most profound effects of education, both in positive and negative terms, are entirely unmeasurable ... What if we held educators accountable for the quality of the memories they gave to their students, rather than for averages on national tests? (Van Lier 1996:120)
6.4.2.3 Criterion-referenced vs. norm-referenced testing
Authentic assessment (cf. section 6.4.2.4, below) in a task-based process setting implies a focus on language mastery (criterion-referenced performance) rather than relative performance (norm-referenced performance), a focus which Ames and Archer (1988) found to be highly motivating in the classroom, fostering long-term use of learning strategies and helping students form realistic but challenging goals. As Darling-Hammond (1994:10) points out, assessment needs to support authentic forms of teaching and learning (cf. North 1991; Kohonen 1996), and task-based process assessment involves a criterion-referenced orientation, with Criterion-Referenced Tests (CRTs) providing direct information "about what the learner can actually do with the target language" (McClean 1995:137). Strengths and weaknesses can be isolated across the whole test population, and specific information can be gained about an individual's performance, in contrast to Norm-Referenced Tests (NRTs), which tend to give information only about learners at either end of the scale (cf. Cartier 1968; Cziko 1982; 1983; Hudson & Lynch 1984; Brown 1988; 1989a; 1990a & b; Bachman 1989; 1990; McClean 1995:146, table A-44, below). Brown (1995) points out that programme evaluations tend to use tests that are sensitive to the goals and objectives of the programme, and that such programme-sensitive tests (sometimes called programme-fair tests) "are by definition criterion-referenced" (Brown 1995:18), being different from NRTs either in test characteristics or logistics (cf. table A-45, below).
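The difference between the two orientations can be illustrated with a small sketch. It is offered only as an illustration under stated assumptions: the scores, the cutoff of 60, and the function names are hypothetical and are not taken from the study or from the sources cited.

```python
from bisect import bisect_left

def norm_referenced(scores):
    """Percentage of the group scoring below each learner (relative standing)."""
    ordered = sorted(scores)
    n = len(scores)
    return [100 * bisect_left(ordered, s) / n for s in scores]

def criterion_referenced(scores, cutoff):
    """Whether each learner meets a fixed performance criterion."""
    return [s >= cutoff for s in scores]

# Hypothetical oral-test scores for five learners
scores = [55, 62, 70, 78, 91]
percentiles = norm_referenced(scores)
mastery = criterion_referenced(scores, cutoff=60)

# Suppose every learner improves by ten points
improved = [s + 10 for s in scores]
percentiles_after = norm_referenced(improved)
mastery_after = criterion_referenced(improved, cutoff=60)
```

In this sketch the norm-referenced result is identical before and after the improvement, because each learner's standing relative to the group has not changed, whereas the criterion-referenced result registers that the weakest learner now meets the criterion.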


6.4.2.4 Authentic assessment
Observable factors that appear to be associated with learning include construction of meaning, sharing of experiences, identification of needs and purposes, critical evaluation of performance strategies, and awareness of this process (Harri-Augstein & Thomas 1991:7). Thus Kohonen (1999) extends Skehan's task-based framework (1998:177; cf. figure B-24), and proposes "Authentic assessment" as a process-oriented means of evaluating communicative competence, cognitive abilities and affective learning (Kohonen 1999:284; cf. Hart 1994:9; O'Malley & Pierce 1996:x-6), using reflective forms of assessment in instructionally-relevant classroom activities (communicative performance assessment, language portfolios and self-assessment), and focusing on curriculum goals, enhancement of individual competence, and integration of instruction and assessment. In this two-way process, "the essentially interactive nature of learning is extended to the process of assessment" (Williams & Burden 1997:42), examining what learners can do with their language, through real-life language use tasks (cf. Weir 1998:9). For the learner this means developing reflective awareness through self-assessment and peer assessment, learning "how to manage ... learning, rather than just managing to learn" (Kohonen 1999:291). Kohonen (1999) lists twelve ways in which authentic assessment can enhance learning, and summarises how this approach contrasts with standardised testing (cf. table A-44).
6.4.3 The Korean situation
The situation relating to oral testing in Korea at the time of this study (cf. Lee 1991) was similar to that in Japan, as described by McClean (1995):
What little oral assessment has been done in Japanese universities so far has been isolated, haphazard, lacking in form, and subjective. Testers are confused about what specifically to focus on when assessing test-takers' oral competence. (McClean 1995:137)
Before Korean students come to university, their oral abilities are measured through reading or writing tests, which encourage them to handle indirect rather than realistic tasks (Hughes 1989; Brindley 1989; Lee 1991; Weir 1998), an understandable situation in schools which must train students to pass the mostly multiple-choice, discrete-item university entrance tests. Thus teachers at all levels are faced with the dilemma of preparing students for important national NRTs while being asked by the government to incorporate a (non-tested) communicative element into their teaching. Such a situation is reminiscent of Rea's comment on testing in the late 1970s:
Although we would agree that language is a complex behaviour and that we would generally accept a definition of overall language proficiency as the ability to function in a natural language situation, we still insist on, or let others impose on us, testing measures which assess language as an abstract array of discrete items, to be manipulated only in a mechanistic way. Such tests yield artificial, sterile and irrelevant types of items which have no relationship to the use of language in real life situations. (Rea 1978:51, cited in Weir 1998:3)
If students are to learn in a way that motivates and is meaningful to them (given that these factors will enhance and promote language acquisition), this will involve consciousness-raising (language learning awareness), reflection (self-assessment), and development of learning strategies, as part of "actual" language study. Assessment in this context exists to give information to the learner and the teacher in terms of learning strengths and weaknesses, so that future goals can be set and learning plans devised. Testing which concentrates on the "target-like appearance of forms" (Larsen-Freeman 1997:155) ignores the fact that "we have no mechanism for deciding which of the phenomena described or reported to be carried out by the learner are in fact those that lead to language acquisition" (Seliger 1984:37), as well as the fact that the learner's internal grammar is not a steady commodity and often deteriorates prior to internalising new content. Even if we could identify and measure all of the factors in second language acquisition, complexity theory (section 6.6) tells us that "we would still be unable to predict the outcome of their combination" (Larsen-Freeman 1997:157).
6.4.4 Assessment - conclusion
In the recent shift in educational theory from transmission of knowledge towards transformation of knowledge, and to integration of knowledge with existing personal constructs and meanings (Kohonen 1999:280), assessment has taken on new affective goals in which the personal growth of the learner is becoming increasingly important (Ranson 1994:116). Thus it is no longer defensible to use discrete-item testing of dubious constructs or to sample performance as a means of inferring underlying competence or abilities, if assessment is really concerned with providing information on learning. Instead, the need to understand performance itself, and the processing (and affective) factors which influence it, suggests a task-based process approach and an integration of assessment and instruction. This implies a re-evaluation of the methods used in language testing research, "to illuminate all of these unresolved issues" (Weir 1998:9).
6.4.5 The study
Based on the findings of the above survey, and recognising that learner perceptions and beliefs are important factors in determining what students learn and how they learn it (Williams & Burden 1996:205), evaluative procedures in the present study began (1997) with criterion-referenced task-based oral tests of recently-studied learning content, and (following teacher-led feedback and discussion) metamorphosed through 1998 and 1999, gradually becoming integrated into the learning environment. Evaluation became a process of ongoing self-assessment and peer assessment (cf. section 3.2.3), and "final" oral tests in years 1 & 2 (Freshmen & Sophomores) took on principles of "authentic testing" (Kohonen 1999), being designed to promote learning as well as providing feedback on that learning (cf. section 7.3.4). In year 3 (Juniors) the learner training emphasis was more pronounced, and "Evaluation Sessions" occurred four times during the year, taking the form of "learning conversations" (cf. Harri-Augstein & Thomas 1991:6), in which students discussed their goals and achievements, and their learning plans for the future (cf. appendices C-62-63). These "final" tests and "conversations" were marked according to "range/fluency/delivery/attitude/interaction" criteria (cf. table A-52, below; Lee 1991:280), but their share of the final grade was reduced (from 25% [1997] to 15% [1999]) and their purpose was acknowledged as providing information for the students rather than about them. See section 7.3.4 for more information about the implementation of oral testing in the programme.
