TEFL Assessment Glossary

Achievement test An examination that measures educationally relevant skills or knowledge about such subjects as reading, spelling, or mathematics.

Age norms Values representing typical or average performance of people of certain age groups.

Authentic task A task performed by students that has a high degree of similarity to tasks performed in the real world.

Average A statistic that indicates the central tendency or most typical score of a group of scores. Most often average refers to the sum of a set of scores divided by the number of scores in the set.

Battery A group of carefully selected tests that are administered to a given population, the results of which are of value individually, in combination, and as a whole.

Ceiling The upper limit of ability that can be measured by a particular test.

Criterion-referenced test A measurement of achievement of specific criteria or skills in terms of absolute levels of mastery. The focus is on the performance of an individual as measured against a standard or criterion rather than against the performance of others who take the same test, as with norm-referenced tests.

Diagnostic test An intensive, in-depth evaluation process with a relatively detailed and narrow coverage of a specific area. The purpose of this test is to determine the specific learning needs of individual students and to be able to meet those needs through regular or remedial classroom instruction.

Dimensions, traits, or subscales The sub-categories used in evaluating a performance or portfolio product (e.g., in evaluating student writing one might rate student performance on subscales such as organization, quality of content, mechanics, and style).

Domain-referenced test A test in which performance is measured against a well-defined set of tasks or body of knowledge (domain). Domain-referenced tests are a specific type of criterion-referenced test and have a similar purpose.

Grade equivalent The estimated grade level that corresponds to a given score.

Holistic scoring Scoring based upon an overall impression (as opposed to traditional test scoring which counts up specific errors and subtracts points on the basis of them). In holistic scoring the rater matches his or her overall impression to the point scale to see how the portfolio product or performance should be scored. Raters usually are directed to pay attention to particular aspects of a performance in assigning the overall score.

Informal test A non-standardized test that is designed to give an approximate index of an individual's level of ability or learning style; often teacher-constructed.

Inventory A catalog or list for assessing the absence or presence of certain attitudes, interests, behaviors, or other items regarded as relevant to a given purpose.

Item An individual question or exercise in a test or evaluative instrument.

Norm Performance standard that is established by a reference group and that describes average or typical performance. Usually norms are determined by testing a representative group and then calculating the group's test performance.

Normal curve equivalent Standard scores with a mean of 50 and a standard deviation of approximately 21.
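
As an illustration of the definition above, the sketch below converts a percentile rank to a normal curve equivalent using the conventional scaling NCE = 50 + 21.06 × z, where z is the standard normal deviate for that percentile. The function name `percentile_to_nce` is hypothetical, and the constant 21.06 is the usual value behind "approximately 21".

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # conventional value of the "approximately 21" standard deviation


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    # z is the standard normal deviate that has this percentile rank.
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return NCE_MEAN + NCE_SD * z
```

With this scaling, NCEs and percentile ranks coincide at 1, 50, and 99 and differ elsewhere.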

Norm-referenced test An objective test that is standardized on a group of individuals whose performance is evaluated in relation to the performance of others; contrasted with criterion-referenced test.

Objective percent correct The percent of the items measuring a single objective that a student answers correctly.
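
A minimal sketch of this calculation, assuming each item has been tagged with the single objective it measures. The function name, the data layout, and the objective labels are hypothetical.

```python
from collections import defaultdict


def objective_percent_correct(item_results):
    """Return the percent of items answered correctly for each objective.

    item_results is a list of (objective, is_correct) pairs, one per item.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for objective, is_correct in item_results:
        totals[objective] += 1
        if is_correct:
            correct[objective] += 1
    return {obj: 100 * correct[obj] / totals[obj] for obj in totals}


# Hypothetical results for one student on a six-item test.
results = [
    ("vocabulary", True),
    ("vocabulary", False),
    ("grammar", True),
    ("grammar", True),
    ("vocabulary", True),
    ("vocabulary", True),
]
per_objective = objective_percent_correct(results)
```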

Percentile The percent of people in the norming sample whose scores were below a given score.
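
The definition above can be sketched directly in code. The function name `percentile_rank` and the sample scores are hypothetical, and scores tied with the given score are not counted as "below" it, following the wording of the definition.

```python
def percentile_rank(score, norming_scores):
    """Percent of scores in the norming sample that fall below `score`."""
    if not norming_scores:
        raise ValueError("norming sample must not be empty")
    below = sum(1 for s in norming_scores if s < score)
    return 100 * below / len(norming_scores)


# Hypothetical norming sample of five scores.
norming_sample = [40, 50, 60, 70, 80]
rank = percentile_rank(70, norming_sample)  # three of five scores are below 70
```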

Percent score The percent of items that are answered correctly.

Performance assessment An evaluation in which students are asked to engage in a complex task, often involving the creation of a product. Student performance is rated based on the process the student engages in and/or based on the product of his/her task. Many performance assessments emulate actual workplace activities or real-life skill applications that require higher order processing skills. Performance assessments can be individual or group-oriented.

Performance criteria A predetermined list of observable standards used to rate performance assessments. Effective performance criteria include considerations for validity and reliability.

Performance standards The levels of achievement pupils must reach to receive particular grades in a criterion-referenced grading system (e.g., 90 or higher receives an A, 80 to 89 receives a B, etc.) or to be certified at particular levels of proficiency.
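
A set of performance standards like the one in the example can be sketched as a table of cut scores. Only the A and B bands come from the example above. Treating a score of exactly 90 as an A, and the C, D, and F cut scores, are assumptions made for illustration, as is the function name `letter_grade`.

```python
# (minimum percent score, grade), checked from highest to lowest.
CUT_SCORES = [
    (90, "A"),  # from the example; including exactly 90 is an assumption
    (80, "B"),  # from the example
    (70, "C"),  # assumed
    (60, "D"),  # assumed
    (0, "F"),   # assumed
]


def letter_grade(percent_score):
    """Map a percent score (0 to 100) to a grade using fixed cut scores."""
    if not 0 <= percent_score <= 100:
        raise ValueError("percent score must be between 0 and 100")
    for minimum, grade in CUT_SCORES:
        if percent_score >= minimum:
            return grade
```

The grade depends only on the fixed cut scores and not on how other pupils performed, which is what makes the system criterion-referenced.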

Portfolio A collection of representative student work over a period of time. A portfolio often documents a student's best work, and may include a variety of other kinds of process information (e.g., drafts of student work, students' self-assessments of their work, parents' assessments). Portfolios may be used for evaluation of a student's abilities and improvement.

Process The intermediate steps a student takes in reaching the final performance or end-product specified by the prompt. Process includes all strategies, decisions, rough drafts, and rehearsals, whether deliberate or not, used in completing the given task.

Prompt An assignment or directions asking the student to undertake a task or series of tasks. A prompt presents the context of the situation, the problem or problems to be solved, and criteria or standards by which students will be evaluated.

Published test A test that is publicly available because it has been copyrighted and published commercially.

Rating scales A written list of performance criteria associated with a particular activity or product which an observer or rater uses to assess the pupil's performance on each criterion in terms of its quality.

Raw score The number of items that are answered correctly.

Reliability The extent to which a test is dependable, stable, and consistent when administered to the same individuals on different occasions. Technically, this is a statistical term that defines the extent to which errors of measurement are absent from a measurement instrument.

Rubric A set of guidelines for giving scores. A typical rubric states all the dimensions being assessed, contains a scale, and helps the rater place the given work properly on the scale.

Screening A fast, efficient measurement for a large population to identify individuals who may deviate in a specified area, such as the incidence of maladjustment or readiness for academic work.

Specimen set A sample set of testing materials that is available from a commercial test publisher. The sample may include a complete individual test without multiple copies or a copy of the basic test and administration procedures.

Standardized test A form of measurement that has been normed against a specific population. Standardization is obtained by administering the test to a given population and then calculating means, standard deviations, standardized scores, and percentiles. Equivalent scores are then produced for comparisons of an individual score to the norm group's performance.

Standard scores Scores that are expressed as deviations from a population mean, typically in units of the population standard deviation.
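
The most common standard score is the z-score, which divides a raw score's deviation from the population mean by the population standard deviation. The sketch below assumes that form; the function name and the example values are hypothetical.

```python
def z_score(raw_score, population_mean, population_sd):
    """Express a raw score as a deviation from the mean in SD units."""
    if population_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - population_mean) / population_sd


# Hypothetical example: a raw score of 65 on a test with mean 50 and SD 10.
z = z_score(65, 50, 10)  # 1.5 standard deviations above the mean
```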

Stanine One of the steps in a nine-point scale of standard scores.
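
A sketch of how a z-score maps onto the stanine scale, assuming the conventional definition: a mean of 5, a standard deviation of 2, and bands half a standard deviation wide except for the open-ended bands 1 and 9. The function name `stanine_from_z` is hypothetical.

```python
from bisect import bisect_right

# Upper z-score boundaries of stanines 1 through 8; stanine 9 is open-ended.
STANINE_BOUNDARIES = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine_from_z(z):
    """Return the stanine (1 to 9) for a z-score."""
    # Count how many boundaries lie at or below z, then shift to a 1-based scale.
    return bisect_right(STANINE_BOUNDARIES, z) + 1
```

Under this mapping an exactly average score (z = 0) falls in stanine 5.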

Task A goal-directed assessment activity, demanding that the student use their background of knowledge and skill in a continuous way to solve a complex problem or question.

Validity The extent to which a test measures what it was intended to measure. Validity indicates the degree of accuracy of either predictions or inferences based upon a test score.