Fundamentals of Language Testing (PPT courseware)
Language Testing
Basic principles
2022/11/25

To understand the need for test theory, it is first necessary to understand something about the fundamental nature of measurement, constructs, and psychological tests.

What are they measuring?
Height & weight: physical attributes
?: psychological attributes

Unlike physical attributes, the psychological attributes of an individual cannot be measured directly as can height or weight. They are hypothetical concepts - products of the informed scientific imagination of social scientists who attempt to develop theories for explaining human behavior. The existence of such constructs can never be absolutely confirmed. The degree to which any psychological construct characterizes an individual can only be inferred from observations of his or her behavior.

What is Measurement?
Measurement in the social sciences is the process of quantifying the characteristics of persons according to explicit procedures and rules.
Quantification: assigning numbers
Characteristics: physical and mental characteristics of persons (abilities/constructs)
Procedures and rules: replicable by other observers, in other contexts and with other individuals

Measurement Scales
A well-known classification of measurement scales is given by Stevens (1951). These measurement scales are:
1. The nominal scale. On the nominal scale, objects are classified according to a characteristic. Example: one can classify persons with respect to sex, hair color, etc.
2. The ordinal scale. An ordinal scale comprises the numbering of different levels of an attribute that are ordered with respect to each other. Example: individuals are ranked first, second, third and so on.
3. The interval scale. An interval scale is a numbering of different levels in which the distances, or intervals, between the levels are equal.
4. The ratio scale. The ratio scale has a natural origin as well as equal intervals. Length in meters and weight in kilograms are defined on a ratio scale; so is temperature on the Kelvin scale. Ratio scales are relatively rare in psychology because of the difficulty of defining a zero point. What would a person with zero intelligence look like?

What is a Test?
Carroll (1968): a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.
Bachman (1990): a test is a measurement instrument designed to elicit a specific sample of an individual's behavior.

Characteristics that limit measurement (1/2)
Limitations in specification
Two levels of specification:
  Theoretical level
  Operational level

Characteristics that limit measurement (2/2)
Limitations in observation and quantification
  Indirectness
  Incompleteness
  Imprecision (ordinal instead of interval)
  Subjectivity (test developers' choice of items and subjective scoring)
  Relativeness (no perfect norm of language use)
Thus, a major concern of language test development is to minimize the effects of these limitations.

Classifying different types of language test (1/3)
Intended use
  Selection
  Entrance
  Diagnostic
  Placement
  Achievement
  Research

Classifying different types of language test (2/3)
Content
  Proficiency test (theory-based)
  Achievement test (syllabus-based)
  Language aptitude test

Classifying different types of language test (3/3)
Frame of reference
Norm-referenced test (test results are interpreted with reference to the performance of a given group)
Typical norms: mean (average score), standard deviation.
Criterion-referenced test (syllabus-based)
Difference: discriminativeness

Standard deviation and normal distribution
Measure of variability

Test qualities
Reliability: consistency of measurement; minimize error variance
(Brown, 1996: 189)

Diagram

Test qualities
Construct validity: the extent to which we can interpret a given test score as an indicator of the construct (ability).
Reliability is a necessary but not sufficient condition for construct validity.

Five types of validity
Construct validity: an indication of how representative a test is of an underlying theory of language learning. Construct validation involves an investigation of the qualities that a test measures, thus providing a basis for the rationale of a test.
Content validity: describes how well the content of the test samples the subject matter that the course of instruction aimed to teach.
Predictive validity: the degree to which predictions made from the test are confirmed by evidence gathered at some later time.
Concurrent validity: concerned with the relationship between what is measured by a test (usually a newly developed test) and another existing criterion measure, which may be a well-established standardized test. If the two measures function similarly (i.e. they rank candidates in the same way), they are considered to have concurrent validity.
Face validity: the degree to which a test appears to measure the knowledge or abilities it claims to measure, as judged by an untrained observer.

Relationship between reliability and validity
In order for a test to be valid, it first needs to be reliable.
Investigation of reliability and validity can be viewed as complementary aspects of identifying, estimating, and interpreting different sources of variance in test scores.
Reliability is concerned with determining how much of the variance in test scores is reliable variance, while validity is concerned with determining what abilities contribute to this reliable variance.
Although it is essential to consider both reliability and validity in the development and use of language tests, the distinction between them may not always be clear.

Diagram

Test qualities
Authenticity: the correspondence of the characteristics of a given language test task to the features of a target language use (TLU) task.
Interactiveness: the extent and type of involvement of the test taker's individual characteristics in accomplishing a test task.
Test taker's individual characteristics: language ability, topical knowledge and affective schemata.
Both are relative.
(Li, 1997: 175)

Test qualities
Impact: impact on society, educational systems and the individuals within those systems.
  Micro level: individuals who are affected by a particular test.
  Macro level: society and the educational system.
Practicality: the ways in which the test will be implemented.
Human resources, material resources, time.

Quality Control
Six qualities of usefulness
  Reliability
  Construct validity
  Authenticity
  Interactiveness
  Impact
  Practicality
Goal: to achieve an appropriate balance among the qualities so that the overall usefulness of the test is maximized.

Describing language ability: language use in language tests

Communicative language ability model
Characteristics of individual language users
Communicative Language Ability

Defining construct
Communicative language ability
  Language knowledge
  Strategic competence (metacognitive strategies)
Bachman (1990)

Comment
Provides an overall and updated picture of the components of language ability.
Lacks acknowledgement of the interactions between components.

Strategic competence
Goal setting: deciding what one is going to do.
Assessment: taking stock of what is needed, what one has to work with, and how well one has done.
Planning: deciding how to use what one has.
Comment: lacks an operational definition, so the actual behavior is hard to pin down.

Other individual characteristics
Personal characteristics: age, sex, native language, education, etc.
Topical knowledge: knowledge structures in long-term memory.
Affective schemata: the affective or emotional correlates of topical knowledge.

Features of communicative language tests
  Reflecting components of language use competence in particular contexts
  As representative as possible
  Clear purpose of testing
  The role of context being stressed
  Authenticity
  Direct and integrated testing
  Criterion-referenced
  Holistic and qualitative assessment of productive skills
  Establishing the theoretical and empirical validity of measures
  Enhanced match between teaching, testing and reality
Cyril J. Weir (1990)

Overview of test development

Development of language test
Define constructs theoretically
  There must be agreement on the theoretical definition upon which the test is based.
Define constructs operationally
  Quantifying observations: establish a scale
(Bachman & Palmer, 1996: 87)
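
Worked illustration (not part of the original slides)
Two quantitative ideas from the slides can be sketched in a few lines of Python: norm-referenced interpretation, where a score is read against the group mean and standard deviation, and concurrent validity, where two measures are compared by how similarly they rank candidates. This is a minimal sketch under stated assumptions. The function names (z_scores, ranks, rank_agreement) and all scores are invented for illustration, the group itself is treated as the norm (population standard deviation), and rank agreement is computed as a Spearman-style correlation of ranks, which is one common choice rather than the method prescribed by the sources cited above.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Express each raw score as its distance from the group mean in SD units."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD: the group itself serves as the norm
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(s - m) / sd for s in scores]


def ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    result = []
    for s in scores:
        first = ordered.index(s) + 1          # first 1-based position of this score
        last = first + ordered.count(s) - 1   # last position occupied by ties
        result.append((first + last) / 2)
    return result


def rank_agreement(x, y):
    """Correlation between the ranks two measures assign to the same candidates."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a measure with all-tied ranks cannot be correlated")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five candidates on a new test and an established test.
new_test = [52, 61, 70, 48, 85]
established_test = [55, 60, 72, 50, 90]

print(z_scores(new_test))
print(rank_agreement(new_test, established_test))
```

A rank agreement close to 1 means the two measures order the candidates in nearly the same way, which is the sense of "function similarly" used in the concurrent validity slide. A value near 0 or below would count against concurrent validity.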