英语教学与测试Language Testing.ppt
Language Testing
Wei Beibei

Introduction to language testing
Stages of test construction
Testing language skills and elements
Common testing techniques
Interpreting test scores
Achieving beneficial backwash

I. Introduction
Definition of terms: test, measurement, evaluation
Approaches to language testing
Test purposes
Types of tests
Criteria of tests

1. Definition of terms
Test
Measurement
Evaluation

Test
Carroll (1968) provides the following definition of a test: a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.
Anastasi (1962): "A test is essentially an objective and standardized measurement of a sample of behavior." Three elements: a behavior sample, objective measurement, standardized measurement (Liu Runqing: P4).

Measurement
Bachman (1999): Measurement in the social sciences is the process of quantifying the characteristics of persons according to explicit procedures and rules.
Three distinguishing features: quantification, characteristics, explicit rules and procedures.
Stevens (1951): "Measurement is the assignment of numbers to things according to rules," that is, the process of assigning numbers or symbols to the attributes of things according to set rules. Three elements: things and their attributes, the assignment of numbers or symbols, rules (Liu: P2).

Evaluation
Weiss (1972): Evaluation can be defined as the systematic gathering of information for the purpose of making decisions.
Evaluation is the systematic process of gathering information, analyzing it, and interpreting it in order to make a decision. Compared with measurement and testing, it is broader in meaning and more comprehensive.

Relationships among the three (Liu: P5):
An example of evaluation that does not involve either tests or measures (area 1) is the use of qualitative descriptions of student performance for diagnosing learning problems. An example of a non-test measure for evaluation (area 2) is a teacher ranking used for assigning grades, while an example of a test used for purposes of evaluation (area 3) is the use of an achievement test to determine student progress. The most common non-evaluative uses of tests and measures are for research purposes.
An example of tests that are not used for evaluation (area 4) is the use of a proficiency test as a criterion in
second language acquisition research. Finally, assigning code numbers to subjects in second language research according to native language is an example of a non-test measure that is not used for evaluation (area 5). In summary, then, not all measures are tests, not all tests are evaluative, and not all evaluation involves either measurement or tests.
Bachman: Neither measures nor tests are in and of themselves evaluative, and evaluation need not involve measurement or testing.

2. Approaches to language testing
(1) Four approaches
The essay-translation approach
The structuralist approach
The integrative approach
The communicative approach

The essay-translation approach
This approach is commonly referred to as the pre-scientific stage of language testing. No special skill or expertise in testing is required: the subjective judgment of the teacher is considered to be of paramount importance. Tests usually consist of essay writing, translation, and grammatical analysis. The tests also have a heavy literary and cultural bias.
Features of the essay-translation approach:
No special testing skill or expertise is required; the approach relies mainly on the teacher's subjective judgment.
Papers consist mainly of essay writing, translation, and grammatical analysis.
The content has a strong literary and cultural flavor.
Items require written answers and must be marked by hand.

The structuralist approach
This approach is characterized by the view that language learning is chiefly concerned with the systematic acquisition of a set of habits. It draws on the work of structural linguistics, in particular the importance of contrastive analysis and the need to identify and measure the learner's mastery of the separate elements of the target language: phonology, vocabulary, and grammar.
Features of the structuralist approach:
It stresses testing the separate elements of language, such as phonology, vocabulary, and grammar, in isolation from context; the skills of listening, speaking, reading, and writing can also be tested separately.
It adopts the psychometric approach and stresses reliability and objectivity; its typical form is the multiple-choice item, with each item testing one element.
It lends itself to statistical analysis after the test.

The integrative approach
This approach involves the testing of language in context and is thus concerned primarily with meaning and the total communicative effect of discourse. Integrative tests are often designed to assess the learner's ability to use two or more skills simultaneously. Integrative
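tests are best characterized by the use of cloze testing and of dictation. Oral interviews, translation, and essay writing are also included in many integrative tests.
Features of the integrative approach:
It stresses that language should be tested in context.
It does not deliberately separate the individual language skills but assesses two or more skills together; item types include gap filling, dictation, translation, and writing, which measure the student's language ability as a whole.

The communicative approach
This approach to language testing is sometimes linked to the integrative approach. However, although both approaches emphasize the importance of the meaning of utterances rather than their form and structure, there are nevertheless fundamental differences between the two approaches. Communicative tests are concerned primarily with how language is used in communication. Language use is often emphasized to the exclusion of language usage.
Features of the communicative approach:
Unlike the integrative approach, the communicative approach places more emphasis on the use of language in communication than on usage (the form and structure of the language).
Some communicative tests nonetheless include content on language usage.
Communicative language tests are built on an analysis of students' needs and stress authenticity (for example, BEC).

(2) Liu Runqing's theoretical models of language testing (P19)
Pre-scientific testing (first generation)
Psychometric-structuralist testing (second generation)
Communicative language testing (third generation), also called psycholinguistic-sociolinguistic testing

(3) The Canale and Swain model (1980)
Communicative competence consists of four components:
  (1) Grammatical competence: knowledge of the language, such as phonology, vocabulary, and grammar, which is needed to understand and express the literal meaning of language.
  (2) Sociolinguistic competence: the ability to understand and produce language that is appropriate in both form and meaning in different social settings.
  (3) Discourse competence: the ability to combine language forms and content.
  (4) Strategic competence: the ability to begin, continue, adjust, and end a conversation, and to switch topics, during communication.
The weakness of this model is that it does not state clearly how the four competences relate to one another. In addition, treating strategic competence merely as a compensatory ability seems to overlook the strategic competence involved in normal communicative language use.

(4) Bachman (1990): Communicative Language Ability
The framework of CLA includes three components: language competence, strategic competence, and psycho-physiological mechanisms (Liu: P23).
[Figure: the components of Bachman's communicative language ability (Bachman's model of language use). Labels: situational assessment; goal: to understand or express utterances with a specific function, form, and content; language competence (organizational competence, pragmatic competence); psycho-physiological mechanisms; planning process: retrieve items from language knowledge and organize them toward the communicative goal; execution: a neurological and physiological process; utterance: express or understand language.]

3. Test purposes
Liu Runqing (P6):
Purposes of diagnosis and backwash
Purposes of comparison and selection
Purposes of placement
Purposes of research or survey

Arthur Hughes:
to measure language proficiency regardless of any language courses that candidates may have followed
to discover how far students have achieved the objectives of a course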
of study
to diagnose students' strengths and weaknesses, to identify what they know and what they do not know
to assist placement of students by identifying the stage or part of a teaching program most appropriate to their ability

4. Types of tests
(1) According to different learning periods (Heaton: 171-173)
Placement test
Classroom test
Mid-term test
End-of-term test

(2) According to test purposes (Liu: P8-16)
Progress test
Proficiency test
Achievement/attainment test
Aptitude test
Diagnostic test

(3) According to test methods
Discrete-point test
Integrative test

(4) According to interpretations of test scores
Norm-referenced test
Criterion-referenced test

(5) According to scoring methods
Subjective test
Objective test

(6) Other types of test
Communicative testing
Pragmatic test

5. Criteria of tests
Validity
Reliability
Power/Difficulty
Discrimination
Practicality
Backwash effects

Validity
The validity of a test is the extent to which it measures what it is supposed to measure and nothing else.

Discuss the following items:
"Is photography an art or a science?" Discuss.
"The mind is its own place, and in itself can make a Heaven of Hell, a Hell of Heaven." (Milton) Discuss.

Use the following words in sentences: courageous, choosy, acceptable, complicated, etc.
A. John is a very courageous boy.
B. John, the captain of our team, is courageous.
C. I have a courageous father.

Factors of validity
Face validity
Content validity
Construct validity
Empirical validity
Concurrent validity
Predictive validity

Face validity
If a test item looks right to other testers, teachers, moderators, and testees, it can be described as having at least face validity. Face validity refers to a test's surface credibility, or its acceptability to the public. Zou Shen: the test appears to measure the skill or ability it is intended to measure. (Testing pronunciation and intonation with a written test has low face validity.)

Content validity
A test is said to have content validity if its
content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. Content validity refers to whether the test covers what the syllabus specifies, or how far the test items represent the target they are meant to measure.
Is the content of a test related to its objective or purpose?
Are the test items representative?
Is the content appropriate or suitable for the testees?

Construct validity
If a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning. Construct validity refers to whether a test is based on a sound view of language, including views of language learning and language use. "Construct" here does not mean the layout of the paper or the arrangement of the items; it means the theoretical basis of the whole test.

Empirical validity
This validity, also called statistical validity, is obtained as a result of comparing the results of the test with the results of some criterion measure. It can be subdivided into concurrent validity and predictive validity.

Concurrent validity
If the results of the test are compared with the results of some criterion measure such as:
an existing test, known or believed to be valid and given at the same time; or
the teacher's ratings or any other such form of independent assessment given at the same time,
then results obtained by either of the above two methods are measures of the test's concurrent validity in respect of the particular criterion used.
In other words, concurrent validity is established when the test and the criterion are administered at about the same time. It is expressed as a coefficient obtained by comparing the results of one test with those of another test given at or near the same time, or with the teacher's assessment of the students. For example, if end-of-term examination scores are compared with scores on a Band 4 examination that has just been held and the two sets of scores are similar, the end-of-term test has high concurrent validity. (This assumes that the Band 4 examination is itself highly valid.)

Predictive validity
If the results of the test are compared with the results of some criterion measure such as:
the subsequent performance of the testees on a certain task measured by some valid test; or
the teacher's ratings or any other such form of independent assessment given later,
then results obtained by either of these two methods are measures of the test's predictive validity in respect of the particular criterion used.
In other words, predictive validity concerns the degree to which a test can predict the testees' future performance or
success.

Reliability
A test is said to be reliable if it is consistent in its measurements. Reliability refers to the dependability and stability of test results. For example, if the same paper is administered to the same group of students two or more times and the results are highly consistent, the test has high reliability.

Methods of checking test reliability (Liu Runqing: P211; Heaton: P163)
Test/retest method
Split-half method
Parallel forms method

Test/retest method
This method is to re-administer the same test after a lapse of time. It is often impracticable since certain students will benefit more than others by a familiarity with the type and format of the test. Moreover, in addition to changes in performance resulting from the memory factor, personal factors such as motivation and differential maturation will also account for differences in the performances of certain students.

Split-half method
This method estimates a different kind of reliability from that estimated by the test/retest procedure. It is based on the principle that, if an accurate measuring instrument were broken into two equal parts, the measurements obtained with one part would correspond exactly to those obtained with the other (Heaton: 164).

Parallel forms method
This method is to administer parallel forms of the test to the same group. This assumes that two similar versions of a particular test can be constructed: such tests must be identical in the nature of their sampling, difficulty, length, rubrics, etc. Only after a full statistical analysis of the tests and all the items contained in them can the tests safely be regarded as parallel. If the correlation between the two tests is high, then the tests can be termed reliable.

Factors affecting the reliability of a test (Heaton: P162):
the extent of the sample of material selected for testing
the administration of the test

Factors affecting test reliability (Liu Runqing: P214):
the number of items
the nature of the items
item discrimination
the distribution of scores
item difficulty
whether scoring is objective
the time allowed for the test

Power/Difficulty
Difficulty refers to how easy or hard each item in a test is. To judge the quality of a test paper, besides the two key indicators of reliability and validity, we must also examine the items' difficulty index (index of difficulty/facility
value), that is, how easy or hard the items are.

Formula for the difficulty value:
Item difficulty is usually denoted by P, which is in fact the proportion of candidates who answer the item correctly. Suppose there are 10 candidates and 8 of them answer a given item correctly; the difficulty value of that item is then P = 8/10 = 0.8.

Formula for subjective items:
Suppose a writing item carries a full mark of 20 and the mean score of all candidates on that item is 16; the difficulty value of that item is then P = 16/20 = 0.8.

[Figure: normal distribution curve (Liu Runqing: P217)]

Discrimination
Discrimination of a test is its capability to discriminate among the different candidates and to reflect the differences in the performance of the individuals in the group. Item discrimination is the degree to which an item distinguishes between candidates of differing ability.

Methods of calculating item discrimination (Liu: P221):
the formula method
the point-biserial correlation method
the biserial correlation method

Practicality
A good test is practical. It stays within financial and time constraints and is easy to administer, score, and interpret. Practicality refers to whether a test is convenient to use and feasible to administer.

Factors affecting practicality (Heaton: 167):
the length of time available for the administration of the test
the answer sheet and the stationery used
the test situation
the necessary equipment
the presentation of the test paper

Backwash effects
The term backwash (also sometimes referred to as washback) refers to the effects of a test on teaching and learning. If a test has good backwash effects, it will exert a good influence on the learning and teaching that takes place before the test.

Discussion
What's the relationship among tests, measurement, and evaluation?
According to J. B. Heaton, what are the four main approaches to testing, and what are their features?
Consider any tests with which you're familiar. Assess each of them in terms of the various kinds of validity.

II. Stages of test construction
Deciding on test purpose, type, content, items, and framework
Writing specifications for the test
Writing and revising the test
Further considerations

1. Deciding on test purpose, type, content, items, and framework

Purpose and type of the test
According to different purposes, tests can be classified into several types:
Progress test
Achievement/attainment test
Proficiency test
Aptitude test
Diagnostic test

Test content-1
Tim (2000): Establishing test content involves careful sampling from the domain of the test, that is, the set of tasks or the kinds of behaviors in the criterion setting, as informed by our
understanding of the test construct (Tim: P25).

Test content-2
B. Bloom: six levels of educational objectives: knowledge, comprehension, application, analysis, synthesis, evaluation (Liu: P36).

Items of the test
Subjective and objective items
Multiple-choice items, true-false items, matching items, gap-filling items (cloze), dictation, composition and essays, interview
Advantages and disadvantages of each (Liu: P38; Wu: P25)

Framework of the test
Language skills and language elements
Recognition and production (Heaton: P10)
To what extent should we concentrate on testing students' ability to handle the elements of language, and to what extent should we concentrate on testing the integrated skills? The answer to this question depends on both the level and the purpose of the test.
Overall design of test papers for the final year of junior high school and the final year of senior high school

Recognition: Choose the correct answer and write A, B, C or D.
I've been standing here ______ half an hour.
A. since B. during C. while D. for
Production: Complete each blank with the correct word.
I've been standing here ______ half an hour.

An example of a test framework

2. Writing specifications for the test

What are test specifications
A test's specifications provide the official statement about what the test tests and how it tests it. The specifications are the blueprint to