Abstract: Validation is one of the core concepts in the field of language testing. In recent years, the validity argument and the assessment use argument have provided a new perspective on, and approach to, language test validation. Built on the interpretive argument and its inferences and assumptions, scientific inquiry into score interpretation and use has become the focus of validation research. According to the assessment use argument, evaluation is the first step in the whole validation process. Evaluation rests on the warrant that observations of performance on test tasks yield observed scores that reflect the targeted language abilities. The assumption underlying this warrant is that the rubrics for scoring responses are suitable for providing evidence of the targeted language abilities. To obtain backing for this inference, researchers need to make sure that rubrics are developed, trialed and revised on the basis of expert consensus. However, some studies point out that the present rubrics for oral performance assessment are not sufficiently specific or detailed. This may result from a lack of empirical research, particularly research on raters' rating processes and performance.

Against the background of the growing use of the TEM4 Oral Test in Chinese universities and colleges, this research investigates the validity of the rating scale for the TEM4 Oral Test story retelling task. It asks whether, and to what extent, the present rubrics for scoring the story retelling task are appropriate for providing evidence of the oral proficiency of sophomore English majors in Chinese universities and colleges.
Following the theoretical framework of the validity argument and the assessment use argument, and focusing on score interpretation, the researcher analyzed qualitative and quantitative data obtained through observation of the expert rating process in order to answer the following research questions: (1) To which conceptual categories do expert raters attend in conducting their evaluations of performance on the story retelling task? (2) How do the expert raters characterize different levels of the students' story retelling performances? The participants in this study were 10 expert raters from 2 Chinese universities. All of the raters have rich experience of teaching English in Chinese colleges and universities, and all of them have served as raters for other types of English oral tests. Retrospective verbal reports and stimulated recall were used to collect qualitative data on the raters' cognition during the rating process. In the retrospective verbal report, the raters listened to each audiotaped performance as a whole and then gave a verbal report on their assessment. Afterwards, the raters listened to the recording a second time and could pause it at any point to provide stimulated recalls. Based on both qualitative and quantitative data analysis, the study found that the raters attended to four major conceptual categories when assessing students' story retelling performances: linguistic resources (including grammar, vocabulary and expression), retelling content, fluency, and pronunciation. It is therefore suggested that fluency be added as a category to the present story retelling rubric, which contains only grammar and vocabulary, retelling content, and pronunciation. The study also found that, at different performance levels, the raters attended to different aspects of retelling performance within specific categories.
Finally, by summarizing and subcategorizing the raters' comments, the study proposes a more detailed and operationalized four-level rubric that modifies the current rating scale for the story retelling task of the TEM4 Oral Test. The revised rubric may promote English major teaching and learning in Chinese colleges and universities by improving test objectivity and fairness.