What Do Consistency Estimates Tell Us about Reliability in Holistic Scoring?
Keywords: Consistency Estimates, Reliability, Holistic Scoring

Abstract
Essay writing is a widely used form of assessment in many types of examination. Preferred for their "validity and authenticity" (Hamp-Lyons, 2003:163), direct writing tests predominate in entrance and placement examinations, as well as in continuous assessment. Compared to indirect writing tests, however, their reliability is often questioned and their scoring procedures called into doubt. A true score cannot be observed directly; error stems from various sources, including the raters, their training, and the task (Huot, 1990), which makes essay marking uncertain and raters' scoring inconsistent. This study reports on a large-scale, high-stakes writing proficiency test taken by 441 students. The essays were holistically scored on a 7-point scale by 16 raters. The Pearson correlation coefficient was used to assess the degree of consistency between raters; the coefficient was calculated for each pair of judges in the 25 groups of students. Results show positive correlations overall, but the strength of the relationship varied considerably across the paired samples: correlations ranged from .16 to .91, with the majority between .50 and .74. These findings raise issues about the factors that threaten the consistency of scoring in writing tests.
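The consistency estimate reported above is the pairwise Pearson correlation between raters' scores. The sketch below, a minimal illustration in Python, shows how such per-pair coefficients can be computed; the rater labels and 7-point scores are hypothetical examples rather than the study's data, and scipy.stats.pearsonr is assumed to be available.

# Pairwise Pearson correlations between raters, illustrating the
# consistency estimate described in the abstract. All rater names
# and scores below are hypothetical.
from itertools import combinations
from scipy.stats import pearsonr

# Hypothetical holistic scores on a 7-point scale: one list per rater,
# aligned so that position i is the same essay for every rater.
scores = {
    "rater_A": [4, 6, 3, 5, 7, 2, 4, 5],
    "rater_B": [5, 6, 2, 5, 6, 3, 4, 6],
    "rater_C": [3, 5, 4, 4, 7, 2, 5, 5],
}

# One coefficient per pair of judges, mirroring the per-pair analysis above.
for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
    r, p = pearsonr(a, b)
    print(f"{name_a} vs {name_b}: r = {r:.2f} (p = {p:.3f})")

A high r indicates that two raters rank essays similarly, but Pearson's r is a consistency estimate only: two raters who differ by a constant (one systematically harsher) can still correlate perfectly, which is one reason consensus estimates such as exact-agreement rates are sometimes reported alongside it (Stemler, 2004).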
References
Bachman, L. F. and Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Barkaoui, K. (2010). Explaining ESL Essay Holistic Scores: A Multilevel Modeling Approach. Language Testing, 27(3).
Brown, G. T. L. (2009). The Reliability of Essay Scores: The Necessity of Rubrics and Moderation. In L. H. Meyer, S. Davidson, M. Rees, R. B. Fletcher and P. M. Johnston (Eds.), Tertiary Assessment and Higher Education Student Outcomes: Policy, Practice and Research. Wellington, New Zealand: Ako Aotearoa.
Douglas, D. (2000). Assessing Language for Specific Purposes. Cambridge: Cambridge University Press.
Eckes, T. (2012). Operational Rater Types in Writing Assessment: Linking Rater Cognition to Rater Behavior. Language Assessment Quarterly, 9, 270-292.
Wong Fook Fei, Mohd Sallehhudin Abd Aziz and Thang Siew Ming (2011). The Practice of ESL Writing Instructors in Assessing Writing Performance. Procedia Social and Behavioral Sciences, 18, 1-5.
Greenberg, K. L. (1992). Validity and Reliability: Issues in the Direct Assessment of Writing. Writing Program Administration, 16(1-2).
Hamp-Lyons, L. (2003). Writing Teachers as Assessors of Writing. In B. Kroll (Ed.), Exploring the Dynamics of Second Language Writing (pp. 162-190). Cambridge: Cambridge University Press.
Huot, B. (1990). The Literature of Direct Writing Assessment: Major Concerns and Prevailing Trends. Review of Educational Research, 60(2), 237-263. http://www.jstor.org/stable/1170611
Huot, B. (1996). Toward a New Theory of Writing Assessment. College Composition and Communication, 47(4), 549-566. National Council of Teachers of English. http://www.jstor.org/stable/358601
Huot, B. and O'Neill, P. (2007). Introduction. In Assessing Writing: A Critical Sourcebook. Bedford/St. Martin's. http://casymposium.blogspot.com/2007/10/assessing-writing-introduction.html
Klapper, J. (2006). Understanding and Developing Good Practice: Language Teaching in Higher Education. London: CILT.
Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19, 246-276.
McNamara, T. F. (1996). Measuring Second Language Performance. London; New York: Longman.
Shohamy, E., Gordon, C. and Kraemer, R. (1992). The Effect of Raters' Background and Training on the Reliability of Direct Writing Tests. Modern Language Journal, 76(4), 513-521.
Stemler, S. E. (2004). A Comparison of Consensus, Consistency, and Measurement Approaches to Estimating Interrater Reliability. Practical Assessment, Research & Evaluation, 9(4). http://PAREonline.net/getvn.asp?v=9&n=4
Wang, P. (2009). The Inter-Rater Reliability in Scoring Composition. English Language Teaching, 2(3).
Weigle, S. C. (1994). Effects of Training on Raters of ESL Compositions. Language Testing, 11, 197-223.
Weigle, S. C. (1998). Using FACETS to Model Rater Training Effects. Language Testing, 15(2), 263. http://ltj.sagepub.com/content/15/2/263
Weigle, S. C. (2002). Assessing Writing. Cambridge: Cambridge University Press.