TOEIC® Score Consistency

TOEIC® scores are consistent and reliable.

Evidence: The research in this section demonstrates how TOEIC Program Research helps to ensure that scores are not improperly influenced by aspects of the testing procedure that are unrelated to language ability. When examining score consistency or reliability, there are multiple aspects of the testing procedure that are considered, including:

  • test items (internal consistency)
  • test forms (equivalence)
  • test occasions or administrations (stability)
  • raters (inter- and intra-rater reliability)
  • Monitoring Score Change Patterns to Support TOEIC® Listening and Reading Test Quality

    In large-scale, high-stakes testing programs, such as the TOEIC program, some test takers take a test more than once over time. The score change patterns of these so-called "repeaters" can be analyzed to support the overall quality of the test (e.g., its reliability, validity, intended uses). This study examined the aforementioned score change patterns, with the goal of evaluating the reliability and validity of TOEIC® Listening and Reading test scores.

    Read more about Monitoring Score Change Patterns to Support TOEIC® Listening and Reading Test Quality

  • How ETS Scores the TOEIC® Speaking and Writing Test Responses

    Typically, human raters are used to score Speaking and Writing tests because of their ability to evaluate a broader range of language performance than automated systems. This paper describes how ETS ensures the reliability and consistency of scores by human raters for TOEIC® Speaking and Writing tests through training, certification, and systematic administrative and statistical monitoring procedures.

    Read more about How ETS Scores the TOEIC Speaking and Writing Test Responses

  • Monitoring TOEIC® Listening and Reading Test Performance Across Administrations Using Examinees' Background Information

    The scoring process for the TOEIC Listening and Reading test includes monitoring procedures that help ensure that scores are consistent across different test forms and test administrations, and that skill interpretations are fair. This study explores the possibility of using information about test takers' backgrounds in order to enhance several types of monitoring procedures.

    Read more about Monitoring TOEIC Listening and Reading Test Performance Across Administrations

  • Evaluating the Stability of Test Score Means for the TOEIC® Speaking and Writing Tests

    For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This helps ensure that interpretations about test takers' abilities are comparable from one administration (or form) to another. Using statistical procedures, this study examined the consistency of reported scores for the TOEIC® Speaking and Writing tests.

    Read more about Evaluating the Stability of Test Score Means for the TOEIC Speaking and Writing Tests

  • Comparison of Content, Item Statistics, and Test-Taker Performance on the Redesigned and Classic TOEIC® Listening and Reading Test

    This paper compares the content, reliability and difficulty of the classic and 2006 redesigned TOEIC Listening and Reading tests. Although the redesigned tests included slightly different item types to better reflect current models of English-language proficiency, the tests were judged to be similar across versions.

    Read more about Comparison of Content, Item Statistics, and Test-Taker Performance on the Redesigned and Classic TOEIC Listening and Reading Test

  • Statistical Analyses for the Expanded Item Formats of the TOEIC® Speaking Test

    Testing programs should periodically review their assessments to ensure that their test items or tasks are well-aligned with real-world activities. For this reason, to better support communicative language learning and to discourage the use of memorization and other test-taking strategies, ETS expanded the existing format of some items of the TOEIC® Speaking test in May 2015.

    Read more about Statistical Analyses for the Expanded Item Formats of the TOEIC Speaking Test

  • The Consistency of TOEIC® Speaking Scores Across Ratings and Tasks

    This study examines the consistency of TOEIC Speaking scores. The analysis uses a methodology based on generalizability theory, which allows researchers to examine the degree to which aspects of the testing procedure (i.e., raters, tasks) influence scores. The results contribute evidence to support claims that TOEIC Speaking scores are consistent.

    Read more about The Consistency of TOEIC Speaking Scores Across Ratings and Tasks

  • Monitoring Individual Rater Performance for the TOEIC® Speaking and Writing Tests

    This paper describes procedures implemented on the TOEIC Speaking and Writing tests for monitoring individual rater performance and enhancing overall scoring quality. These multifaceted, carefully developed procedures help ensure that the potential for human error is kept to a minimum, thereby contributing to the TOEIC tests' scoring consistency.

    Read more about Monitoring Individual Rater Performance for the TOEIC Speaking and Writing Tests

  • Alternate Forms Test-Retest Reliability and Test Score Changes for the TOEIC® Speaking and Writing Tests

    The reliability or consistency of scores can be examined in a variety of ways, including the degree to which scores for the same test taker are consistent across different test forms (so-called "equivalent forms reliability") and different occasions of testing ("test-retest reliability"). This study examined the consistency of TOEIC Speaking and Writing scores across different test forms at different time intervals (e.g., 1–30 days, 31–60 days) and found that test scores had reasonably high equivalent form test-retest reliability.

    Read more about Alternate Forms Test-Retest Reliability and Test Score Changes

  • Statistical Analyses for the TOEIC® Speaking and Writing Pilot Study

    This paper reports the results of a pilot study that contributed to TOEIC Speaking and Writing test development. The analysis of the reliability of test scores found evidence of several types of score consistency, including inter-rater reliability (agreement of several raters on a score) and internal consistency (a measure based on correlation between items on the same test).

    Read more about Statistical Analyses for the TOEIC Speaking and Writing

  • Field Study Results for the Redesigned TOEIC® Listening and Reading Test

    This paper describes the results of a field study for the 2006 redesigned TOEIC Listening and Reading tests. The study includes analyses of item and test difficulty, reliability, and correlations of test sections with the classic TOEIC Listening and Reading tests.

    Read more about Field Study Results for the Redesigned TOEIC Listening and Reading Test
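
Two of the consistency measures named in the summaries above, internal consistency and test-retest reliability, can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it uses Cronbach's alpha as one common internal-consistency coefficient and a Pearson correlation as a simple test-retest index. The function names and the tiny data set are hypothetical, and nothing here is taken from the ETS studies or represents their actual procedures.

```python
from statistics import mean, pvariance


def cronbach_alpha(item_scores):
    """Internal consistency for a score matrix.

    item_scores: one row per test taker, one column per item
    (hypothetical data layout chosen for this sketch).
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item across test takers.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each test taker's total score.
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two score lists, e.g. the same
    test takers' scores on two occasions or two forms."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Made-up example: three test takers, three items scored identically,
# and a second occasion where every score is exactly doubled.
items = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
first_occasion = [1, 2, 3]
second_occasion = [2, 4, 6]

alpha = cronbach_alpha(items)
retest = pearson_r(first_occasion, second_occasion)
```

In this contrived case both coefficients reach their maximum because the items and occasions rank test takers identically; real test data would yield lower values.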