Test-retest reliability study

Project details:

Institution: TEK Kampüs Okulları, a public school in Arsuz, Hatay, Turkey.

Participants: 306 students from grades 5 to 10 (aged 11 to 17) who took part in all three sessions and achieved a relevant result (above 30%); 918 level tests in total.

Dates: three assessment sessions in 2018, on February 16, March 23, and April 27.

Test-retest reliability, also called repeatability, measures the stability of an assessment instrument over time, i.e. how consistently it produces the same results when it is administered again.

In plain English, when the same students take the test several times, they should receive questions of the same difficulty and achieve similar results.

This can be quantified statistically with a reliability coefficient on a scale from 0 (no correlation) to 1 (perfect reliability). A reliability coefficient of .70 or higher is considered “acceptable” in most social science research situations, and an alpha coefficient above .80 indicates that the items have high internal consistency.
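To make the coefficient less abstract, here is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total scores), where k is the number of items. In a test-retest design like this one, each assessment session plays the role of an "item". The score matrix is hypothetical (it is not data from this study), and the `interpret` helper is an illustrative name that simply encodes the rule of thumb quoted above.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix: one row per student,
    one column per item (here, one column per assessment session)."""
    k = len(scores[0])                                    # number of items
    item_var = sum(pvariance(col) for col in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    # Population vs. sample variance does not matter here: the
    # n / (n - 1) correction cancels out in the ratio.
    return k / (k - 1) * (1 - item_var / total_var)

def interpret(alpha):
    """Hypothetical helper: rule-of-thumb labels from the text."""
    if alpha > 0.80:
        return "high"
    if alpha >= 0.70:
        return "acceptable"
    return "below acceptable"

# Hypothetical scores for four students over three sessions (not study data).
scores = [[400, 420, 450], [600, 610, 640], [300, 350, 330], [800, 790, 820]]
alpha = cronbach_alpha(scores)
print(round(alpha, 3), interpret(alpha))
```

Because these made-up students keep their relative ranking across all three sessions, alpha comes out close to 1. Real classroom data is noisier, which is why a value around .8 already counts as high.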

For this analysis, we used the standard English test results provided by TrackTest for one public school in Turkey.

English test settings:

TrackTest English Core test (random), i.e. in each session every student received a different set of questions and tasks.

Tests were taken on tablets in a proctored environment organized by PlusEd Turkey.


306 students took the English test in three assessment rounds, roughly one month apart. Both Cronbach’s Alpha and the Intraclass Correlation Coefficient (ICC) were 0.816, which indicates high reliability.

English proficiency levels:

Students were allowed to progress to a higher level in the next assessment. For instance, a student who successfully completed the A1 level test in the first assessment could take the A2 level test in the second. This would normally make comparisons across levels problematic, because a given percentage on one level test does not mean the same as that percentage on another. For this reason, the TrackTest system also provides an internal TrackTest Score (TTS), which automatically converts level-test percentage scores into universal TTS points. These TTS scores (0-1200 points) were used for this test-retest reliability analysis.
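The article does not describe how TTS points are actually computed, so the following is only a toy illustration of the general idea of a common scale. It assumes, purely hypothetically, six CEFR levels (A1 to C2) that each own an equal 200-point band of the 0-1200 range, with the level-test percentage placing the student inside that band. The names `LEVELS`, `BAND`, and `to_common_scale` are invented for this sketch and are not TrackTest's formula.

```python
# HYPOTHETICAL mapping for illustration only; NOT TrackTest's TTS formula.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
BAND = 1200 // len(LEVELS)  # assumption: 200 points per CEFR level

def to_common_scale(level, percent):
    """Map a level-test percentage onto one shared 0-1200 scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return LEVELS.index(level) * BAND + percent / 100 * BAND

# A student at 90% on A1 in round 1, then 40% on A2 in round 2:
print(to_common_scale("A1", 90), to_common_scale("A2", 40))
```

Under this toy mapping, 40% on the A2 test lands above 90% on the A1 test, so a student's scores from different level tests can sit in the same column of the reliability analysis.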

[Figure: Level distribution per assessment round]


We also took into account the usual sources of bias in test-retest-retest designs: the time between tests, during which students continued to attend English lessons at their school; the novelty of the first test compared with the repeated tests; and carryover effects.

However, a five-week gap between tests is still quite short for significant improvement in English, and each retest used a different set of questions, which limits carryover from remembering earlier items. There may therefore be some degree of error, but there was no sign of intervening factors that would compromise the overall test-retest reliability.
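As a quick check of the five-week figure, the three session dates listed under "Project details" can be differenced with Python's standard `datetime` module:

```python
from datetime import date

# Session dates as listed under "Project details"
sessions = [date(2018, 2, 16), date(2018, 3, 23), date(2018, 4, 27)]

# Days between consecutive assessment sessions
gaps = [(later - earlier).days for earlier, later in zip(sessions, sessions[1:])]
print(gaps)                   # [35, 35]
print([g / 7 for g in gaps])  # [5.0, 5.0] weeks
```

Both intervals are exactly 35 days, so the sessions were evenly spaced five weeks apart.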

Two reliability coefficients were calculated: the standard Cronbach’s Alpha (α) and the Intraclass Correlation Coefficient (ICC). Both gave the same result, as expected: the average-measures, consistency-type ICC from a two-way mixed model is algebraically equivalent to Cronbach’s Alpha.
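The average-measures, consistency-type ICC from a two-way model, often written ICC(C,k), is computed from a two-way ANOVA as (MS_rows - MS_error) / MS_rows. Algebraically, this is the same quantity as Cronbach's alpha. The sketch below computes both ways on a hypothetical score matrix (students as rows, sessions as columns) to show that they agree. It is an illustration of the textbook formulas, not the software used for this study.

```python
from statistics import pvariance

def icc_consistency_average(scores):
    """ICC(C,k): two-way model, consistency, average measures.
    Rows are subjects (students), columns are occasions (sessions)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(c) / n for c in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols   # residual sum of squares
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows

def cronbach_alpha(scores):
    """Cronbach's alpha; the variance scaling cancels in the ratio."""
    k = len(scores[0])
    item_var = sum(pvariance(c) for c in zip(*scores))
    total_var = pvariance([sum(r) for r in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical scores for four students over three sessions (not study data).
data = [[400, 420, 450], [600, 610, 640], [300, 350, 330], [800, 790, 820]]
print(icc_consistency_average(data), cronbach_alpha(data))
```

The two printed values match up to floating-point rounding. The consistency-type ICC ignores systematic differences between sessions (the column effect), and so does alpha.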

Test reliability results:

Cronbach’s Alpha: 0.816 (High)
Cronbach’s Alpha Based on Standardized Items: 0.817
N of Items: 3 (one per assessment session)

Intraclass Correlation Coefficient (ICC), Average Measures: 0.816
Two-way mixed model, type: Consistency
95% Confidence Interval (lower to upper bound): 0.777 to 0.849
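"Cronbach's Alpha Based on Standardized Items" is usually computed from the mean inter-item correlation r using the Spearman-Brown form k * r / (1 + (k - 1) * r). The minimal sketch below assumes that standard formula and inverts it. With k = 3 sessions, the reported 0.817 implies a mean correlation between sessions of roughly 0.6. The function names are illustrative.

```python
def standardized_alpha(mean_r, k):
    """Alpha based on standardized items (Spearman-Brown form)."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def implied_mean_correlation(alpha_std, k):
    """Invert the formula above to recover the mean inter-item correlation."""
    return alpha_std / (k - (k - 1) * alpha_std)

r = implied_mean_correlation(0.817, 3)   # k = 3 assessment sessions
print(round(r, 3), round(standardized_alpha(r, 3), 3))
```

This also shows why alpha rises with the number of sessions: three moderately correlated measurements combine into a more reliable overall score than any single one.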
