Chapter 2 Reliability, Precision, and Errors of Measurement

2.1 Introduction

This chapter addresses the technical quality of the operational test with regard to precision and reliability. Part of the test validity argument is that scores must be consistent and precise enough to be useful for their intended purposes. If scores are to be meaningful, tests should deliver the same results under repeated administrations to the same student or for students of the same ability, and the range of uncertainty around a score should be small enough to support educational decisions. The reliability and precision of a test are examined through analysis of measurement error and other test properties in simulated and operational conditions. For example, the reliability of a test may be assessed in part by verifying that different test forms follow the same blueprint.

In computer adaptive testing (CAT), the same set of items cannot be expected to be administered to the same examinee more than once. Consequently, reliability is inferred from internal test properties, including test length and the information provided by item parameters. Measurement precision is enhanced when a student receives items that are well matched, in difficulty, to the student's overall performance level, and when the items a student receives work well together to measure the same general body of knowledge, skills, and abilities defined by the test blueprint. Smarter Balanced uses an adaptive model because adaptive tests are customized to each student in terms of item difficulty, and it applies item quality control procedures to ensure that test items measure the knowledge, skills, and abilities specified in the test blueprint and work well together in this respect. The expected outcome of these and other test administration and item quality control procedures is high reliability and low measurement error.

For the 2021-22 administration, all statistics in this chapter are based on the full blueprint. Estimates of measurement bias from simulations conducted by Cambium Assessment are provided, along with reliability, classification accuracy, and standard errors of measurement based on student data provided by Michigan, Montana, Nevada, and South Dakota. Statistics for the paper/pencil forms are based on the items on the forms, not on the students who took the assessment in 2021-22.

2.2 Measurement Bias

Measurement bias is any systematic or non-random error that occurs in estimating a student’s achievement from the student’s scores on test items. Prior to the release of the 2021-22 item pool, simulation studies were carried out to ensure that the item pool, combined with the adaptive test administration algorithm, would produce satisfactory tests with regard to measurement bias and random measurement error as a function of student achievement, overall reliability, fulfillment of test blueprints, and item exposure.

Results for measurement bias with the full blueprint are provided in this section. Measurement bias is the one index of test performance that is clearly and preferentially assessed through simulation as opposed to the use of real data. With real data, true student achievement is unknown. In simulation, true student achievement can be assumed and used to generate item responses. The simulated item responses are used in turn to estimate achievement. Achievement estimates are then compared to the underlying assumed, true values of student achievement to assess whether the estimates contain systematic error (bias).

Simulations for the 2021-22 administration were carried out by Cambium Assessment. The simulations were performed for each grade within a subject area for the standard item pool (English) and for accommodation item pools of braille and Spanish for mathematics and braille for ELA/literacy. For the standard item pools, the number of simulees was 3,000 for grades 3-8 and 5,000 for grade 11. For the braille and Spanish pools, the number of simulees was 1,000 for grades 3-8 and 2,000 for grade 11. True student achievement values were sampled from a normal distribution for each grade and subject. The parameters for the normal distribution were based on students’ operational scores on the 2018–2019 Smarter Balanced summative tests.

Test events were created for the simulated examinees using the 2021-22 item pool. Estimated ability ( \(\hat{\theta}\) ) was calculated from the simulated tests using maximum likelihood estimation (MLE) as described in the Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2023b).

Bias was computed as:

\[\begin{equation} bias = N^{-1}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}) \tag{2.1} \end{equation}\]

and the error variance of the estimated bias is:

\[\begin{equation} ErrorVar(bias) = \frac{1}{N(N-1)}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}-mean(\theta_{i}-\hat{\theta}_{i}))^{2} \tag{2.2} \end{equation}\]

where \(\theta_{i} - \hat{\theta}_{i}\) is the deviation score for simulee \(i\), and \(N\) denotes the number of simulees in the condition (3,000 or 5,000 for the standard pools and 1,000 or 2,000 for the accommodation pools, as described above). Statistical significance of the bias is tested using a z-test: \[\begin{equation} z = \frac{bias}{\sqrt{ErrorVar(bias)}} \tag{2.3} \end{equation}\]
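As a sketch, the bias statistics in Equations (2.1)-(2.3) can be computed from paired true and estimated thetas as follows (a minimal illustration, not the operational simulation code; function and variable names are assumptions):

```python
import math

def bias_statistics(theta_true, theta_hat):
    """Mean bias (Eq. 2.1), its error variance (Eq. 2.2), and the
    z statistic (Eq. 2.3) from paired true and estimated thetas."""
    n = len(theta_true)
    devs = [t - e for t, e in zip(theta_true, theta_hat)]
    bias = sum(devs) / n                                            # Eq. 2.1
    error_var = sum((d - bias) ** 2 for d in devs) / (n * (n - 1))  # Eq. 2.2
    z = bias / math.sqrt(error_var)                                 # Eq. 2.3
    return bias, error_var, z
```

A two-sided p-value then follows as \(P(|Z| > |z|)\) for a standard normal variate \(Z\).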

Table 2.1 and Table 2.2 show for ELA/literacy and mathematics, respectively, the bias in estimates of student achievement based on the complete test assembled from the standard item pool and the accommodations pools included in the simulations. The standard error of bias is the denominator of the z-score in Equation (2.3). The p-value is the probability \(|Z| > |z|\), where \(Z\) is a standard normal variate and \(|z|\) is the absolute value of the \(z\) computed in Equation (2.3). Under the hypothesis of no bias, approximately 5% and 1% of the \(\hat{\theta}_{i}\) will fall outside, respectively, 95% and 99% confidence intervals centered on \(\theta_{i}\).

Mean bias was generally very small in practical terms, exceeding .02 in absolute value in no cases for ELA/literacy and in only six cases for mathematics. Mean bias tended to be statistically significantly different from 0, but this was due to the large sample sizes used for the simulation. In virtually all cases, the percentage of simulated examinees whose estimated achievement score fell outside the confidence intervals centered on their true score was close to expected values of 5% for the 95% confidence interval and 1% for the 99% confidence interval. Plots of bias by estimated theta in the full simulation report show that positive and statistically significant mean bias was due to thetas being underestimated in regions of student achievement far below the lowest cut score (separating achievement levels 1 and 2). The same plots show that estimation bias is negligible near all cut scores in all cases.

Table 2.1: BIAS OF THE ESTIMATED PROFICIENCIES: ENGLISH LANGUAGE ARTS/LITERACY
Pool Grade Mean Bias SE (Bias) P value MSE 95% CI Miss Rate 99% CI Miss Rate
Standard 3 0.00 0.01 0.76 0.11 4.53% 0.70%
Standard 4 0.00 0.01 0.99 0.12 5.40% 1.10%
Standard 5 -0.01 0.01 0.30 0.13 5.24% 1.23%
Standard 6 0.01 0.01 0.17 0.13 5.60% 0.80%
Standard 7 0.01 0.01 0.18 0.15 5.03% 1.03%
Standard 8 0.00 0.01 0.89 0.16 4.60% 0.87%
Standard HS -0.01 0.01 0.09 0.19 5.22% 0.96%
Braille 3 0.00 0.01 0.69 0.12 4.40% 1.00%
Braille 4 0.01 0.01 0.37 0.12 4.50% 0.90%
Braille 5 -0.01 0.01 0.28 0.13 4.90% 1.50%
Braille 6 0.00 0.01 0.93 0.14 5.40% 0.70%
Braille 7 -0.02 0.01 0.08 0.16 4.80% 1.20%
Braille 8 0.01 0.40 0.57 0.17 5.50% 1.20%
Braille HS -0.01 0.01 0.17 0.21 5.50% 0.95%


Table 2.2: BIAS OF THE ESTIMATED PROFICIENCIES: MATHEMATICS
Pool Grade Mean Bias SE (Bias) P value MSE 95% CI Miss Rate 99% CI Miss Rate
Standard 3 0.00 0.00 0.3 0.07 3.97% 1.00%
Standard 4 0.01 0.00 0.27 0.06 4.27% 1.10%
Standard 5 0.01 0.01 0.15 0.10 5.30% 1.27%
Standard 6 0.01 0.01 0.24 0.12 4.67% 0.77%
Standard 7 0.02 0.01 < 0.01 0.14 4.87% 0.97%
Standard 8 0.03 0.01 < 0.01 0.17 3.97% 0.87%
Standard HS 0.04 0.01 < 0.01 0.18 4.72% 0.88%
Braille 3 0.00 0.01 0.6 0.08 4.00% 1.20%
Braille 4 0.00 0.01 0.61 0.09 5.01% 0.40%
Braille 5 0.02 0.01 0.12 0.11 5.40% 0.90%
Braille 6 0.02 0.01 0.19 0.15 4.51% 0.90%
Braille 7 0.04 0.01 < 0.01 0.16 4.80% 1.10%
Braille 8 -0.01 0.01 0.3 0.20 5.10% 0.90%
Braille HS 0.04 0.01 < 0.01 0.26 5.35% 1.05%
Spanish 3 0.01 0.01 0.29 0.08 5.01% 0.80%
Spanish 4 0.01 0.01 0.33 0.08 4.30% 0.40%
Spanish 5 0.01 0.01 0.34 0.10 4.50% 0.60%
Spanish 6 0.00 0.01 0.99 0.14 5.32% 1.20%
Spanish 7 0.01 0.01 0.37 0.14 4.70% 1.10%
Spanish 8 0.03 0.01 0.04 0.21 5.41% 1.00%
Spanish HS 0.03 0.01 < 0.01 0.21 4.86% 1.20%

2.3 Reliability

Reliability estimates reported in this section are derived from internal, IRT-based estimates of the measurement error in the test scores of examinees (MSE) and the observed variance of examinees’ test scores on the \(\theta\)-scale \((var(\hat{\theta}))\). The formula for the reliability estimate (\(\rho\)) is:

\[\begin{equation} \hat{\rho} = 1 - \frac{MSE}{var(\hat{\theta})}. \tag{2.4} \end{equation}\]

According to Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2023b), estimates of measurement error are obtained from the parameter estimates of the items taken by the examinees. This is done by computing the test information for each examinee \(i\) as:

\[\begin{equation} \begin{split} I(\hat{\theta}_{i}) = \sum_{j=1}^{I}D^2a_{j}^2 (\frac{\sum_{l=1}^{m_{j}}l^2Exp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}-b_{jk}))} {1+\sum_{l=1}^{m_{j}}Exp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}-b_{jk}))} - \\ (\frac{\sum_{l=1}^{m_{j}}lExp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}-b_{jk}))} {1+\sum_{l=1}^{m_{j}}Exp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}-b_{jk}))})^2) \end{split} \tag{2.5} \end{equation}\]

where \(m_j\) is the maximum possible score point (starting from 0) for the \(j\)th item, and \(D\) is the scale factor, 1.7. Values of \(a_j\) and \(b_{jk}\) are item parameters for item \(j\) and score level \(k\). The test information is computed using only the items answered by the examinee. The measurement error (SEM) for examinee \(i\) is then computed as:

\[\begin{equation} SEM(\hat{\theta_i}) = \frac{1}{\sqrt{I(\hat{\theta_i})}}. \tag{2.6} \end{equation}\]

The upper bound of \(SEM(\hat{\theta_i})\) is set to 2.5. Any value larger than 2.5 is truncated at 2.5. The mean squared error for a group of \(N\) examinees is then:

\[\begin{equation} MSE = N^{-1}\sum_{i=1}^N SEM(\hat{\theta_i})^2 \tag{2.7} \end{equation}\]

and the variance of the achievement scores is: \[\begin{equation} var(\hat{\theta}) = N^{-1}\sum_{i=1}^N (\hat{\theta_i} - \overline{\hat{\theta}})^2 \tag{2.8} \end{equation}\]

where \(\overline{\hat{\theta}}\) is the average of the \(\hat{\theta_i}\).

The measurement error for a group of examinees is typically reported as the square root of \(MSE\) and is denoted \(RMSE\). Measurement error is computed with Equation (2.6) and Equation (2.7) on a scale where achievement has a standard deviation close to 1 among students at a given grade. Measurement error reported in the tables of this section is transformed to the reporting scale by multiplying the theta-scale measurement error by \(a\), where \(a\) is the slope used to convert estimates of student achievement on the \(\theta\)-scale to the reporting scale. The transformation equations for converting estimates of student achievement on the \(\theta\)-scale to the reporting scale are given in Chapter 5.
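A minimal sketch of Equations (2.4)-(2.8), assuming items parameterized by a discrimination \(a_j\) and step difficulties \(b_{jk}\) as in Equation (2.5) (all function and variable names here are illustrative, not from the scoring specifications):

```python
import math

D = 1.7  # scale factor from Equation (2.5)

def item_information(theta, a, b_steps):
    """Fisher information for one item under Equation (2.5).
    b_steps holds the step parameters b_j1..b_jm for score levels 1..m."""
    # Unnormalized category weights for score levels l = 0..m
    weights = [1.0]
    cum = 0.0
    for b in b_steps:
        cum += D * a * (theta - b)
        weights.append(math.exp(cum))
    denom = sum(weights)
    e_x = sum(l * w for l, w in enumerate(weights)) / denom       # E[X]
    e_x2 = sum(l * l * w for l, w in enumerate(weights)) / denom  # E[X^2]
    return (D * a) ** 2 * (e_x2 - e_x ** 2)

def sem(theta, items):
    """SEM over the items an examinee answered (Eq. 2.6), capped at 2.5."""
    info = sum(item_information(theta, a, steps) for a, steps in items)
    return min(1.0 / math.sqrt(info), 2.5)

def marginal_reliability(theta_hats, sems):
    """Reliability via Equations (2.7), (2.8), and (2.4)."""
    n = len(theta_hats)
    mse = sum(s * s for s in sems) / n                  # Eq. 2.7
    mean = sum(theta_hats) / n
    var = sum((t - mean) ** 2 for t in theta_hats) / n  # Eq. 2.8
    return 1.0 - mse / var                              # Eq. 2.4
```

For a dichotomous item (\(m_j = 1\)), `item_information` reduces to the familiar \(D^2 a^2 p(1-p)\).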

2.3.1 General Population

Reliability estimates in this section are based on real data and the full blueprint. In mathematics, claims 2 and 4 are reported together as a single subscore, so there are only three reporting categories for mathematics, but four claims. Table 2.3 and Table 2.4 show the reliability of the observed total scores and subscores for ELA/literacy and mathematics. Reliability estimates are high for the total score in both subjects. Reliability coefficients are high for the claim 1 score in mathematics, moderately high for the claim 1 and claim 2 scores in ELA/literacy, and moderately high to moderate for the remainder of the claim scores in both subjects. The lowest reliability coefficient in either subject is .554, which is the reliability of the claim 3 score in the grade 8 mathematics assessment.

Table 2.3: ELA/LITERACY SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade N Total score Claim 1 Claim 2 Claim 3 Claim 4
3 149,007 0.937 0.791 0.732 0.586 0.733
4 149,694 0.934 0.802 0.718 0.603 0.693
5 151,742 0.938 0.794 0.730 0.621 0.766
6 150,477 0.927 0.768 0.742 0.606 0.716
7 153,344 0.924 0.784 0.753 0.616 0.703
8 59,278 0.910 0.751 0.701 0.620 0.718
HS 9,164 0.910 0.766 0.749 0.592 0.702


Table 2.4: MATHEMATICS SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade N Total score Claim 1 Claim 2/4 Claim 3
3 149,001 0.957 0.926 0.684 0.741
4 149,404 0.955 0.924 0.735 0.718
5 151,473 0.940 0.901 0.643 0.701
6 149,792 0.940 0.906 0.696 0.634
7 152,644 0.933 0.895 0.611 0.646
8 58,623 0.904 0.884 0.649 0.554
HS 9,164 0.933 0.891 0.690 0.697

2.3.2 Demographic Groups

Reliability estimates in this section are based on real data and the full blueprint. During the 2021-22 administration year, most students and schools returned to standard testing conditions. However, schools responded differently to post-pandemic test administration: many states maintained remote testing, and some states switched to an adjusted blueprint. Consequently, the demographic group results presented below should not be considered representative of the entire student population. Table 2.5 and Table 2.6 show the reliability of the test for students of different racial groups in ELA/literacy and mathematics who tested in 2021-22. Table 2.7 and Table 2.8 show the reliability of the test for students who tested in 2021-22, grouped by demographic categories typically requiring accommodations or accessibility tools. These groups include English learners (EL) and students covered by the Individuals with Disabilities Education Act (IDEA).

Because of the differences in average score across demographic groups and the relationship between measurement error and student achievement scores, which will be seen in the next section of this chapter, demographic groups with lower average scores tend to have lower reliability than the population as a whole. Nevertheless, the reliability coefficients for all demographic groups in these tables are moderately high to high.

Table 2.5: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - ELA/LITERACY
Grade Group N Var MSE Rho
3 Total 149,007 7758 491 0.937
3 American Indian or Alaska Native 2,977 6553 721 0.890
3 Asian 5,201 7307 454 0.938
3 Black/African American 21,980 6099 514 0.916
3 Native Hawaiian or Pacific Islander 598 7279 521 0.928
3 Hispanic/Latino Ethnicity 23,956 6726 505 0.925
3 White 85,278 7065 476 0.933
4 Total 149,694 8557 564 0.934
4 American Indian or Alaska Native 3,072 7629 807 0.894
4 Asian 5,250 8217 535 0.935
4 Black/African American 21,630 6793 581 0.915
4 Native Hawaiian or Pacific Islander 668 6920 577 0.917
4 Hispanic/Latino Ethnicity 24,470 7437 573 0.923
4 White 85,827 7756 551 0.929
5 Total 151,742 8682 542 0.938
5 American Indian or Alaska Native 3,124 7764 807 0.896
5 Asian 5,199 8061 533 0.934
5 Black/African American 22,084 6736 511 0.924
5 Native Hawaiian or Pacific Islander 584 7751 538 0.931
5 Hispanic/Latino Ethnicity 25,158 7477 526 0.930
5 White 86,930 7992 545 0.932
6 Total 150,477 8380 610 0.927
6 American Indian or Alaska Native 3,033 7628 972 0.873
6 Asian 5,251 7803 563 0.928
6 Black/African American 21,610 6480 632 0.903
6 Native Hawaiian or Pacific Islander 620 6621 610 0.908
6 Hispanic/Latino Ethnicity 24,920 7315 620 0.915
6 White 86,735 7876 591 0.925
7 Total 153,344 9326 710 0.924
7 American Indian or Alaska Native 3,081 8973 1151 0.872
7 Asian 5,370 8284 651 0.921
7 Black/African American 21,564 7965 751 0.906
7 Native Hawaiian or Pacific Islander 617 7703 636 0.917
7 Hispanic/Latino Ethnicity 25,328 8280 675 0.919
7 White 89,322 8766 700 0.920
8 Total 59,278 9395 843 0.910
8 American Indian or Alaska Native 2,569 8164 1133 0.861
8 Asian 2,336 8474 734 0.913
8 Black/African American 4,668 8417 853 0.899
8 Native Hawaiian or Pacific Islander 602 8341 779 0.907
8 Hispanic/Latino Ethnicity 17,889 8218 794 0.903
8 White 27,629 8803 860 0.902
HS Total 9,164 11888 1069 0.910
HS American Indian or Alaska Native 747 10471 1162 0.889
HS Asian 166 15289 1105 0.928
HS Black/African American 250 12225 1115 0.909
HS Native Hawaiian or Pacific Islander 9 16208 1058 0.935
HS Hispanic/Latino Ethnicity 601 12989 1112 0.914
HS White 7,040 10506 1052 0.900


Table 2.6: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - MATHEMATICS
Grade Group N Var MSE Rho
3 Total 149,001 7594 326 0.957
3 American Indian or Alaska Native 2,955 6493 494 0.924
3 Asian 5,213 7177 305 0.958
3 Black/African American 21,957 6324 410 0.935
3 Native Hawaiian or Pacific Islander 598 6889 304 0.956
3 Hispanic/Latino Ethnicity 23,968 6558 329 0.950
3 White 85,303 6479 300 0.954
4 Total 149,404 7544 341 0.955
4 American Indian or Alaska Native 3,050 6608 514 0.922
4 Asian 5,244 7486 286 0.962
4 Black/African American 21,535 6079 468 0.923
4 Native Hawaiian or Pacific Islander 668 6803 319 0.953
4 Hispanic/Latino Ethnicity 24,426 6870 345 0.950
4 White 85,726 6279 306 0.951
5 Total 151,473 8518 512 0.940
5 American Indian or Alaska Native 3,103 6543 797 0.878
5 Asian 5,197 8315 388 0.953
5 Black/African American 22,015 6347 769 0.879
5 Native Hawaiian or Pacific Islander 582 7586 506 0.933
5 Hispanic/Latino Ethnicity 25,109 7330 563 0.923
5 White 86,831 7470 430 0.942
6 Total 149,792 10188 610 0.940
6 American Indian or Alaska Native 3,015 9604 1117 0.884
6 Asian 5,236 9733 434 0.955
6 Black/African American 21,467 8699 884 0.898
6 Native Hawaiian or Pacific Islander 616 7132 576 0.919
6 Hispanic/Latino Ethnicity 24,796 8185 651 0.920
6 White 86,401 8839 523 0.941
7 Total 152,644 11178 745 0.933
7 American Indian or Alaska Native 3,035 9193 1206 0.869
7 Asian 5,364 10536 490 0.953
7 Black/African American 21,447 9285 1109 0.881
7 Native Hawaiian or Pacific Islander 609 8064 738 0.909
7 Hispanic/Latino Ethnicity 25,210 9334 833 0.911
7 White 88,973 10127 628 0.938
8 Total 58,623 10306 991 0.904
8 American Indian or Alaska Native 2,525 9365 1411 0.849
8 Asian 2,327 9087 700 0.923
8 Black/African American 4,597 8551 1274 0.851
8 Native Hawaiian or Pacific Islander 597 8819 1010 0.885
8 Hispanic/Latino Ethnicity 17,761 8426 1106 0.869
8 White 27,287 10214 861 0.916
HS Total 9,164 13402 902 0.933
HS American Indian or Alaska Native 742 10099 1425 0.859
HS Asian 165 17130 855 0.950
HS Black/African American 248 11675 1330 0.886
HS Native Hawaiian or Pacific Islander 9 9885 1002 0.899
HS Hispanic/Latino Ethnicity 616 12661 1199 0.905
HS White 7,033 11765 805 0.932


Table 2.7: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - ELA/LITERACY
Grade Group N Var MSE Rho
3 Total 149,007 7758 491 0.937
3 EL Status 14,543 5809 503 0.913
3 IDEA Indicator 19,507 6280 544 0.913
3 Section 504 Status 547 7783 686 0.912
3 Economic Disadvantage Status 75,588 6713 481 0.928
4 Total 149,694 8557 564 0.934
4 EL Status 14,562 6256 572 0.909
4 IDEA Indicator 19,723 7034 641 0.909
4 Section 504 Status 652 8379 800 0.905
4 Economic Disadvantage Status 74,694 7392 551 0.925
5 Total 151,742 8682 542 0.938
5 EL Status 12,693 5073 526 0.896
5 IDEA Indicator 19,513 6382 588 0.908
5 Section 504 Status 739 8788 794 0.910
5 Economic Disadvantage Status 74,661 7445 508 0.932
6 Total 150,477 8380 610 0.927
6 EL Status 9,809 4228 674 0.841
6 IDEA Indicator 18,424 5679 718 0.874
6 Section 504 Status 2,522 7368 857 0.884
6 Economic Disadvantage Status 73,080 7161 605 0.915
7 Total 153,344 9326 710 0.924
7 EL Status 9,446 5180 818 0.842
7 IDEA Indicator 17,708 6693 865 0.871
7 Section 504 Status 2,708 8348 895 0.893
7 Economic Disadvantage Status 72,391 8281 721 0.913
8 Total 59,278 9395 843 0.910
8 EL Status 4,966 4879 1020 0.791
8 IDEA Indicator 6,549 6437 1079 0.832
8 Section 504 Status 2,783 8884 1044 0.882
8 Economic Disadvantage Status 24,039 8470 863 0.898
HS Total 9,164 11888 1069 0.910
HS EL Status 206 6041 1380 0.771
HS IDEA Indicator 837 9315 1254 0.865
HS Section 504 Status 385 13770 1088 0.921
HS Economic Disadvantage Status 2,628 11921 1091 0.908


Table 2.8: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - MATHEMATICS
Grade Group N Var MSE Rho
3 Total 149,001 7594 326 0.957
3 EL Status 14,582 6291 348 0.945
3 IDEA Indicator 19,527 7771 453 0.942
3 Section 504 Status 551 7525 427 0.943
3 Economic Disadvantage Status 75,556 6780 350 0.948
4 Total 149,404 7544 341 0.955
4 EL Status 14,555 5990 380 0.937
4 IDEA Indicator 19,637 7231 513 0.929
4 Section 504 Status 660 7660 407 0.947
4 Economic Disadvantage Status 74,487 6579 381 0.942
5 Total 151,473 8518 512 0.940
5 EL Status 12,714 5356 673 0.874
5 IDEA Indicator 19,485 6861 825 0.880
5 Section 504 Status 739 7389 631 0.915
5 Economic Disadvantage Status 74,481 7247 613 0.915
6 Total 149,792 10188 610 0.940
6 EL Status 9,793 5988 909 0.848
6 IDEA Indicator 18,317 8618 1107 0.872
6 Section 504 Status 916 9336 826 0.911
6 Economic Disadvantage Status 72,696 9137 723 0.921
7 Total 152,644 11178 745 0.933
7 EL Status 9,435 6580 1204 0.817
7 IDEA Indicator 17,593 8395 1390 0.834
7 Section 504 Status 957 9998 902 0.910
7 Economic Disadvantage Status 72,004 9934 915 0.908
8 Total 58,623 10306 991 0.904
8 EL Status 4,941 5260 1559 0.704
8 IDEA Indicator 6,468 7354 1592 0.784
8 Section 504 Status 1,080 11102 1098 0.901
8 Economic Disadvantage Status 23,750 9065 1154 0.873
HS Total 9,164 13402 902 0.933
HS EL Status 215 7128 2034 0.715
HS IDEA Indicator 835 8804 1584 0.820
HS Section 504 Status 386 16345 936 0.943
HS Economic Disadvantage Status 2,629 12390 1092 0.912

2.3.3 Paper/Pencil Tests

Smarter Balanced supports fixed-form paper/pencil tests that adhere to the full blueprint for use in a variety of situations, including schools that lack computer capacity and schools with religious concerns about using technology for assessments. Scores on the paper/pencil tests are on the same reporting scale used for the online assessments. The forms used in the 2021-22 administration are collectively (across all grades) referred to as Form 5.

Table 2.9 and Table 2.10 show, for ELA/literacy and mathematics, respectively, statistical information pertaining to the items on Form 5 and to the measurement precision of the form. MSE estimates for the paper/pencil forms were based on Equation (2.5) through Equation (2.7), except that quadrature points and weights over a hypothetical theta distribution were used instead of observed scores (\(\hat{\theta}\)s). The hypothetical true score distribution used for quadrature was the student distribution from the 2014–2015 operational administration. Reliability was then computed as in Equation (2.4), with the observed-score variance equal to the MSE plus the variance of the hypothetical true score distribution. Reliability was better for the full test than for the subscales and is inversely related to the SEM.
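The quadrature-based calculation described above can be sketched as follows. The evenly spaced grid with normal weights is an illustrative choice, not necessarily the operational quadrature rule, and `info_fn` stands in for the Form 5 test information function:

```python
import math

def fixed_form_reliability(info_fn, mean, sd, n_points=81, cap=2.5):
    """Reliability of a fixed form: average the squared SEM over a
    normal true-score distribution (quadrature), then apply Eq. (2.4)
    with observed-score variance = true-score variance + MSE."""
    lo, hi = mean - 4.0 * sd, mean + 4.0 * sd
    step = (hi - lo) / (n_points - 1)
    points = [lo + i * step for i in range(n_points)]
    weights = [math.exp(-0.5 * ((p - mean) / sd) ** 2) for p in points]
    total = sum(weights)
    mse = 0.0
    for p, w in zip(points, weights):
        s = min(1.0 / math.sqrt(info_fn(p)), cap)  # Eq. (2.6), capped at 2.5
        mse += (w / total) * s * s                 # weighted Eq. (2.7)
    true_var = sd * sd
    return 1.0 - mse / (true_var + mse)            # Eq. (2.4)
```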

Table 2.9: RELIABILITY OF PAPER PENCIL TESTS, FORM 5 ENGLISH LANGUAGE ARTS/LITERACY
Grade Nitems Rho SEM Avg. b Avg. a C1 Rho C1 SEM C2 Rho C2 SEM C3 Rho C3 SEM C4 Rho C4 SEM
3 41 0.916 0.306 -0.734 0.800 0.806 0.499 0.720 0.633 0.619 0.796 0.695 0.672
4 41 0.907 0.343 -0.115 0.682 0.768 0.590 0.705 0.693 0.633 0.817 0.691 0.716
5 41 0.918 0.324 0.275 0.709 0.741 0.641 0.777 0.581 0.634 0.823 0.725 0.668
6 40 0.913 0.328 0.804 0.708 0.689 0.715 0.778 0.568 0.662 0.761 0.628 0.819
7 39 0.918 0.334 0.839 0.689 0.766 0.617 0.779 0.595 0.666 0.791 0.695 0.740
8 43 0.917 0.332 1.161 0.664 0.780 0.586 0.769 0.606 0.645 0.819 0.629 0.848
11 42 0.930 0.346 1.228 0.670 0.820 0.590 0.780 0.670 0.704 0.818 0.730 0.766


Table 2.10: RELIABILITY OF PAPER PENCIL TEST, FORM 5 MATHEMATICS
Grade Nitems Rho SEM Avg. b Avg. a C1 Rho C1 SEM C2&4 Rho C2&4 SEM C3 Rho C3 SEM
3 40 0.925 0.284 -0.900 0.898 0.842 0.433 0.594 0.826 0.747 0.582
4 40 0.924 0.292 -0.282 0.876 0.848 0.431 0.570 0.886 0.747 0.592
5 39 0.914 0.345 0.166 0.843 0.846 0.479 0.453 1.237 0.699 0.738
6 39 0.908 0.406 0.659 0.788 0.816 0.606 0.645 0.945 0.643 0.949
7 40 0.899 0.455 1.131 0.713 0.812 0.653 0.564 1.193 0.692 0.906
8 39 0.907 0.465 1.267 0.646 0.848 0.614 0.440 1.637 0.647 1.071
HS 41 0.914 0.478 0.949 0.588 0.851 0.651 0.460 1.688 0.760 0.875

2.4 Classification Accuracy

Information on classification accuracy is based on actual test results from the 2021-22 administration. Classification accuracy is a measure of how accurately test scores or subscores place students into reporting category levels. The likelihood of inaccurate placement depends on the amount of measurement error associated with scores, especially those nearest cut points, and on the distribution of student achievement. For this report, classification accuracy was calculated in the following manner. For each examinee, analysts used the estimated scale score and its standard error of measurement to obtain a normal approximation of the likelihood function over the range of scale scores. The normal approximation took the scale score estimate as its mean and the standard error of measurement as its standard deviation. The proportion of the area under the curve within each level was then calculated.

Figure 2.1 illustrates the approach for one examinee in grade 11 mathematics. In this example, the examinee’s overall scale score is 2606 (placing this student in level 2, based on the cut scores for this grade level), with a standard error of measurement of 31 points. Accordingly, a normal distribution with a mean of 2606 and a standard deviation of 31 was used to approximate the likelihood of the examinee’s true level, based on the observed test performance. The area under the curve was computed within each score range in order to estimate the probability that the examinee’s true score falls within that level (the red vertical lines identify the cut scores). For the student in Figure 2.1, the estimated probabilities were 2.1% for level 1, 74.0% for level 2, 23.9% for level 3, and 0.0% for level 4. Since the student’s assigned level was level 2, there is an estimated 74% chance the student was correctly classified and a 26% (2.1% + 23.9% + 0.0%) chance the student was misclassified.
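The normal-approximation procedure can be sketched as follows. The cut scores used in the example call (2543, 2628, and 2718) are assumed values chosen to reproduce the probabilities reported for the Figure 2.1 example, not quoted from this chapter:

```python
from statistics import NormalDist

def level_probabilities(scale_score, sem, cuts):
    """Probability that an examinee's true score falls in each achievement
    level, using a normal likelihood with mean = scale score, SD = SEM."""
    dist = NormalDist(mu=scale_score, sigma=sem)
    bounds = [float("-inf")] + list(cuts) + [float("inf")]
    # Area under the normal curve between successive cut scores
    return [dist.cdf(hi) - dist.cdf(lo) for lo, hi in zip(bounds, bounds[1:])]

# Figure 2.1 example: scale score 2606, SEM 31, assumed HS mathematics cuts
probs = level_probabilities(2606, 31, [2543, 2628, 2718])
```

The misclassification probability is then one minus the probability of the assigned level (here \(1 - .740 = .260\)).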


Figure 2.1: Illustrative Example of a Normal Distribution Used to Calculate Classification Accuracy

The same procedure was then applied to all students within the sample. Results are shown for 10 cases in Table 2.11 (student 6 is the case illustrated in Figure 2.1).

Table 2.11: ILLUSTRATIVE EXAMPLE OF CLASSIFICATION ACCURACY CALCULATION RESULTS
Student SS SEM Level P(L1) P(L2) P(L3) P(L4)
1 2751 23 4 0.000 0.000 0.076 0.924
2 2375 66 1 0.995 0.005 0.000 0.000
3 2482 42 1 0.927 0.073 0.000 0.000
4 2529 37 1 0.647 0.349 0.004 0.000
5 2524 36 1 0.701 0.297 0.002 0.000
6 2606 31 2 0.021 0.740 0.239 0.000
7 2474 42 1 0.950 0.050 0.000 0.000
8 2657 26 3 0.000 0.132 0.858 0.009
9 2600 31 2 0.033 0.784 0.183 0.000
10 2672 23 3 0.000 0.028 0.949 0.023

Table 2.12 presents a hypothetical set of results for the overall score and for a claim score (claim 3) for a population of students. The number (N) and proportion (P) of students classified into each achievement level is shown in the first three columns. These are counts and proportions of “observed” classifications in the population. Students are classified into the four achievement levels by their overall score. By claim scores, students are classified as “below,” “near,” or “above” standard, where the standard is the level 3 cut score. Rules for classifying students by their claim scores are detailed in Chapter 7.

The next four columns (“Freq L1,” etc.) show the number of students by “true level” among students at a given “observed level.” The last four columns convert the frequencies by true level into proportions. The sum of proportions in the last four columns of the “Overall” section of the table equals 1.0. Likewise, the sum of proportions in the last four columns of the “Claim 3” section of the table equals 1.0. For the overall test, the proportions of correct classifications for this hypothetical example are .404, .180, .145, and .098 for levels 1-4, respectively.

Table 2.12: EXAMPLE OF CROSS-CLASSIFYING TRUE ACHIEVEMENT LEVEL BY OBSERVED ACHIEVEMENT LEVEL
Score Observed Level N P Freq L1 Freq L2 Freq L3 Freq L4 Prop L1 Prop L2 Prop L3 Prop L4
Overall Level 1 251,896 0.451 225,454 26,172 263 8 0.404 0.047 0.000 0.000
Overall Level 2 141,256 0.253 21,800 100,364 19,080 11 0.039 0.180 0.034 0.000
Overall Level 3 104,125 0.186 161 14,223 81,089 8,652 0.000 0.025 0.145 0.015
Overall Level 4 61,276 0.110 47 29 6,452 54,748 0.000 0.000 0.012 0.098
Claim 3 Below Standard 167,810 0.300 143,536 18,323 4,961 990 0.257 0.033 0.009 0.002
Claim 3 Near Standard 309,550 0.554 93,364 102,133 89,696 24,357 0.167 0.183 0.161 0.044
Claim 3 Above Standard 81,193 0.145 94 1,214 18,949 60,936 0.000 0.002 0.034 0.109

For claim scores, correct “below” classifications are represented in cells corresponding to the “below standard” row and the levels 1 and 2 columns. Both levels 1 and 2 are below the level 3 cut score, which is the standard. Similarly, correct “above” standard classifications are represented in cells corresponding to the “above standard” row and the levels 3 and 4 columns. Correct classifications for “near” standard are not computed. There is no absolute criterion or scale score range, such as is defined by cut scores, for determining whether a student is truly at or near the standard. That is, the standard (level 3 cut score) clearly defines whether a student is above or below standard, but there is no range centered on the standard for determining whether a student is “near.”

Table 2.13 shows more specifically how the proportion of correct classifications is computed for classifications based on students’ overall and claim scores. For each type of score (overall and claim), the proportion of correct classifications is computed overall and conditionally on each observed classification (except for the “near standard” claim score classification). The conditional proportion correct is the proportion correct within a row divided by the total proportion within a row. For the overall score, the overall proportion correct is the sum of the proportions correct within the overall table section.

Table 2.13: EXAMPLE OF CORRECT CLASSIFICATION RATES
Score Observed Level P Prop L1 Prop L2 Prop L3 Prop L4 Accuracy by level Accuracy overall
Overall Level 1 0.451 0.404 0.047 0.000 0.000 .404/.451=.895 (.404+.180+.145+.098)/1.000=.827
Overall Level 2 0.253 0.039 0.180 0.034 0.000 .180/.253=.711 (.404+.180+.145+.098)/1.000=.827
Overall Level 3 0.186 0.000 0.025 0.145 0.015 .145/.186=.779 (.404+.180+.145+.098)/1.000=.827
Overall Level 4 0.110 0.000 0.000 0.012 0.098 .098/.110=.893 (.404+.180+.145+.098)/1.000=.827
Claim 3 Below Standard 0.300 0.257 0.033 0.009 0.002 (.257+.033)/.300=.965 (.257+.033+.034+.109)/(.300+.145)=.971
Claim 3 Near Standard 0.554 0.167 0.183 0.161 0.044 NA (.257+.033+.034+.109)/(.300+.145)=.971
Claim 3 Above Standard 0.145 0.000 0.002 0.034 0.109 (.034+.109)/.145=.984 (.257+.033+.034+.109)/(.300+.145)=.971

For the claim score, the overall classification accuracy rate is based only on students whose observed achievement is “below standard” or “above standard.” That is, the overall proportion correct for classifications by claim scores is the sum of the proportions correct in the claim section of the table, divided by the sum of all of the proportions in the “above standard” and “below standard” rows.
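A sketch of this calculation, using the rounded claim 3 proportions from Table 2.13 (the small difference from the published .971 reflects rounding of the inputs; the row-tuple layout is an illustrative choice):

```python
def claim_accuracy(below_row, above_row):
    """Overall claim classification accuracy. Each row is a tuple
    (P, prop_L1, prop_L2, prop_L3, prop_L4) of observed-row proportions,
    as in Table 2.13. 'Near standard' rows are excluded by design."""
    # Correct 'below' = true levels 1-2; correct 'above' = true levels 3-4
    correct = below_row[1] + below_row[2] + above_row[3] + above_row[4]
    return correct / (below_row[0] + above_row[0])

# Claim 3 rows from Table 2.13 (rounded published proportions)
below = (0.300, 0.257, 0.033, 0.009, 0.002)
above = (0.145, 0.000, 0.002, 0.034, 0.109)
acc = claim_accuracy(below, above)  # ~ .97, consistent with Table 2.13
```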

The following two sections show classification accuracy statistics for ELA/literacy and mathematics. There are seven tables in each section—one for each grade 3-8 and high school (HS). The statistics in these tables were computed as described above.

2.4.1 English Language Arts/Literacy

Results in this section are based on real data from students who took the full blueprint. Table 2.14 through Table 2.20 show ELA/literacy classification accuracy for grades 3-8 and high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy for each category was high to moderately high for all ELA/literacy grades.

Table 2.14: GRADE 3 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 49,672 0.333 0.305 0.028 0 0 0.916 0.824
Overall Level 2 36,562 0.245 0.031 0.183 0.031 0 0.747 0.824
Overall Level 3 32,087 0.215 0 0.032 0.155 0.029 0.718 0.824
Overall Level 4 30,686 0.206 0 0 0.025 0.181 0.877 0.824
Claim 1 Below 19,628 0.351 0.275 0.07 0.006 0 0.984 0.979
Claim 1 Near 23,376 0.418 0.048 0.173 0.153 0.044 0.979
Claim 1 Above 12,925 0.231 0 0.006 0.056 0.169 0.975 0.979
Claim 2 Below 17,148 0.382 0.274 0.087 0.018 0.003 0.945 0.945
Claim 2 Near 18,007 0.401 0.053 0.149 0.139 0.059 0.945
Claim 2 Above 9,790 0.218 0.001 0.011 0.056 0.15 0.944 0.945
Claim 3 Below 5,095 0.091 0.079 0.01 0.002 0 0.979 0.935
Claim 3 Near 39,054 0.698 0.242 0.218 0.157 0.082 0.935
Claim 3 Above 11,781 0.211 0.003 0.02 0.057 0.131 0.891 0.935
Claim 4 Below 14,945 0.333 0.265 0.061 0.006 0 0.98 0.976
Claim 4 Near 20,878 0.465 0.062 0.181 0.164 0.057 0.976
Claim 4 Above 9,122 0.203 0 0.005 0.042 0.155 0.972 0.976
Total: All Students 149,007 1.000
Table 2.15: GRADE 4 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 54,696 0.365 0.336 0.029 0 0 0.919 0.818
Overall Level 2 29,370 0.196 0.031 0.135 0.03 0 0.689 0.818
Overall Level 3 32,779 0.219 0 0.032 0.156 0.031 0.711 0.818
Overall Level 4 32,849 0.219 0 0 0.028 0.191 0.872 0.818
Claim 1 Below 19,472 0.346 0.292 0.05 0.005 0 0.986 0.984
Claim 1 Near 24,276 0.432 0.06 0.151 0.167 0.053 0.984
Claim 1 Above 12,472 0.222 0 0.004 0.047 0.171 0.982 0.984
Claim 2 Below 16,464 0.364 0.284 0.06 0.018 0.002 0.946 0.947
Claim 2 Near 19,616 0.434 0.071 0.134 0.151 0.078 0.947
Claim 2 Above 9,142 0.202 0.001 0.01 0.048 0.144 0.948 0.947
Claim 3 Below 11,049 0.197 0.17 0.021 0.005 0.001 0.973 0.942
Claim 3 Near 32,972 0.586 0.178 0.168 0.159 0.081 0.942
Claim 3 Above 12,201 0.217 0.004 0.015 0.055 0.142 0.911 0.942
Claim 4 Below 14,704 0.325 0.259 0.054 0.011 0.001 0.964 0.966
Claim 4 Near 20,942 0.463 0.097 0.143 0.157 0.066 0.966
Claim 4 Above 9,576 0.212 0.001 0.006 0.049 0.156 0.968 0.966
Total: All Students 149,694 1.000
Table 2.16: GRADE 5 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 52,559 0.346 0.318 0.028 0 0 0.919 0.825
Overall Level 2 31,558 0.208 0.03 0.149 0.029 0 0.717 0.825
Overall Level 3 40,982 0.270 0 0.031 0.209 0.03 0.775 0.825
Overall Level 4 26,643 0.176 0 0 0.027 0.149 0.846 0.825
Claim 1 Below 18,732 0.330 0.268 0.056 0.006 0 0.983 0.981
Claim 1 Near 24,218 0.427 0.052 0.152 0.189 0.033 0.981
Claim 1 Above 13,782 0.243 0 0.005 0.084 0.154 0.979 0.981
Claim 2 Below 14,528 0.319 0.248 0.056 0.015 0.001 0.951 0.949
Claim 2 Near 21,039 0.462 0.074 0.147 0.187 0.054 0.949
Claim 2 Above 9,938 0.218 0.001 0.011 0.077 0.129 0.946 0.949
Claim 3 Below 10,851 0.191 0.163 0.023 0.005 0 0.974 0.945
Claim 3 Near 33,211 0.585 0.154 0.174 0.194 0.063 0.945
Claim 3 Above 12,672 0.223 0.003 0.016 0.08 0.124 0.916 0.945
Claim 4 Below 6,810 0.150 0.134 0.014 0.002 0 0.989 0.985
Claim 4 Near 28,872 0.634 0.19 0.195 0.209 0.041 0.985
Claim 4 Above 9,823 0.216 0 0.004 0.069 0.143 0.982 0.985
Total: All Students 151,742 1.000
Table 2.17: GRADE 6 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 50,966 0.339 0.306 0.033 0 0 0.902 0.825
Overall Level 2 39,845 0.265 0.034 0.199 0.031 0 0.753 0.825
Overall Level 3 42,094 0.280 0 0.032 0.223 0.025 0.797 0.825
Overall Level 4 17,572 0.117 0 0 0.02 0.097 0.832 0.825
Claim 1 Below 18,057 0.321 0.24 0.074 0.006 0 0.981 0.977
Claim 1 Near 27,096 0.481 0.055 0.197 0.202 0.028 0.977
Claim 1 Above 11,145 0.198 0 0.005 0.08 0.112 0.973 0.977
Claim 2 Below 16,313 0.359 0.254 0.09 0.014 0 0.96 0.955
Claim 2 Near 20,638 0.454 0.046 0.179 0.195 0.034 0.955
Claim 2 Above 8,541 0.188 0 0.009 0.076 0.102 0.95 0.955
Claim 3 Below 14,007 0.249 0.189 0.051 0.008 0 0.967 0.94
Claim 3 Near 32,110 0.570 0.104 0.211 0.204 0.052 0.94
Claim 3 Above 10,188 0.181 0.002 0.014 0.077 0.088 0.913 0.94
Claim 4 Below 12,691 0.279 0.217 0.057 0.004 0 0.985 0.963
Claim 4 Near 22,132 0.487 0.082 0.207 0.175 0.022 0.963
Claim 4 Above 10,669 0.235 0.001 0.013 0.107 0.114 0.942 0.963
Total: All Students 150,477 1.000
Table 2.18: GRADE 7 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 48,264 0.315 0.284 0.031 0 0 0.902 0.819
Overall Level 2 40,562 0.265 0.034 0.196 0.035 0 0.739 0.819
Overall Level 3 47,414 0.309 0 0.036 0.247 0.026 0.8 0.819
Overall Level 4 17,104 0.112 0 0 0.019 0.092 0.828 0.819
Claim 1 Below 16,086 0.281 0.209 0.066 0.006 0 0.98 0.978
Claim 1 Near 28,059 0.490 0.045 0.185 0.238 0.022 0.978
Claim 1 Above 13,135 0.229 0 0.005 0.103 0.121 0.977 0.978
Claim 2 Below 10,087 0.219 0.171 0.042 0.006 0 0.972 0.962
Claim 2 Near 23,609 0.512 0.078 0.203 0.211 0.02 0.962
Claim 2 Above 12,380 0.269 0.001 0.012 0.132 0.124 0.952 0.962
Claim 3 Below 14,379 0.251 0.175 0.061 0.015 0 0.938 0.946
Claim 3 Near 34,297 0.599 0.079 0.189 0.271 0.059 0.946
Claim 3 Above 8,617 0.150 0 0.007 0.059 0.084 0.954 0.946
Claim 4 Below 10,178 0.221 0.165 0.049 0.006 0 0.972 0.968
Claim 4 Near 24,471 0.531 0.084 0.2 0.227 0.02 0.968
Claim 4 Above 11,427 0.248 0 0.008 0.115 0.124 0.964 0.968
Total: All Students 153,344 1.000
Table 2.19: GRADE 8 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 16,305 0.275 0.241 0.034 0 0 0.877 0.803
Overall Level 2 16,571 0.280 0.037 0.205 0.038 0 0.732 0.803
Overall Level 3 19,582 0.330 0 0.04 0.263 0.028 0.795 0.803
Overall Level 4 6,820 0.115 0 0 0.021 0.094 0.819 0.803
Claim 1 Below 17,944 0.303 0.224 0.074 0.005 0 0.982 0.978
Claim 1 Near 28,482 0.481 0.053 0.199 0.214 0.015 0.978
Claim 1 Above 12,756 0.216 0 0.006 0.103 0.107 0.973 0.978
Claim 2 Below 16,554 0.347 0.228 0.099 0.02 0 0.941 0.951
Claim 2 Near 23,497 0.493 0.05 0.176 0.234 0.034 0.951
Claim 2 Above 7,617 0.160 0 0.006 0.067 0.086 0.961 0.951
Claim 3 Below 15,326 0.259 0.185 0.063 0.01 0 0.959 0.943
Claim 3 Near 32,932 0.556 0.092 0.202 0.225 0.038 0.943
Claim 3 Above 10,937 0.185 0.001 0.013 0.087 0.084 0.927 0.943
Claim 4 Below 9,425 0.198 0.162 0.034 0.002 0 0.989 0.974
Claim 4 Near 27,917 0.586 0.115 0.239 0.213 0.019 0.974
Claim 4 Above 10,326 0.217 0 0.008 0.107 0.101 0.96 0.974
Total: All Students 59,278 1.000
Table 2.20: HIGH SCHOOL ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 1,470 0.160 0.141 0.019 0 0 0.881 0.794
Overall Level 2 1,928 0.210 0.024 0.153 0.034 0 0.725 0.794
Overall Level 3 3,220 0.351 0 0.04 0.265 0.046 0.753 0.794
Overall Level 4 2,546 0.278 0 0 0.042 0.235 0.847 0.794
Claim 1 Below 1,852 0.202 0.136 0.059 0.007 0 0.964 0.975
Claim 1 Near 4,228 0.461 0.029 0.148 0.233 0.051 0.975
Claim 1 Above 3,082 0.336 0 0.005 0.1 0.231 0.986 0.975
Claim 2 Below 1,564 0.171 0.121 0.042 0.008 0 0.952 0.961
Claim 2 Near 4,426 0.483 0.044 0.16 0.217 0.062 0.961
Claim 2 Above 3,172 0.346 0 0.01 0.116 0.22 0.97 0.961
Claim 3 Below 1,278 0.139 0.098 0.032 0.008 0.001 0.937 0.952
Claim 3 Near 5,793 0.632 0.067 0.172 0.265 0.128 0.952
Claim 3 Above 2,091 0.228 0 0.007 0.067 0.153 0.967 0.952
Claim 4 Below 1,487 0.162 0.12 0.038 0.004 0 0.976 0.978
Claim 4 Near 4,617 0.504 0.045 0.167 0.234 0.057 0.978
Claim 4 Above 3,058 0.334 0 0.006 0.103 0.225 0.981 0.978
Total: All Students 9,164 1.000

2.4.2 Mathematics

Results in this section are based on real data from students who took the full blueprint. Table 2.21 through Table 2.27 show the classification accuracy of the mathematics assessment for grades 3-8 and high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy for each category was high to moderately high for all mathematics grades.

Table 2.21: GRADE 3 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 51,368 0.345 0.32 0.025 0 0 0.928 0.856
Overall Level 2 34,685 0.233 0.028 0.179 0.027 0 0.768 0.856
Overall Level 3 37,593 0.252 0 0.027 0.205 0.02 0.814 0.856
Overall Level 4 25,355 0.170 0 0 0.018 0.152 0.893 0.856
Claim 1 Below 23,095 0.414 0.291 0.089 0.03 0.004 0.918 0.902
Claim 1 Near 16,225 0.291 0.046 0.109 0.114 0.021 0.902
Claim 1 Above 16,512 0.296 0.009 0.025 0.109 0.153 0.887 0.902
Claim 2/4 Below 5,480 0.256 0.204 0.047 0.005 0 0.982 0.977
Claim 2/4 Near 10,211 0.477 0.066 0.187 0.194 0.03 0.977
Claim 2/4 Above 5,724 0.267 0 0.007 0.094 0.166 0.973 0.977
Claim 3 Below 11,366 0.204 0.167 0.032 0.005 0 0.976 0.944
Claim 3 Near 30,547 0.547 0.175 0.173 0.165 0.035 0.944
Claim 3 Above 13,919 0.249 0.004 0.018 0.084 0.144 0.912 0.944
Total: All Students 149,001 1.000
Table 2.22: GRADE 4 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 47,310 0.317 0.29 0.026 0 0 0.917 0.86
Overall Level 2 44,836 0.300 0.027 0.247 0.026 0 0.824 0.86
Overall Level 3 35,229 0.236 0 0.026 0.191 0.018 0.811 0.86
Overall Level 4 22,029 0.147 0 0 0.016 0.131 0.891 0.86
Claim 1 Below 26,667 0.476 0.284 0.146 0.039 0.007 0.904 0.897
Claim 1 Near 14,260 0.254 0.021 0.112 0.1 0.021 0.897
Claim 1 Above 15,138 0.270 0.005 0.025 0.099 0.141 0.89 0.897
Claim 2/4 Below 6,809 0.317 0.215 0.097 0.006 0 0.982 0.979
Claim 2/4 Near 10,072 0.470 0.028 0.21 0.196 0.036 0.979
Claim 2/4 Above 4,566 0.213 0 0.005 0.068 0.14 0.976 0.979
Claim 3 Below 8,566 0.153 0.105 0.044 0.003 0 0.975 0.954
Claim 3 Near 35,998 0.642 0.204 0.226 0.168 0.044 0.954
Claim 3 Above 11,501 0.205 0.002 0.012 0.067 0.125 0.934 0.954
Total: All Students 149,404 1.000
Table 2.23: GRADE 5 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 62,501 0.413 0.38 0.032 0 0 0.922 0.857
Overall Level 2 39,700 0.262 0.028 0.209 0.025 0 0.798 0.857
Overall Level 3 25,466 0.168 0 0.023 0.127 0.018 0.756 0.857
Overall Level 4 23,806 0.157 0 0 0.016 0.141 0.895 0.857
Claim 1 Below 30,058 0.531 0.334 0.144 0.039 0.014 0.9 0.9
Claim 1 Near 14,459 0.256 0.023 0.111 0.086 0.036 0.9
Claim 1 Above 12,056 0.213 0.003 0.018 0.058 0.133 0.9 0.9
Claim 2/4 Below 7,687 0.353 0.276 0.073 0.004 0 0.987 0.978
Claim 2/4 Near 10,301 0.473 0.066 0.215 0.146 0.047 0.978
Claim 2/4 Above 3,770 0.173 0 0.005 0.039 0.129 0.969 0.978
Claim 3 Below 17,353 0.307 0.23 0.065 0.01 0.002 0.963 0.946
Claim 3 Near 30,361 0.537 0.129 0.197 0.138 0.072 0.946
Claim 3 Above 8,859 0.157 0.001 0.01 0.036 0.11 0.928 0.946
Total: All Students 151,473 1.000
Table 2.24: GRADE 6 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 59,217 0.395 0.364 0.031 0 0 0.922 0.849
Overall Level 2 44,650 0.298 0.032 0.237 0.029 0 0.796 0.849
Overall Level 3 27,236 0.182 0 0.026 0.137 0.019 0.754 0.849
Overall Level 4 18,689 0.125 0 0 0.015 0.11 0.883 0.849
Claim 1 Below 29,501 0.528 0.327 0.159 0.035 0.007 0.92 0.912
Claim 1 Near 16,311 0.292 0.022 0.13 0.103 0.037 0.912
Claim 1 Above 10,025 0.180 0.002 0.016 0.055 0.108 0.904 0.912
Claim 2/4 Below 7,804 0.368 0.278 0.085 0.005 0 0.985 0.979
Claim 2/4 Near 10,038 0.474 0.051 0.223 0.157 0.043 0.979
Claim 2/4 Above 3,353 0.158 0 0.004 0.039 0.114 0.973 0.979
Claim 3 Below 13,219 0.237 0.178 0.052 0.006 0.001 0.973 0.961
Claim 3 Near 36,188 0.648 0.173 0.247 0.158 0.071 0.961
Claim 3 Above 6,430 0.115 0 0.006 0.029 0.08 0.949 0.961
Total: All Students 149,792 1.000
Table 2.25: GRADE 7 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 57,907 0.379 0.346 0.033 0 0 0.912 0.847
Overall Level 2 42,525 0.279 0.032 0.218 0.028 0 0.782 0.847
Overall Level 3 31,468 0.206 0 0.025 0.162 0.019 0.786 0.847
Overall Level 4 20,744 0.136 0 0 0.015 0.121 0.887 0.847
Claim 1 Below 31,038 0.547 0.274 0.186 0.071 0.015 0.841 0.89
Claim 1 Near 14,390 0.253 0.011 0.103 0.107 0.032 0.89
Claim 1 Above 11,359 0.200 0.001 0.011 0.063 0.125 0.939 0.89
Claim 2/4 Below 6,395 0.295 0.232 0.06 0.003 0 0.988 0.979
Claim 2/4 Near 11,101 0.513 0.086 0.232 0.166 0.028 0.979
Claim 2/4 Above 4,158 0.192 0 0.006 0.057 0.13 0.971 0.979
Claim 3 Below 9,893 0.174 0.122 0.043 0.008 0.001 0.948 0.956
Claim 3 Near 39,225 0.691 0.164 0.253 0.199 0.075 0.956
Claim 3 Above 7,669 0.135 0 0.004 0.034 0.096 0.965 0.956
Total: All Students 152,644 1.000
Table 2.26: GRADE 8 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 21,056 0.359 0.316 0.043 0 0 0.879 0.797
Overall Level 2 16,952 0.289 0.041 0.208 0.04 0 0.719 0.797
Overall Level 3 12,016 0.205 0 0.034 0.145 0.026 0.709 0.797
Overall Level 4 8,599 0.147 0 0 0.019 0.128 0.871 0.797
Claim 1 Below 33,675 0.575 0.333 0.168 0.059 0.014 0.872 0.908
Claim 1 Near 15,702 0.268 0.023 0.109 0.1 0.037 0.908
Claim 1 Above 9,188 0.157 0.001 0.008 0.045 0.103 0.944 0.908
Claim 2/4 Below 7,800 0.355 0.292 0.059 0.004 0 0.989 0.977
Claim 2/4 Near 10,125 0.461 0.091 0.206 0.135 0.029 0.977
Claim 2/4 Above 4,026 0.183 0 0.006 0.048 0.129 0.965 0.977
Claim 3 Below 11,046 0.189 0.138 0.04 0.009 0.002 0.945 0.931
Claim 3 Near 39,911 0.681 0.217 0.235 0.161 0.067 0.931
Claim 3 Above 7,608 0.130 0.001 0.009 0.034 0.085 0.918 0.931
Total: All Students 58,623 1.000
Table 2.27: HIGH SCHOOL MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 3,169 0.346 0.313 0.032 0 0 0.906 0.833
Overall Level 2 2,572 0.281 0.036 0.21 0.035 0 0.749 0.833
Overall Level 3 2,252 0.246 0 0.03 0.196 0.02 0.798 0.833
Overall Level 4 1,171 0.128 0 0 0.014 0.113 0.888 0.833
Claim 1 Below 4,249 0.464 0.336 0.12 0.008 0 0.983 0.987
Claim 1 Near 2,893 0.316 0.013 0.15 0.147 0.006 0.987
Claim 1 Above 2,020 0.220 0 0.002 0.091 0.128 0.99 0.987
Claim 2/4 Below 2,662 0.291 0.247 0.042 0.002 0 0.993 0.967
Claim 2/4 Near 4,343 0.474 0.101 0.217 0.144 0.012 0.967
Claim 2/4 Above 2,157 0.235 0 0.014 0.1 0.121 0.94 0.967
Claim 3 Below 3,016 0.329 0.26 0.064 0.005 0 0.985 0.978
Claim 3 Near 4,539 0.495 0.089 0.203 0.179 0.024 0.978
Claim 3 Above 1,607 0.175 0 0.005 0.061 0.109 0.971 0.978
Total: All Students 9,164 1.000

2.5 Standard Errors of Measurement (SEMs)

The standard error of measurement (SEM) information in this section is based on student scores and associated SEMs included in the data Smarter Balanced received from members after the 2021-22 administration. Student scores and SEMs are not computed directly by Smarter Balanced; they are computed by the service providers who deliver the test, according to the scoring specifications provided by Smarter Balanced (Smarter Balanced, 2023b), including the use of Equation (2.6) in this chapter for computing SEMs. Because the test is adaptive, different students receive different items, so by this equation the amount of measurement error varies from student to student, even among students with the same estimate of achievement.

All of the SEM statistics reported in this chapter are based on the full blueprint and are in the reporting scale metric. For member data that includes SEMs in the theta metric only, the SEMs were transformed to the reporting metric using the multiplication factors in the theta-to-scale-score transformation given in Chapter 5. Note that ELA/literacy and mathematics scores are not on the same metric. In addition, because schools responded differently to post-pandemic test administration in 2021-22, the data used for the SEM analyses should not be considered representative of the whole population.
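As a concrete illustration of this transformation (a sketch only; the slope value below is a placeholder, and the operational multiplication factors are those given in Chapter 5): because the reporting scale is a linear function of theta, the SEM transforms by the slope alone.

```python
def sem_to_reporting_metric(sem_theta: float, slope: float) -> float:
    """Convert a theta-metric SEM to the reporting (scale score) metric.

    The reporting scale is a linear transformation SS = slope * theta + intercept,
    so the SEM is multiplied by the slope; the intercept drops out.
    """
    return slope * sem_theta

# Hypothetical slope for illustration only (see Chapter 5 for actual factors).
scaled_sem = sem_to_reporting_metric(sem_theta=0.30, slope=85.0)
```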

Table 2.28 and Table 2.29 show the trend in the SEM by student decile for ELA/literacy and mathematics, respectively. Deciles were defined by ranking students from lowest to highest scale score and dividing them into 10 equal-sized groups according to rank. Decile 1 contains the 10% of students with the lowest scale scores; decile 10 contains the 10% of students with the highest scale scores. The SEM reported for a decile is the average SEM among examinees in that decile.
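A minimal sketch of this decile computation (an assumed implementation, not the operational code):

```python
import numpy as np

def mean_sem_by_decile(scores, sems):
    """Average SEM within each decile of the scale-score distribution.

    Students are ranked by scale score; decile 1 is the lowest-scoring
    10% of students and decile 10 the highest-scoring 10%.
    """
    order = np.argsort(np.asarray(scores))          # lowest scores first
    groups = np.array_split(np.asarray(sems)[order], 10)
    return [float(g.mean()) for g in groups]
```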

Table 2.28: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, ELA/LITERACY
Grade Mean d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
3 21.8 27.4 21.8 21.2 20.9 20.9 20.7 20.6 20.7 20.8 22.4
4 23.4 29.0 23.9 22.7 22.5 22.4 22.2 22.1 22.0 22.2 24.8
5 23.0 27.1 22.2 21.3 21.2 21.3 21.6 22.1 23.2 23.9 25.3
6 24.2 30.5 25.3 23.4 22.8 22.7 22.7 22.8 22.8 23.2 25.6
7 26.1 34.2 27.4 25.6 24.5 24.0 24.1 24.4 24.9 25.3 26.5
8 28.6 38.3 30.4 28.2 27.2 26.7 26.3 26.0 26.1 27.0 29.3
HS 32.4 39.8 32.3 30.9 30.5 30.5 30.7 30.8 31.1 32.0 34.1


Table 2.29: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, MATHEMATICS
Grade Mean d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
3 17.5 25.7 19.3 17.8 16.7 16.1 15.8 15.5 15.4 15.3 16.4
4 17.7 27.4 20.3 18.3 16.9 16.3 16.0 15.8 15.2 14.9 15.6
5 21.4 35.3 28.0 25.0 22.4 20.4 18.3 17.2 15.6 15.2 16.1
6 23.2 39.2 28.8 25.3 23.6 21.6 20.3 19.8 18.6 17.3 17.2
7 25.3 45.0 32.5 28.4 26.1 24.4 22.2 21.0 18.7 17.6 17.1
8 30.2 43.7 37.5 34.5 32.2 30.1 28.4 26.8 24.8 22.6 21.2
HS 28.9 47.3 35.8 31.5 29.3 27.7 26.3 24.7 23.2 21.6 20.8

Table 2.30 and Table 2.31 show the average SEM near the achievement level cut scores. In the tables, M is the mean and SD is the standard deviation.

The average SEM reported for a given cut score is the average SEM among students within 10 scale score units of the cut score. In the column headings, “Cut1v2” is the lowest cut score, defining the lower boundary of level 2; “Cut2v3” defines the lower boundary of level 3; and “Cut3v4” defines the lower boundary of level 4.
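This conditional computation can be sketched as follows (an assumed implementation; the standard-deviation convention is also an assumption, and the example scores are hypothetical):

```python
import numpy as np

def sem_near_cut(scores, sems, cut, window=10):
    """Count, mean, and SD of SEMs for students within +/-window points of a cut score."""
    scores, sems = np.asarray(scores), np.asarray(sems)
    near = np.abs(scores - cut) <= window
    return int(near.sum()), float(sems[near].mean()), float(sems[near].std())

# Hypothetical data: five students, cut score at 2400.
n, m, sd = sem_near_cut([2390, 2400, 2405, 2411, 2500],
                        [20, 22, 24, 26, 30], cut=2400)
```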

Table 2.30: CONDITIONAL SEM NEAR (±10 POINTS) ACHIEVEMENT LEVEL CUT SCORES, ELA/LITERACY
Grade Cut1v2_N Cut1v2_M Cut1v2_SD Cut2v3_N Cut2v3_M Cut2v3_SD Cut3v4_N Cut3v4_M Cut3v4_SD
3 11017 21.0 2.12 11940 20.7 2.04 10269 20.7 1.97
4 10812 22.5 2.56 11029 22.2 2.43 10698 22.0 2.13
5 11005 21.2 2.28 10952 21.6 2.57 10043 23.8 2.22
6 11807 22.8 2.43 10909 22.8 2.75 7519 23.5 2.98
7 10499 24.8 2.35 11919 24.2 2.69 7177 25.5 2.15
8 3888 27.8 2.18 4666 26.2 2.86 2805 27.4 2.32
HS 316 31.8 1.74 594 30.5 1.58 746 30.9 1.83


Table 2.31: CONDITIONAL SEM NEAR (±10 POINTS) ACHIEVEMENT LEVEL CUT SCORES, MATHEMATICS
Grade Cut1v2_N Cut1v2_M Cut1v2_SD Cut2v3_N Cut2v3_M Cut2v3_SD Cut3v4_N Cut3v4_M Cut3v4_SD
3 12203 16.7 2.02 13481 15.7 1.39 9701 15.3 1.08
4 12285 17.2 2.15 13018 15.9 1.16 9120 14.9 1.61
5 11245 21.3 3.01 11365 16.7 2.72 9104 15.2 2.19
6 11228 22.7 2.83 11324 19.4 1.93 7617 17.1 2.21
7 10390 25.7 4.67 10354 20.9 3.24 7853 17.4 2.33
8 4151 31.9 5.92 4372 26.7 4.38 2953 22.5 3.52
HS 539 29.2 1.52 607 25.1 1.64 390 21.3 1.66

Figure 2.2 to Figure 2.15 are scatter plots of individual student SEMs, for a random sample of 2,000 students, as a function of scale score for the total test and for the claim subscores, by grade within subject. These plots show the variability of SEMs among students with the same scale score, as well as the trend in SEM with student achievement (scale score). Compared with the total score, a claim score has greater measurement error, and greater variability in measurement error among students, because it is based on fewer items. Among claims, those representing fewer items have higher measurement error and greater variability of measurement error than those representing more items.

Dashed vertical lines in Figure 2.2 to Figure 2.15 represent the achievement level cut scores. The plots for the high school standard errors show the cut scores for grades 9, 10, and 11 separately.

Figure 2.2: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 3

Figure 2.3: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 4

Figure 2.4: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 5

Figure 2.5: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 6

Figure 2.6: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 7

Figure 2.7: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 8

Figure 2.8: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy High School

Figure 2.9: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 3

Figure 2.10: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 4

Figure 2.11: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 5

Figure 2.12: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 6

Figure 2.13: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 7

Figure 2.14: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 8

Figure 2.15: Students’ Standard Error of Measurement by Scale Score, Mathematics High School

All of the tables and figures in this section, for every grade and subject, show higher measurement error for lower-achieving students. This trend reflects the fact that the item pool is difficult relative to overall student achievement. The computer adaptive test (CAT) algorithm delivers easier items to lower-achieving students than they would typically receive on a non-adaptive test, or on a fixed form whose difficulty is similar to that of the item pool as a whole. Even so, low-achieving students tend to receive items that are relatively difficult for them, typically because the CAT algorithm has no easier items available within the blueprint constraints that must be met for all students.

References

Smarter Balanced. (2023b). Smarter Balanced Scoring Specifications for Summative and Interim Assessments. Retrieved from https://technicalreports.smarterbalanced.org/scoring_specs/_book/scoringspecs.html.