Chapter 2 Reliability, Precision, and Errors of Measurement

2.1 Introduction

This chapter addresses the technical quality of the operational test with regard to precision and reliability. Part of the test validity argument is that scores must be consistent and precise enough to be useful for their intended purposes. If scores are to be meaningful, tests should deliver the same results under repeated administrations to the same student or for students of the same ability. In addition, the range of uncertainty around the score should be small enough to support educational decisions. The reliability and precision of a test are examined through analysis of measurement error and other test properties in simulated and operational conditions. For example, the reliability of a test may be assessed in part by verifying that different test forms follow the same blueprint.

In computer adaptive testing (CAT), the same set of items cannot be expected to be administered to the same examinee more than once. Consequently, reliability is inferred from internal test properties, including test length and the information provided by item parameters. Measurement precision is enhanced when a student receives items that are well matched, in difficulty, to the student's overall performance level, and when the items a student receives work well together to measure the same general body of knowledge, skills, and abilities defined by the test blueprint.

Smarter Balanced uses an adaptive model because adaptive tests are customized to each student in terms of item difficulty. Smarter Balanced also uses item quality control procedures to ensure that test items measure the knowledge, skills, and abilities specified in the test blueprint and work well together in this respect. The expected outcome of these and other test administration and item quality control procedures is high reliability and low measurement error.

For the 2022-23 administration, all statistics in this chapter are based on the full blueprint. Measurement bias results from simulations conducted by Cambium Assessment are provided, along with reliability, classification accuracy, and standard errors of measurement based on student data provided by Michigan, Montana, Nevada, and South Dakota. Statistics for the paper/pencil forms are based on the items on the forms, not on the students who took the assessment in 2022-23.

2.2 Measurement Bias

Measurement bias is any systematic or non-random error that occurs in estimating a student’s achievement from the student’s scores on test items. Prior to the release of the 2022-23 item pool, simulation studies were carried out to ensure that the item pool, combined with the adaptive test administration algorithm, would produce satisfactory tests with regard to measurement bias and random measurement error as a function of student achievement, overall reliability, fulfillment of test blueprints, and item exposure.

Results for measurement bias with the full blueprint are provided in this section. Measurement bias is the one index of test performance that is clearly and preferentially assessed through simulation as opposed to the use of real data. With real data, true student achievement is unknown. In simulation, true student achievement can be assumed and used to generate item responses. The simulated item responses are used in turn to estimate achievement. Achievement estimates are then compared to the underlying assumed, true values of student achievement to assess whether the estimates contain systematic error (bias).

Simulations for the 2022-23 administration were carried out by Cambium Assessment. The simulations were performed for each grade within a subject area for the standard item pool (English) and for accommodation item pools of braille and Spanish for mathematics and braille for ELA/literacy. For the standard item pools, the number of simulees was 3,000 for grades 3-8 and 5,000 for grade 11. For the braille and Spanish pools, the number of simulees was 1,000 for grades 3-8 and 2,000 for grade 11. True student achievement values were sampled from a normal distribution for each grade and subject. The parameters for the normal distribution were based on students’ operational scores on the 2018–2019 Smarter Balanced summative tests.

Test events were created for the simulated examinees using the 2022-23 item pool. Estimated ability ( \(\hat{\theta}\) ) was calculated from the simulated tests using maximum likelihood estimation (MLE) as described in the Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2023b).
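The MLE scoring step can be illustrated with a small sketch. The grid search below estimates \(\hat{\theta}\) for dichotomous (2PL) items only; the operational algorithm in the scoring specifications handles all item types, and the item parameters here are hypothetical.

```python
import numpy as np

D = 1.7  # logistic scaling constant used throughout this chapter

def mle_theta(responses, a, b):
    """Grid-search maximum likelihood estimate of theta for 2PL item
    responses; a sketch of MLE scoring, not the operational algorithm."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    r = np.asarray(responses, float)
    grid = np.linspace(-4.0, 4.0, 801)          # candidate theta values
    z = D * a * (grid[:, None] - b)             # grid points x items
    p = 1.0 / (1.0 + np.exp(-z))                # 2PL response probabilities
    loglik = (r * np.log(p) + (1 - r) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

# Hypothetical 3-item test: correct, correct, incorrect.
theta_hat = mle_theta([1, 1, 0], [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0])
```

With all responses correct, the likelihood increases without bound in theta, so the estimate hits the edge of the grid; operational scoring handles such cases with truncation rules.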

Bias was computed as:

\[\begin{equation} bias = N^{-1}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}) \tag{2.1} \end{equation}\]

and the error variance of the estimated bias is:

\[\begin{equation} ErrorVar(bias) = \frac{1}{N(N-1)}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}-mean(\theta_{i}-\hat{\theta}_{i}))^{2} \tag{2.2} \end{equation}\]

where \(\theta_{i} - \hat{\theta}_{i}\) is the deviation score and \(N\) denotes the number of simulees in the condition. Statistical significance of the bias is tested using a z-test: \[\begin{equation} z = \frac{bias}{\sqrt{ErrorVar(bias)}} \tag{2.3} \end{equation}\]
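Equations (2.1) through (2.3) can be sketched with simulated data. The sample size, random seed, and constant SEM of 0.35 below are illustrative assumptions, not values from the operational simulations; the last two lines compute the kind of 95%/99% confidence-interval miss rates reported in Tables 2.1 and 2.2.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(7)

# Hypothetical simulation: true abilities and noisy ability estimates.
N = 3000
theta = rng.normal(0.0, 1.0, N)                # true achievement, theta_i
theta_hat = theta + rng.normal(0.0, 0.35, N)   # estimated achievement

dev = theta - theta_hat
bias = dev.mean()                                           # Equation (2.1)
err_var = np.sum((dev - dev.mean()) ** 2) / (N * (N - 1))   # Equation (2.2)
z = bias / sqrt(err_var)                                    # Equation (2.3)
p_value = erfc(abs(z) / sqrt(2.0))             # two-sided P(|Z| > |z|)

# CI miss rates: share of estimates outside 95%/99% intervals
# centered on the true theta (half-widths use the assumed SEM).
miss95 = np.mean(np.abs(dev) > 1.96 * 0.35)
miss99 = np.mean(np.abs(dev) > 2.576 * 0.35)
```

Because the estimates here are unbiased by construction, the computed bias should be near zero and the miss rates near their nominal 5% and 1% values.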

Table 2.1 and Table 2.2 show, for ELA/literacy and mathematics, respectively, the bias in estimates of student achievement based on the complete test assembled from the standard item pool and the accommodations pools included in the simulations. The standard error of bias is the denominator of the z-score in Equation (2.3). The p-value is the probability \(|Z| > |z|\), where \(Z\) is a standard normal variate and \(|z|\) is the absolute value of the \(z\) computed in Equation (2.3). Under the hypothesis of no bias, approximately 5% and 1% of the \(\hat{\theta}_{i}\) will fall outside, respectively, 95% and 99% confidence intervals centered on \(\theta_{i}\).

Mean bias was generally very small in practical terms, exceeding .02 in absolute value in no cases for ELA/literacy and in only six cases for mathematics. Mean bias tended to be statistically significantly different from 0, but this was due to the large sample sizes used for the simulation. In virtually all cases, the percentage of simulated examinees whose estimated achievement score fell outside the confidence intervals centered on their true score was close to expected values of 5% for the 95% confidence interval and 1% for the 99% confidence interval. Plots of bias by estimated theta in the full simulation report show that positive and statistically significant mean bias was due to thetas being underestimated in regions of student achievement far below the lowest cut score (separating achievement levels 1 and 2). The same plots show that estimation bias is negligible near all cut scores in all cases.

Table 2.1: BIAS OF THE ESTIMATED PROFICIENCIES: ENGLISH LANGUAGE ARTS/LITERACY
Pool Grade Mean Bias SE (Bias) P value MSE 95% CI Miss Rate 99% CI Miss Rate
Standard 3 0.00 0.01 0.51 0.11 4.90% 1.03%
4 0.00 0.01 0.68 0.13 4.97% 0.87%
5 -0.01 0.01 0.43 0.14 5.63% 0.70%
6 0.00 0.01 0.73 0.13 5.20% 1.17%
7 -0.01 0.01 0.41 0.14 4.73% 0.83%
8 -0.01 0.01 0.23 0.17 5.34% 0.77%
HS 0.00 0.01 0.58 0.19 5.26% 1.08%
Braille 3 0.02 0.01 0.11 0.12 5.40% 0.90%
4 -0.01 0.01 0.35 0.12 5.21% 0.80%
5 0.01 0.01 0.48 0.12 4.50% 0.90%
6 -0.02 0.01 0.07 0.14 5.80% 1.00%
7 0.00 0.01 0.82 0.15 4.50% 1.10%
8 -0.02 0.01 0.21 0.18 5.60% 1.10%
HS 0.00 0.01 0.99 0.19 4.40% 0.70%


Table 2.2: BIAS OF THE ESTIMATED PROFICIENCIES: MATHEMATICS
Pool Grade Mean Bias SE (Bias) P value MSE 95% CI Miss Rate 99% CI Miss Rate
Standard 3 0.00 0.00 0.41 0.07 4.63% 1.07%
4 0.00 0.00 0.6 0.07 4.97% 0.93%
5 0.01 0.01 0.01 0.09 4.33% 0.67%
6 0.00 0.01 0.53 0.12 4.27% 0.93%
7 0.02 0.01 0.01 0.13 4.81% 0.83%
8 0.02 0.01 0.01 0.18 4.90% 0.90%
HS 0.04 0.01 < 0.005 0.20 4.62% 0.86%
Braille 3 0.00 0.01 0.85 0.08 4.50% 0.90%
4 0.00 0.01 0.72 0.09 4.71% 1.20%
5 0.02 0.01 0.15 0.12 4.90% 1.00%
6 0.00 0.01 0.77 0.17 4.81% 0.70%
7 0.01 0.01 0.27 0.15 4.21% 0.80%
8 0.01 0.01 0.63 0.19 5.01% 1.30%
HS 0.03 0.01 0.01 0.21 4.75% 0.85%
Spanish 3 -0.01 0.01 0.54 0.07 5.20% 0.80%
4 0.00 0.01 0.73 0.08 4.70% 0.70%
5 0.03 0.01 0.01 0.12 3.81% 0.60%
6 0.02 0.01 0.1 0.14 4.41% 0.70%
7 0.01 0.01 0.3 0.14 5.61% 1.00%
8 0.02 0.01 0.22 0.19 4.80% 1.00%
HS 0.03 0.01 < 0.005 0.20 5.00% 1.00%

2.3 Reliability

Reliability estimates reported in this section are derived from internal, IRT-based estimates of the measurement error in the test scores of examinees (MSE) and the observed variance of examinees’ test scores on the \(\theta\)-scale \((var(\hat{\theta}))\). The formula for the reliability estimate (\(\rho\)) is:

\[\begin{equation} \hat{\rho} = 1 - \frac{MSE}{var(\hat{\theta})}. \tag{2.4} \end{equation}\]

According to the Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2023b), estimates of measurement error are obtained from the parameter estimates of the items taken by the examinees. This is done by computing the test information for each examinee \(j\), summing over the items \(i\) that the examinee answered:

\[\begin{equation} I({\theta}_{j}) = \sum_{i=1}^{n_j} 1.7^2 a_i^2 \left\{\sum_{v=0}^{m_i-1} v^2 p_{iv}(z_{ij} = v) - \left[\sum_{v=0}^{m_i-1} v p_{iv}(z_{ij} = v)\right]^2 \right\}, \tag{2.5} \end{equation}\]

where \(n_j\) is the number of items answered by examinee \(j\) and \(p_{iv}(z_{ij} = v)\) is the probability of responding in scoring category \(v\), where \(v=0,1,...,m_i-1\). When an item is based on the 2PL model, \(m_i=2\) and Equation (2.5) simplifies to

\[\begin{equation} I({\theta}_{j}) = \sum_{i=1}^{n_j} 1.7^2 a_i^2 \left[p_{i}(z_{ij} = 1) - p_{i}(z_{ij} = 1)^2 \right]. \tag{2.6} \end{equation}\]

The test information is computed using only the items answered by the examinee. The measurement error (SEM) for examinee \(i\) is then computed as:

\[\begin{equation} SEM(\hat{\theta_i}) = \frac{1}{\sqrt{I(\hat{\theta_i})}}. \tag{2.7} \end{equation}\]
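A minimal sketch of Equations (2.5) through (2.7) follows, assuming generalized-partial-credit-style category probabilities for polytomous items (which reduce to the 2PL form of Equation (2.6) when \(m_i=2\)). The item parameters are hypothetical, and the 2.5 truncation mirrors the rule described in the text.

```python
import numpy as np

D = 1.7  # scaling constant in Equations (2.5)-(2.7)

def gpc_probs(theta, a, b_steps):
    """Category probabilities P(v), v = 0..m-1, for an item with
    discrimination a and step difficulties b_steps (GPC-style model)."""
    logits = np.concatenate(
        ([0.0], np.cumsum(D * a * (theta - np.asarray(b_steps, float))))
    )
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def item_info(theta, a, b_steps):
    """Equation (2.5) for one item: D^2 a^2 times the variance of the
    score category v under the category probabilities."""
    p = gpc_probs(theta, a, b_steps)
    v = np.arange(len(p))
    return D**2 * a**2 * (np.sum(v**2 * p) - np.sum(v * p) ** 2)

def sem(theta, items):
    """Equation (2.7): SEM = 1/sqrt(total information), truncated at 2.5."""
    info = sum(item_info(theta, a, b) for a, b in items)
    return min(1.0 / np.sqrt(info), 2.5)

# Hypothetical 3-item test: two dichotomous items and one
# three-category item; parameters are illustrative, not operational.
items = [(0.8, [-0.5]), (1.1, [0.3]), (0.9, [-0.2, 0.6])]
print(round(sem(0.0, items), 3))
```

For a dichotomous item the variance term collapses to \(p(1-p)\), matching Equation (2.6).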

The upper bound of \(SEM(\hat{\theta_i})\) is set to 2.5. Any value larger than 2.5 is truncated at 2.5. The mean squared error for a group of \(N\) examinees is then:

\[\begin{equation} MSE = N^{-1}\sum_{i=1}^N SEM(\hat{\theta_i})^2 \tag{2.8} \end{equation}\]

and the variance of the achievement scores is: \[\begin{equation} var(\hat{\theta}) = N^{-1}\sum_{i=1}^N (\hat{\theta_i} - \overline{\hat{\theta}})^2 \tag{2.9} \end{equation}\]

where \(\overline{\hat{\theta}}\) is the average of the \(\hat{\theta_i}\).

The measurement error for a group of examinees is typically reported as the square root of \(MSE\) and is denoted \(RMSE\). Measurement error is computed with Equation (2.7) and Equation (2.8) on a scale where achievement has a standard deviation close to 1 among students at a given grade. Measurement error reported in the tables of this section is transformed to the reporting scale by multiplying the theta-scale measurement error by \(a\), where \(a\) is the slope used to convert estimates of student achievement on the \(\theta\)-scale to the reporting scale. The transformation equations for converting estimates of student achievement on the \(\theta\)-scale to the reporting scale are given in Chapter 5.
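Putting Equations (2.4), (2.8), and (2.9) together, the group-level reliability and reporting-scale RMSE can be sketched as follows. The slope value of 85.8 and the simulated scores are placeholders for illustration, not operational scaling constants.

```python
import numpy as np

def reliability_summary(theta_hat, sem, slope=85.8):
    """Group MSE (Eq. 2.8), score variance (Eq. 2.9), marginal
    reliability (Eq. 2.4), and RMSE on the reporting scale (theta-scale
    RMSE times the slope a; the default slope is illustrative only)."""
    theta_hat = np.asarray(theta_hat, float)
    sem = np.minimum(np.asarray(sem, float), 2.5)     # SEM truncated at 2.5
    mse = np.mean(sem ** 2)                           # Equation (2.8)
    var = np.mean((theta_hat - theta_hat.mean()) ** 2)  # Equation (2.9)
    rho = 1.0 - mse / var                             # Equation (2.4)
    return rho, slope * np.sqrt(mse)

# Hypothetical group of 5,000 examinees with SEMs between 0.25 and 0.45.
rng = np.random.default_rng(11)
theta_hat = rng.normal(0.0, 1.0, 5000)
sems = rng.uniform(0.25, 0.45, 5000)
rho, rmse_reporting = reliability_summary(theta_hat, sems)
```

With a theta-scale score variance near 1 and SEMs averaging about 0.35, the sketch yields a reliability in the high-.80s, comparable in form (not in value) to the entries in Tables 2.3 and 2.4.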

2.3.1 General Population

Reliability estimates in this section are based on real data and the full blueprint. In mathematics, claims 2 and 4 are reported together as a single subscore, so there are only three reporting categories for mathematics, but four claims. Table 2.3 and Table 2.4 show the reliability of the observed total scores and subscores for ELA/literacy and mathematics. Reliability estimates are high for the total score in both subjects. Reliability coefficients are high for the claim 1 score in mathematics, moderately high for the claim 1 and claim 2 scores in ELA/literacy, and moderately high to moderate for the remainder of the claim scores in both subjects. The lowest reliability coefficient in either subject is .554, which is the reliability of the claim 3 score in the grade 8 mathematics assessment.

Table 2.3: ELA/LITERACY SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade N Total score Claim 1 Claim 2 Claim 3 Claim 4
3 168,997 0.932 0.791 0.712 0.597 0.733
4 185,450 0.930 0.791 0.718 0.609 0.686
5 187,118 0.935 0.801 0.741 0.624 0.754
6 187,572 0.925 0.780 0.753 0.599 0.706
7 187,142 0.919 0.770 0.755 0.620 0.681
8 95,882 0.911 0.784 0.709 0.623 0.688
HS 9,445 0.904 0.753 0.737 0.588 0.693
Table 2.4: MATHEMATICS SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade N Total score Claim 1 Claim 2/4 Claim 3
3 168,914 0.956 0.925 0.705 0.753
4 185,210 0.956 0.924 0.754 0.746
5 186,870 0.944 0.905 0.660 0.726
6 187,051 0.943 0.906 0.717 0.642
7 186,403 0.937 0.897 0.654 0.693
8 95,094 0.922 0.893 0.617 0.624
HS 9,459 0.930 0.889 0.702 0.695

2.3.2 Demographic Groups

Reliability estimates in this section are based on real data and the full blueprint. During the 2021-22 administration year, most students and schools returned to normal testing. However, schools responded differently to post-pandemic test administration: many states maintained remote testing, and some states switched to an adjusted blueprint. Consequently, the demographic group results presented below should not be considered representative of the entire student population. Table 2.5 and Table 2.6 show the reliability of the test for students of different racial groups in ELA/literacy and mathematics who tested in 2022-23. Table 2.7 and Table 2.8 show the reliability of the test for students who tested in 2022-23, grouped by demographic characteristics typically associated with accommodations or accessibility tools. These groups include English learners (EL) and students covered by the Individuals with Disabilities Education Act (IDEA).

Because of the differences in average score across demographic groups and the relationship between measurement error and student achievement scores, which will be seen in the next section of this chapter, demographic groups with lower average scores tend to have lower reliability than the population as a whole. Nevertheless, the reliability coefficients for all demographic groups in these tables are moderately high to high.

Table 2.5: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - ELA/LITERACY
Grade Group N Var MSE Rho
3 Total 168,997 8100 551 0.932
American Indian or Alaska Native 4,421 6834 714 0.896
Asian 7,089 8161 570 0.930
Black/African American 26,487 6665 570 0.914
Native Hawaiian or Pacific Islander 945 7775 611 0.921
Hispanic/Latino Ethnicity 33,839 7327 602 0.918
White 101,528 7653 554 0.928
4 Total 185,450 9030 635 0.930
American Indian or Alaska Native 4,240 8388 842 0.900
Asian 7,990 8944 673 0.925
Black/African American 28,828 7544 651 0.914
Native Hawaiian or Pacific Islander 967 8862 709 0.920
Hispanic/Latino Ethnicity 35,790 8340 695 0.917
White 112,690 8386 639 0.924
5 Total 187,118 9479 613 0.935
American Indian or Alaska Native 4,130 8979 825 0.908
Asian 8,140 8783 670 0.924
Black/African American 28,840 7751 605 0.922
Native Hawaiian or Pacific Islander 1,009 9087 671 0.926
Hispanic/Latino Ethnicity 36,113 8770 653 0.926
White 113,733 8989 626 0.930
6 Total 187,572 9090 681 0.925
American Indian or Alaska Native 4,171 7896 926 0.883
Asian 7,808 8570 718 0.916
Black/African American 29,171 7338 694 0.905
Native Hawaiian or Pacific Islander 950 9095 804 0.912
Hispanic/Latino Ethnicity 36,610 8425 748 0.911
White 114,104 8600 680 0.921
7 Total 187,142 9985 804 0.919
American Indian or Alaska Native 4,062 9645 1173 0.878
Asian 7,904 9290 792 0.915
Black/African American 28,821 8526 857 0.900
Native Hawaiian or Pacific Islander 921 9512 844 0.911
Hispanic/Latino Ethnicity 36,429 9595 875 0.909
White 114,492 9526 807 0.915
8 Total 95,882 10902 969 0.911
American Indian or Alaska Native 3,560 9778 1357 0.861
Asian 4,916 10132 917 0.910
Black/African American 12,171 9869 1085 0.890
Native Hawaiian or Pacific Islander 925 9710 900 0.907
Hispanic/Latino Ethnicity 29,155 9888 963 0.903
White 55,677 10592 1003 0.905
HS Total 9,445 11515 1102 0.904
American Indian or Alaska Native 740 10538 1218 0.884
Asian 161 12832 1108 0.914
Black/African American 301 12575 1141 0.909
Native Hawaiian or Pacific Islander 14 14044 1223 0.913
Hispanic/Latino Ethnicity 621 13621 1158 0.915
White 7,173 10308 1085 0.895


Table 2.6: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - MATHEMATICS
Grade Group N Var MSE Rho
3 Total 168,914 7885 346 0.956
American Indian or Alaska Native 4,406 6437 462 0.928
Asian 7,096 7744 341 0.956
Black/African American 26,466 6837 417 0.939
Native Hawaiian or Pacific Islander 940 7344 375 0.949
Hispanic/Latino Ethnicity 33,828 6921 366 0.947
White 101,456 6985 329 0.953
4 Total 185,210 7956 353 0.956
American Indian or Alaska Native 4,233 7261 496 0.932
Asian 7,991 8017 346 0.957
Black/African American 28,732 6702 460 0.931
Native Hawaiian or Pacific Islander 968 8033 366 0.954
Hispanic/Latino Ethnicity 35,744 7319 382 0.948
White 112,596 6836 323 0.953
5 Total 186,870 8995 508 0.944
American Indian or Alaska Native 4,102 7708 738 0.904
Asian 8,143 8670 411 0.953
Black/African American 28,765 7028 720 0.898
Native Hawaiian or Pacific Islander 1,006 8242 522 0.937
Hispanic/Latino Ethnicity 36,058 7911 572 0.928
White 113,617 8048 453 0.944
6 Total 187,051 11191 643 0.943
American Indian or Alaska Native 4,148 10309 1104 0.893
Asian 7,804 10992 499 0.955
Black/African American 29,058 9649 901 0.907
Native Hawaiian or Pacific Islander 950 10336 690 0.933
Hispanic/Latino Ethnicity 36,437 9765 757 0.922
White 113,846 9937 574 0.942
7 Total 186,403 11924 746 0.937
American Indian or Alaska Native 4,027 9797 1196 0.878
Asian 7,894 12268 537 0.956
Black/African American 28,635 9486 1073 0.887
Native Hawaiian or Pacific Islander 907 10054 814 0.919
Hispanic/Latino Ethnicity 36,182 10213 901 0.912
White 114,175 11012 656 0.940
8 Total 95,094 12767 1002 0.922
American Indian or Alaska Native 3,494 10409 1503 0.856
Asian 4,912 13105 731 0.944
Black/African American 11,977 10513 1271 0.879
Native Hawaiian or Pacific Islander 915 9991 1043 0.896
Hispanic/Latino Ethnicity 28,846 10518 1180 0.888
White 55,338 12886 923 0.928
HS Total 9,459 12886 905 0.930
American Indian or Alaska Native 744 8391 1501 0.821
Asian 163 16476 872 0.947
Black/African American 304 12664 1163 0.908
Native Hawaiian or Pacific Islander 15 13074 1368 0.895
Hispanic/Latino Ethnicity 628 12023 1119 0.907
White 7,172 11495 811 0.929


Table 2.7: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - ELA/LITERACY
Grade Group N Var MSE Rho
3 Total 168,997 8100 551 0.932
EL Status 12,620 6210 686 0.890
IDEA Indicator 13,996 6999 748 0.893
Section 504 Status 2,432 8083 703 0.913
Economic Disadvantage Status 41,021 7197 652 0.909
4 Total 185,450 9030 635 0.930
EL Status 11,764 6755 803 0.881
IDEA Indicator 14,018 8020 851 0.894
Section 504 Status 2,983 9113 858 0.906
Economic Disadvantage Status 41,021 8175 763 0.907
5 Total 187,118 9479 613 0.935
EL Status 10,250 6074 763 0.874
IDEA Indicator 13,885 8248 819 0.901
Section 504 Status 3,414 10160 845 0.917
Economic Disadvantage Status 40,723 8835 726 0.918
6 Total 187,572 9090 681 0.925
EL Status 9,015 5106 909 0.822
IDEA Indicator 13,230 6919 937 0.865
Section 504 Status 5,313 10636 895 0.916
Economic Disadvantage Status 39,332 8407 838 0.900
7 Total 187,142 9985 804 0.919
EL Status 8,432 6057 1106 0.817
IDEA Indicator 13,052 8333 1147 0.862
Section 504 Status 5,761 11531 1003 0.913
Economic Disadvantage Status 38,936 9621 967 0.900
8 Total 95,882 10902 969 0.911
EL Status 8,095 5598 1180 0.789
IDEA Indicator 12,606 7946 1209 0.848
Section 504 Status 6,033 11970 1032 0.914
Economic Disadvantage Status 39,330 9626 1015 0.895
HS Total 9,445 11515 1102 0.904
EL Status 230 6936 1373 0.802
IDEA Indicator 859 8561 1297 0.849
Section 504 Status 470 12142 1100 0.909
Economic Disadvantage Status 2,809 11192 1111 0.901


Table 2.8: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - MATHEMATICS
Grade Group N Var MSE Rho
3 Total 168,914 7885 346 0.956
EL Status 12,655 6179 403 0.935
IDEA Indicator 14,010 7651 490 0.936
Section 504 Status 2,459 7554 404 0.947
Economic Disadvantage Status 40,950 6872 384 0.944
4 Total 185,210 7956 353 0.956
EL Status 11,770 6354 451 0.929
IDEA Indicator 13,980 8065 542 0.933
Section 504 Status 3,001 7476 380 0.949
Economic Disadvantage Status 40,939 7551 406 0.946
5 Total 186,870 8995 508 0.944
EL Status 10,251 5689 736 0.871
IDEA Indicator 13,839 7745 814 0.895
Section 504 Status 3,433 8354 556 0.933
Economic Disadvantage Status 40,574 7958 610 0.923
6 Total 187,051 11191 643 0.943
EL Status 8,997 6617 1121 0.831
IDEA Indicator 13,063 9540 1207 0.873
Section 504 Status 3,747 10184 690 0.932
Economic Disadvantage Status 39,075 9742 815 0.916
7 Total 186,403 11924 746 0.937
EL Status 8,384 6261 1308 0.791
IDEA Indicator 12,872 8829 1397 0.842
Section 504 Status 4,015 11180 770 0.931
Economic Disadvantage Status 38,519 9973 980 0.902
8 Total 95,094 12767 1002 0.922
EL Status 8,008 6015 1653 0.725
IDEA Indicator 12,374 9037 1566 0.827
Section 504 Status 4,364 12617 964 0.924
Economic Disadvantage Status 38,802 10521 1205 0.885
HS Total 9,459 12886 905 0.930
EL Status 245 5654 1607 0.716
IDEA Indicator 864 8360 1522 0.818
Section 504 Status 466 12789 950 0.926
Economic Disadvantage Status 2,809 11140 1077 0.903

2.3.3 Paper/Pencil Tests

Smarter Balanced supports fixed-form paper/pencil tests that adhere to the full blueprint for use in a variety of situations, including in schools that lack computer capacity and to address potential religious concerns associated with using technology for assessments. Scores on the paper/pencil tests are on the same reporting scale used for the online assessments. The forms used in the 2022-23 administration are collectively (for all grades) referred to as Form 5.

Table 2.9 and Table 2.10 show, for ELA/literacy and mathematics, respectively, statistical information pertaining to the items on Form 5 and to the measurement precision of the form. MSE estimates for the paper/pencil forms were based on Equation (2.5) through Equation (2.8), except that quadrature points and weights over a hypothetical theta distribution were used instead of observed scores (\(\hat{\theta}\) values). The hypothetical true score distribution used for quadrature was the student distribution from the 2021-22 operational administration. Reliability was then computed as in Equation (2.4), with the observed-score variance equal to the MSE plus the variance of the hypothetical true score distribution. Reliability was better for the full test than for the subscales and is inversely related to the SEM.
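The quadrature-based calculation described above can be sketched as follows for a fixed form of 2PL items. The normal true-score distribution, the number of quadrature points, and all item parameters below are illustrative assumptions, not the operational values.

```python
import numpy as np

D = 1.7

def info_2pl(theta, a, b):
    """2PL item information, as in Equation (2.6): D^2 a^2 p(1-p)."""
    p = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
    return D**2 * a**2 * p * (1 - p)

def fixed_form_reliability(item_params, mean=0.0, sd=1.0, n_q=61):
    """Quadrature sketch: theta points and normalized normal-density
    weights replace observed scores; rho = 1 - MSE/(var_true + MSE)."""
    theta = np.linspace(mean - 4 * sd, mean + 4 * sd, n_q)
    w = np.exp(-0.5 * ((theta - mean) / sd) ** 2)
    w /= w.sum()                                   # quadrature weights
    info = sum(info_2pl(theta, a, b) for a, b in item_params)
    sem = np.minimum(1.0 / np.sqrt(info), 2.5)     # SEM truncated at 2.5
    mse = np.sum(w * sem ** 2)
    return 1.0 - mse / (sd**2 + mse), np.sqrt(mse)

# Hypothetical 39-item form with moderate discriminations and a broad
# difficulty range, loosely in the spirit of Table 2.9 (values invented).
rng = np.random.default_rng(3)
params = list(zip(rng.uniform(0.5, 0.9, 39), rng.uniform(-2.5, 1.5, 39)))
rho, rmse = fixed_form_reliability(params)
```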

Table 2.9: RELIABILITY OF PAPER PENCIL TESTS, FORM 5 ENGLISH LANGUAGE ARTS/LITERACY
Grade Nitems Rho SEM Avg. b Avg. a C1 Rho C1 SEM C2 Rho C2 SEM C3 Rho C3 SEM C4 Rho C4 SEM
3 39 0.915 0.357 -1.134 0.670 0.767 0.643 0.755 0.667 0.642 0.873 0.708 0.751
4 39 0.917 0.360 -0.640 0.678 0.799 0.599 0.726 0.735 0.666 0.846 0.691 0.800
5 39 0.920 0.360 -0.174 0.663 0.782 0.648 0.766 0.678 0.664 0.871 0.717 0.770
6 39 0.904 0.390 0.361 0.555 0.772 0.650 0.746 0.699 0.556 1.069 0.594 0.989
7 39 0.914 0.392 0.868 0.564 0.782 0.677 0.761 0.717 0.623 0.996 0.670 0.899
8 42 0.923 0.376 1.050 0.597 0.786 0.682 0.765 0.723 0.693 0.868 0.687 0.881
11 40 0.930 0.407 1.008 0.580 0.815 0.704 0.781 0.783 0.680 1.014 0.738 0.882


Table 2.10: RELIABILITY OF PAPER PENCIL TEST, FORM 5 MATHEMATICS
Grade Nitems Rho SEM Avg. b Avg. a C1 Rho C1 SEM C2&4 Rho C2&4 SEM C3 Rho C3 SEM
3 36 0.913 0.370 -0.935 0.699 0.822 0.558 0.698 0.787 0.751 0.689
4 38 0.917 0.363 -0.305 0.768 0.844 0.519 0.720 0.751 0.694 0.800
5 39 0.916 0.389 -0.131 0.644 0.836 0.569 0.737 0.769 0.711 0.821
6 39 0.909 0.459 0.533 0.646 0.810 0.704 0.751 0.837 0.699 0.953
7 40 0.914 0.464 0.699 0.631 0.830 0.686 0.738 0.903 0.719 0.949
8 39 0.898 0.555 1.016 0.536 0.813 0.789 0.604 1.332 0.701 1.076
10 41 0.905 0.566 1.397 0.533 0.833 0.782 0.719 1.095 0.657 1.264
HS 42 0.887 0.626 1.805 0.490 0.816 0.831 0.617 1.378 0.616 1.381

2.4 Classification Accuracy

Information on classification accuracy is based on actual test results from the 2022-23 administration. Classification accuracy is a measure of how accurately test scores or subscores place students into reporting category levels. The likelihood of inaccurate placement depends on the amount of measurement error associated with scores, especially those nearest cut points, and on the distribution of student achievement. For this report, classification accuracy was calculated in the following manner. For each examinee, analysts used the estimated scale score and its standard error of measurement to obtain a normal approximation of the likelihood function over the range of scale scores. The normal approximation took the scale score estimate as its mean and the standard error of measurement as its standard deviation. The proportion of the area under the curve within each level was then calculated.

Figure 2.1 illustrates the approach for one examinee in grade 11 mathematics. In this example, the examinee’s overall scale score is 2606 (placing this student in level 2, based on the cut scores for this grade level), with a standard error of measurement of 31 points. Accordingly, a normal distribution with a mean of 2606 and a standard deviation of 31 was used to approximate the likelihood of the examinee’s true level, based on the observed test performance. The area under the curve was computed within each score range in order to estimate the probability that the examinee’s true score falls within that level (the red vertical lines identify the cut scores). For the student in Figure 2.1, the estimated probabilities were 2.1% for level 1, 74.0% for level 2, 23.9% for level 3, and 0.0% for level 4. Since the student’s assigned level was level 2, there is an estimated 74% chance the student was correctly classified and a 26% (2.1% + 23.9% + 0.0%) chance the student was misclassified.
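The calculation for this worked example can be sketched directly. The cut scores used below are illustrative values chosen so the sketch reproduces the probabilities quoted above; they are not asserted to be the operational cut scores.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def level_probabilities(scale_score, sem, cuts):
    """Area under a normal(scale_score, sem) density within each
    achievement level, where `cuts` are the cut scores between levels."""
    cdf = [0.0] + [norm_cdf((c - scale_score) / sem) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Worked example: scale score 2606, SEM 31, with three assumed cut scores.
probs = level_probabilities(2606, 31, [2543, 2628, 2718])
```

With these assumed cuts, the four areas come out near 2.1%, 74.0%, 23.9%, and 0.0%, matching the example.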


Figure 2.1: Illustrative Example of a Normal Distribution Used to Calculate Classification Accuracy

The same procedure was then applied to all students within the sample. Results are shown for 10 cases in Table 2.11 (student 6 is the case illustrated in Figure 2.1).

Table 2.11: ILLUSTRATIVE EXAMPLE OF CLASSIFICATION ACCURACY CALCULATION RESULTS
Student SS SEM Level P(L1) P(L2) P(L3) P(L4)
1 2751 23 4 0.000 0.000 0.076 0.924
2 2375 66 1 0.995 0.005 0.000 0.000
3 2482 42 1 0.927 0.073 0.000 0.000
4 2529 37 1 0.647 0.349 0.004 0.000
5 2524 36 1 0.701 0.297 0.002 0.000
6 2606 31 2 0.021 0.740 0.239 0.000
7 2474 42 1 0.950 0.050 0.000 0.000
8 2657 26 3 0.000 0.132 0.858 0.009
9 2600 31 2 0.033 0.784 0.183 0.000
10 2672 23 3 0.000 0.028 0.949 0.023

Table 2.12 presents a hypothetical set of results for the overall score and for a claim score (claim 3) for a population of students. The number (N) and proportion (P) of students classified into each achievement level is shown in the first three columns. These are counts and proportions of “observed” classifications in the population. Students are classified into the four achievement levels by their overall score. By claim scores, students are classified as “below,” “near,” or “above” standard, where the standard is the level 3 cut score. Rules for classifying students by their claim scores are detailed in Chapter 7.

The next four columns (“Freq L1,” etc.) show the number of students by “true level” among students at a given “observed level.” The last four columns convert the frequencies by true level into proportions. The sum of proportions in the last four columns of the “Overall” section of the table equals 1.0. Likewise, the sum of proportions in the last four columns of the “Claim 3” section of the table equals 1.0. For the overall test, the proportions of correct classifications for this hypothetical example are .404, .180, .145, and .098 for levels 1-4, respectively.

Table 2.12: EXAMPLE OF CROSS-CLASSIFYING TRUE ACHIEVEMENT LEVEL BY OBSERVED ACHIEVEMENT LEVEL
Score Observed Level N P Freq L1 Freq L2 Freq L3 Freq L4 Prop L1 Prop L2 Prop L3 Prop L4
Overall Level 1 251,896 0.451 225,454 26,172 263 8 0.404 0.047 0.000 0.000
Level 2 141,256 0.253 21,800 100,364 19,080 11 0.039 0.180 0.034 0.000
Level 3 104,125 0.186 161 14,223 81,089 8,652 0.000 0.025 0.145 0.015
Level 4 61,276 0.110 47 29 6,452 54,748 0.000 0.000 0.012 0.098
Claim 3 Below Standard 167,810 0.300 143,536 18,323 4,961 990 0.257 0.033 0.009 0.002
Near Standard 309,550 0.554 93,364 102,133 89,696 24,357 0.167 0.183 0.161 0.044
Above Standard 81,193 0.145 94 1,214 18,949 60,936 0.000 0.002 0.034 0.109

For claim scores, correct “below” classifications are represented in cells corresponding to the “below standard” row and the levels 1 and 2 columns. Both levels 1 and 2 are below the level 3 cut score, which is the standard. Similarly, correct “above” standard classifications are represented in cells corresponding to the “above standard” row and the levels 3 and 4 columns. Correct classifications for “near” standard are not computed. There is no absolute criterion or scale score range, such as is defined by cut scores, for determining whether a student is truly at or near the standard. That is, the standard (level 3 cut score) clearly defines whether a student is above or below standard, but there is no range centered on the standard for determining whether a student is “near.”

Table 2.13 shows more specifically how the proportion of correct classifications is computed for classifications based on students’ overall and claim scores. For each type of score (overall and claim), the proportion of correct classifications is computed overall and conditionally on each observed classification (except for the “near standard” claim score classification). The conditional proportion correct is the proportion correct within a row divided by the total proportion within a row. For the overall score, the overall proportion correct is the sum of the proportions correct within the overall table section.

Table 2.13: EXAMPLE OF CORRECT CLASSIFICATION RATES
Score Observed Level P Prop L1 Prop L2 Prop L3 Prop L4 Accuracy by level Accuracy overall
Overall Level 1 0.451 0.404 0.047 0.000 0.000 .404/.451=.895 (.404+.180+.145+.098)/1.000=.827
Level 2 0.253 0.039 0.180 0.034 0.000 .180/.253=.711
Level 3 0.186 0.000 0.025 0.145 0.015 .145/.186=.779
Level 4 0.110 0.000 0.000 0.012 0.098 .098/.110=.893
Claim 3 Below Standard 0.300 0.257 0.033 0.009 0.002 (.257+.033)/.300=.965 (.257+.033+.034+.109)/(.300+.145)=.971
Near Standard 0.554 0.167 0.183 0.161 0.044 NA
Above Standard 0.145 0.000 0.002 0.034 0.109 (.034+.109)/.145=.984

For the claim score, the overall classification accuracy rate is based only on students whose observed achievement is “below standard” or “above standard.” That is, the overall proportion correct for classifications by claim scores is the sum of the proportions correct in the claim section of the table, divided by the sum of all of the proportions in the “above standard” and “below standard” rows.
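The rates in Table 2.13 can be reproduced directly from the frequencies in Table 2.12; a short sketch:

```python
import numpy as np

# True-level frequencies by observed level for the overall score,
# taken from the hypothetical example in Table 2.12.
freq = np.array([
    [225454,  26172,   263,     8],   # observed Level 1
    [ 21800, 100364, 19080,    11],   # observed Level 2
    [   161,  14223, 81089,  8652],   # observed Level 3
    [    47,     29,  6452, 54748],   # observed Level 4
])

# Accuracy by level: correct (diagonal) count over the row total.
by_level = np.diag(freq) / freq.sum(axis=1)

# Overall accuracy: all correct classifications over all students.
overall = np.diag(freq).sum() / freq.sum()

# Claim 3 rows from Table 2.12: "below" is correct in the level 1-2
# columns, "above" is correct in the level 3-4 columns; the overall
# claim rate excludes the "near standard" row.
below = np.array([143536, 18323, 4961, 990])
above = np.array([94, 1214, 18949, 60936])
below_acc = below[:2].sum() / below.sum()
above_acc = above[2:].sum() / above.sum()
claim_overall = (below[:2].sum() + above[2:].sum()) / (below.sum() + above.sum())
```

Note that the conditional rates use the underlying frequencies rather than the rounded proportions, which is why, for example, the "below standard" rate is .965 rather than (.257 + .033)/.300 computed from three-decimal proportions.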

The following two sections show classification accuracy statistics for ELA/literacy and mathematics. There are seven tables in each section—one for each grade 3-8 and high school (HS). The statistics in these tables were computed as described above.

2.4.1 English Language Arts/Literacy

Results in this section are based on real data from students who took the full blueprint. Table 2.14 through Table 2.20 show ELA/literacy classification accuracy for each grade 3-8 and high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy for each category was high to moderately high for all ELA/literacy grades.

Table 2.14: GRADE 3 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 57,710 0.341 0.312 0.03 0 0 0.913 0.818
Level 2 40,787 0.241 0.033 0.177 0.032 0 0.734
Level 3 35,172 0.208 0 0.033 0.146 0.029 0.702
Level 4 35,328 0.209 0 0 0.026 0.183 0.874
Claim 1 Below 32,268 0.350 0.28 0.065 0.005 0 0.986 0.983
Near 38,106 0.413 0.049 0.168 0.15 0.046
Above 21,845 0.237 0 0.005 0.051 0.181 0.98
Claim 2 Below 17,858 0.395 0.285 0.088 0.019 0.003 0.946 0.944
Near 17,997 0.398 0.051 0.145 0.138 0.064
Above 9,402 0.208 0.001 0.011 0.052 0.144 0.943
Claim 3 Below 12,744 0.138 0.12 0.016 0.002 0 0.981 0.942
Near 60,506 0.656 0.207 0.204 0.153 0.092
Above 18,970 0.206 0.002 0.017 0.051 0.135 0.903
Claim 4 Below 15,459 0.342 0.274 0.061 0.006 0 0.98 0.973
Near 20,264 0.448 0.063 0.177 0.158 0.051
Above 9,534 0.211 0 0.007 0.045 0.159 0.967
Total: All Students 168,997 1.000
Table 2.15: GRADE 4 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 66,183 0.357 0.327 0.03 0 0 0.916 0.81
Level 2 35,707 0.193 0.032 0.129 0.031 0 0.671
Level 3 40,381 0.218 0 0.034 0.151 0.033 0.693
Level 4 43,179 0.233 0 0 0.03 0.203 0.872
Claim 1 Below 30,783 0.332 0.287 0.04 0.004 0 0.986 0.983
Near 39,955 0.431 0.068 0.146 0.158 0.059
Above 22,008 0.237 0 0.005 0.046 0.186 0.98
Claim 2 Below 19,172 0.419 0.315 0.073 0.026 0.004 0.929 0.937
Near 18,241 0.398 0.059 0.113 0.142 0.084
Above 8,386 0.183 0.001 0.009 0.041 0.132 0.945
Claim 3 Below 18,309 0.197 0.175 0.018 0.004 0 0.978 0.951
Near 53,895 0.581 0.177 0.159 0.154 0.091
Above 20,542 0.221 0.003 0.013 0.051 0.154 0.925
Claim 4 Below 14,979 0.327 0.27 0.047 0.01 0.001 0.967 0.967
Near 21,532 0.470 0.105 0.143 0.156 0.067
Above 9,288 0.203 0.001 0.006 0.044 0.152 0.967
Total: All Students 185,450 1.000
Table 2.16: GRADE 5 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 64,133 0.343 0.315 0.028 0 0 0.918 0.822
Level 2 37,523 0.201 0.031 0.141 0.029 0 0.702
Level 3 49,566 0.265 0 0.032 0.203 0.03 0.765
Level 4 35,896 0.192 0 0 0.028 0.164 0.852
Claim 1 Below 30,383 0.325 0.276 0.045 0.005 0 0.986 0.983
Near 38,834 0.416 0.057 0.146 0.176 0.037
Above 24,163 0.259 0 0.005 0.077 0.176 0.979
Claim 2 Below 17,937 0.390 0.291 0.074 0.024 0.002 0.934 0.941
Near 18,641 0.405 0.06 0.12 0.17 0.055
Above 9,411 0.205 0.001 0.01 0.068 0.126 0.947
Claim 3 Below 18,239 0.195 0.172 0.019 0.004 0 0.98 0.951
Near 54,006 0.578 0.158 0.162 0.183 0.075
Above 21,136 0.226 0.003 0.015 0.071 0.138 0.922
Claim 4 Below 7,553 0.164 0.147 0.015 0.002 0 0.985 0.981
Near 28,826 0.627 0.204 0.184 0.196 0.043
Above 9,610 0.209 0 0.005 0.065 0.139 0.976
Total: All Students 187,118 1.000
Table 2.17: GRADE 6 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 63,206 0.337 0.306 0.031 0 0 0.907 0.819
Level 2 48,279 0.257 0.034 0.191 0.033 0 0.741
Level 3 51,663 0.275 0 0.034 0.214 0.027 0.778
Level 4 24,424 0.130 0 0 0.022 0.109 0.834
Claim 1 Below 30,778 0.330 0.26 0.065 0.005 0 0.985 0.981
Near 42,702 0.458 0.052 0.183 0.193 0.03
Above 19,775 0.212 0 0.005 0.077 0.13 0.977
Claim 2 Below 20,163 0.441 0.303 0.11 0.027 0.001 0.937 0.936
Near 17,106 0.374 0.038 0.137 0.164 0.035
Above 8,433 0.185 0.001 0.011 0.071 0.102 0.934
Claim 3 Below 21,625 0.232 0.193 0.034 0.005 0 0.979 0.948
Near 54,631 0.586 0.118 0.205 0.2 0.063
Above 17,001 0.182 0.002 0.013 0.07 0.097 0.918
Claim 4 Below 13,809 0.302 0.245 0.051 0.005 0 0.982 0.957
Near 21,613 0.473 0.096 0.193 0.16 0.023
Above 10,280 0.225 0.001 0.014 0.096 0.114 0.932
Total: All Students 187,572 1.000
Table 2.18: GRADE 7 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 61,119 0.327 0.295 0.032 0 0 0.902 0.812
Level 2 48,443 0.259 0.036 0.187 0.036 0 0.721
Level 3 55,702 0.298 0 0.037 0.234 0.026 0.786
Level 4 21,878 0.117 0 0 0.02 0.097 0.833
Claim 1 Below 27,481 0.294 0.239 0.051 0.004 0 0.986 0.981
Near 44,081 0.472 0.059 0.187 0.204 0.022
Above 21,911 0.234 0 0.006 0.099 0.129 0.976
Claim 2 Below 14,325 0.312 0.235 0.066 0.011 0 0.964 0.949
Near 20,950 0.457 0.068 0.176 0.187 0.025
Above 10,606 0.231 0.001 0.014 0.107 0.11 0.934
Claim 3 Below 22,664 0.242 0.195 0.04 0.007 0 0.969 0.96
Near 55,729 0.596 0.103 0.196 0.24 0.057
Above 15,090 0.161 0.001 0.007 0.06 0.094 0.952
Claim 4 Below 11,546 0.252 0.198 0.047 0.007 0 0.973 0.967
Near 24,670 0.538 0.106 0.201 0.207 0.023
Above 9,665 0.211 0.001 0.008 0.091 0.111 0.96
Total: All Students 187,142 1.000
Table 2.19: GRADE 8 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 27,932 0.291 0.259 0.032 0 0 0.889 0.8
Level 2 25,059 0.261 0.037 0.187 0.038 0 0.716
Level 3 30,068 0.314 0 0.039 0.244 0.03 0.777
Level 4 12,823 0.134 0 0 0.024 0.11 0.822
Claim 1 Below 30,118 0.315 0.249 0.063 0.004 0 0.988 0.982
Near 43,899 0.459 0.046 0.191 0.205 0.018
Above 21,558 0.226 0 0.005 0.098 0.122 0.976
Claim 2 Below 18,153 0.390 0.262 0.105 0.023 0 0.94 0.941
Near 21,140 0.454 0.048 0.162 0.209 0.035
Above 7,287 0.156 0.001 0.009 0.068 0.08 0.941
Claim 3 Below 22,065 0.231 0.181 0.043 0.007 0 0.971 0.953
Near 54,723 0.573 0.112 0.205 0.216 0.041
Above 18,792 0.197 0.001 0.012 0.084 0.1 0.935
Claim 4 Below 10,724 0.230 0.187 0.04 0.003 0 0.985 0.97
Near 26,738 0.574 0.123 0.227 0.201 0.022
Above 9,118 0.196 0 0.008 0.094 0.092 0.954
Total: All Students 95,882 1.000
Table 2.20: HIGH SCHOOL ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 1,400 0.148 0.129 0.019 0 0 0.873 0.792
Level 2 1,950 0.206 0.024 0.15 0.032 0 0.725
Level 3 3,407 0.361 0 0.045 0.271 0.045 0.75
Level 4 2,688 0.285 0 0 0.043 0.242 0.85
Claim 1 Below 1,691 0.179 0.119 0.053 0.007 0 0.961 0.972
Near 4,510 0.478 0.034 0.155 0.237 0.051
Above 3,241 0.343 0 0.006 0.101 0.236 0.984
Claim 2 Below 1,528 0.162 0.112 0.042 0.008 0 0.95 0.961
Near 4,759 0.504 0.042 0.163 0.23 0.069
Above 3,155 0.334 0 0.009 0.107 0.218 0.973
Claim 3 Below 1,243 0.132 0.09 0.034 0.008 0 0.939 0.955
Near 6,017 0.637 0.064 0.173 0.276 0.124
Above 2,182 0.231 0 0.006 0.061 0.163 0.972
Claim 4 Below 1,373 0.145 0.107 0.035 0.004 0 0.975 0.976
Near 4,861 0.515 0.047 0.17 0.236 0.061
Above 3,208 0.340 0 0.008 0.106 0.226 0.977
Total: All Students 9,445 1.000

2.4.2 Mathematics

Results in this section are based on real data from students who took the full blueprint. Table 2.21 through Table 2.27 show the classification accuracy of the mathematics assessment for each grade 3-8 and high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy for each category was high to moderately high for all mathematics grades.

Table 2.21: GRADE 3 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 56,619 0.335 0.31 0.025 0 0 0.925 0.852
Level 2 38,293 0.227 0.027 0.173 0.027 0 0.761
Level 3 42,724 0.253 0 0.028 0.203 0.021 0.804
Level 4 31,278 0.185 0 0 0.019 0.166 0.896
Claim 1 Below 35,432 0.385 0.284 0.08 0.018 0.002 0.947 0.934
Near 26,767 0.291 0.037 0.12 0.12 0.014
Above 29,935 0.325 0.006 0.019 0.113 0.187 0.922
Claim 2/4 Below 16,177 0.281 0.226 0.051 0.005 0 0.984 0.981
Near 25,452 0.442 0.058 0.169 0.183 0.033
Above 15,915 0.277 0 0.006 0.079 0.191 0.978
Claim 3 Below 21,724 0.236 0.192 0.039 0.004 0 0.981 0.96
Near 46,324 0.503 0.132 0.166 0.168 0.036
Above 24,086 0.261 0.003 0.013 0.078 0.167 0.938
Total: All Students 168,914 1.000
Table 2.22: GRADE 4 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 55,133 0.298 0.273 0.025 0 0 0.916 0.858
Level 2 53,688 0.290 0.026 0.238 0.026 0 0.821
Level 3 44,717 0.241 0 0.027 0.194 0.02 0.805
Level 4 31,672 0.171 0 0 0.018 0.153 0.896
Claim 1 Below 39,109 0.422 0.275 0.123 0.021 0.004 0.94 0.932
Near 24,514 0.265 0.017 0.126 0.108 0.014
Above 28,985 0.313 0.005 0.019 0.11 0.179 0.923
Claim 2/4 Below 18,518 0.322 0.223 0.092 0.006 0 0.98 0.983
Near 24,811 0.431 0.027 0.181 0.185 0.038
Above 14,197 0.247 0 0.003 0.064 0.179 0.986
Claim 3 Below 19,742 0.213 0.152 0.058 0.003 0 0.983 0.963
Near 50,042 0.540 0.143 0.198 0.164 0.036
Above 22,824 0.246 0.002 0.012 0.073 0.16 0.943
Total: All Students 185,210 1.000
Table 2.23: GRADE 5 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 73,654 0.394 0.363 0.031 0 0 0.922 0.855
Level 2 48,213 0.258 0.028 0.205 0.025 0 0.794
Level 3 31,765 0.170 0 0.024 0.127 0.019 0.748
Level 4 33,238 0.178 0 0 0.018 0.16 0.897
Claim 1 Below 44,957 0.482 0.339 0.116 0.021 0.007 0.942 0.936
Near 24,404 0.262 0.023 0.124 0.09 0.025
Above 23,880 0.256 0.003 0.015 0.065 0.173 0.929
Claim 2/4 Below 20,870 0.361 0.283 0.072 0.005 0 0.985 0.982
Near 25,169 0.435 0.058 0.184 0.139 0.054
Above 11,848 0.205 0 0.004 0.037 0.164 0.979
Claim 3 Below 30,150 0.323 0.255 0.061 0.006 0.001 0.978 0.965
Near 46,325 0.497 0.109 0.185 0.136 0.066
Above 16,766 0.180 0.001 0.008 0.034 0.137 0.952
Total: All Students 186,870 1.000
Table 2.24: GRADE 6 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 73,086 0.391 0.361 0.029 0 0 0.925 0.848
Level 2 52,637 0.281 0.031 0.222 0.028 0 0.79
Level 3 33,688 0.180 0 0.027 0.133 0.02 0.74
Level 4 27,640 0.148 0 0 0.016 0.131 0.889
Claim 1 Below 45,320 0.488 0.344 0.121 0.019 0.004 0.953 0.942
Near 27,451 0.295 0.022 0.138 0.107 0.028
Above 20,156 0.217 0.002 0.013 0.056 0.146 0.932
Claim 2/4 Below 21,390 0.369 0.292 0.072 0.005 0 0.986 0.981
Near 25,697 0.443 0.05 0.196 0.148 0.049
Above 10,925 0.188 0 0.004 0.037 0.146 0.977
Claim 3 Below 25,002 0.269 0.213 0.051 0.005 0.001 0.979 0.966
Near 52,905 0.569 0.155 0.214 0.142 0.058
Above 15,020 0.162 0 0.007 0.035 0.119 0.954
Total: All Students 187,051 1.000
Table 2.25: GRADE 7 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 72,826 0.391 0.357 0.033 0 0 0.915 0.851
Level 2 48,942 0.263 0.031 0.205 0.026 0 0.783
Level 3 36,668 0.197 0 0.024 0.154 0.019 0.783
Level 4 27,967 0.150 0 0 0.016 0.134 0.896
Claim 1 Below 47,449 0.511 0.33 0.143 0.032 0.006 0.925 0.937
Near 23,518 0.253 0.013 0.117 0.105 0.019
Above 21,979 0.236 0.001 0.011 0.069 0.155 0.948
Claim 2/4 Below 19,918 0.343 0.279 0.06 0.004 0 0.987 0.983
Near 25,850 0.445 0.073 0.191 0.149 0.032
Above 12,282 0.212 0 0.004 0.049 0.159 0.979
Claim 3 Below 21,529 0.232 0.185 0.042 0.004 0 0.98 0.97
Near 55,239 0.594 0.158 0.222 0.16 0.054
Above 16,178 0.174 0 0.007 0.042 0.125 0.96
Total: All Students 186,403 1.000
Table 2.26: GRADE 8 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 37,728 0.397 0.356 0.04 0 0 0.898 0.819
Level 2 24,304 0.256 0.036 0.186 0.033 0 0.727
Level 3 17,332 0.182 0 0.029 0.13 0.023 0.714
Level 4 15,730 0.165 0 0 0.018 0.147 0.889
Claim 1 Below 50,617 0.533 0.368 0.125 0.032 0.008 0.925 0.941
Near 25,586 0.270 0.023 0.122 0.099 0.025
Above 18,732 0.197 0.001 0.008 0.051 0.138 0.957
Claim 2/4 Below 20,250 0.341 0.287 0.05 0.004 0 0.989 0.98
Near 26,737 0.450 0.111 0.184 0.124 0.032
Above 12,422 0.209 0 0.006 0.046 0.157 0.97
Claim 3 Below 24,636 0.260 0.208 0.044 0.006 0.001 0.972 0.959
Near 55,821 0.588 0.183 0.203 0.143 0.058
Above 14,478 0.153 0.001 0.008 0.033 0.111 0.945
Total: All Students 95,094 1.000
Table 2.27: HIGH SCHOOL MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 3,307 0.350 0.316 0.033 0 0 0.905 0.831
Level 2 2,660 0.281 0.037 0.209 0.035 0 0.745
Level 3 2,286 0.242 0 0.029 0.193 0.019 0.8
Level 4 1,206 0.127 0 0 0.016 0.112 0.876
Claim 1 Below 4,391 0.464 0.341 0.117 0.006 0 0.986 0.987
Near 2,976 0.315 0.012 0.152 0.145 0.006
Above 2,089 0.221 0 0.003 0.093 0.125 0.988
Claim 2/4 Below 2,591 0.274 0.232 0.04 0.002 0 0.992 0.963
Near 4,625 0.489 0.12 0.217 0.141 0.011
Above 2,240 0.237 0.001 0.015 0.101 0.12 0.934
Claim 3 Below 2,988 0.316 0.249 0.061 0.005 0 0.984 0.977
Near 4,854 0.513 0.104 0.205 0.179 0.026
Above 1,614 0.171 0 0.005 0.06 0.105 0.97
Total: All Students 9,459 1.000

2.5 Standard Errors of Measurement (SEMs)

The standard error of measurement (SEM) information in this section is based on student scores and associated SEMs included in the data Smarter Balanced received from members after the 2022-23 administration. Student scores and SEMs are not computed directly by Smarter Balanced; they are computed by the service providers who deliver the test, following the scoring specifications provided by Smarter Balanced (Smarter Balanced, 2023b). These specifications include Equation (2.7) in this chapter for computing SEMs. Because of the adaptive nature of the test, different students receive different items, so the amount of measurement error varies from student to student, even among students with the same estimate of achievement.

All of the SEM statistics reported in this chapter are based on the full blueprint and are expressed in the reporting scale metric. For member data that included SEMs in the theta metric only, the SEMs were transformed to the reporting metric using the multiplication factors in the theta-to-scale-score transformation given in Chapter 5. Note that the ELA/literacy and mathematics scales are not in the same metric. In addition, because schools responded differently to post-pandemic test administration in 2022-23, the data used for the SEM analyses should not be considered representative of the whole population.
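The theta-to-reporting-scale conversion mentioned above amounts to multiplying the theta-metric SEM by the slope of the linear theta-to-scale-score transformation (the additive intercept shifts the scale but does not stretch it, so it does not affect the SEM). A minimal sketch follows; the slope value used here is hypothetical, and the actual multiplication factors are those given in Chapter 5.

```python
def sem_to_scale(sem_theta, slope):
    """Rescale a theta-metric SEM to the reporting metric.

    If scale = slope * theta + intercept, then the SEM transforms
    by the slope alone: SEM_scale = slope * SEM_theta.
    """
    return slope * sem_theta

# Hypothetical example: slope of 85.8 (illustrative value only).
sem_scale = sem_to_scale(0.30, 85.8)  # about 25.7 scale score units
```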

Table 2.28 and Table 2.29 show the trend in the SEM by student decile for ELA/literacy and mathematics, respectively. Deciles were defined by ranking students from highest to lowest scale score and dividing the students into 10 equal-sized groups according to rank. Decile 1 contains the 10% of students with the lowest scale scores. Decile 10 contains the 10% of students with the highest scale scores. The SEM reported for a decile is the average SEM among examinees at that decile.
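The decile computation just described can be sketched as follows. The scores and SEMs below are synthetic (randomly generated for illustration), not real results; the splitting logic is the point.

```python
import random
import statistics

# Generate synthetic (scale score, SEM) pairs. Lower-scoring students
# are given larger SEMs, mimicking the pattern discussed in this section.
random.seed(0)
students = []
for _ in range(10_000):
    score = random.gauss(2500, 90)
    sem = 25 + (5 if score < 2450 else 0) + random.gauss(0, 2)
    students.append((score, sem))

# Rank students from lowest to highest scale score and split into 10
# equal-sized groups; decile 1 holds the lowest-scoring 10% of students.
students.sort(key=lambda s: s[0])
size = len(students) // 10
decile_means = [
    statistics.mean(sem for _, sem in students[i * size:(i + 1) * size])
    for i in range(10)
]
```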

Table 2.28: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, ELA/LITERACY
Grade Mean d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
3 23.0 28.9 22.9 22.2 21.9 21.9 21.8 21.8 22.0 22.4 24.2
4 24.8 30.1 25.0 23.8 23.7 23.7 23.5 23.4 23.4 23.7 26.8
5 24.3 28.9 23.3 22.3 22.4 22.7 22.9 23.3 24.4 25.2 27.4
6 25.6 31.5 25.9 24.0 23.7 24.0 24.3 24.5 24.4 25.2 27.9
7 27.8 36.2 28.8 26.8 25.9 25.6 25.8 26.3 26.7 27.1 28.4
8 30.4 40.0 30.6 28.9 28.5 28.6 28.6 28.4 28.6 29.3 31.5
HS 32.9 40.1 32.5 31.3 31.0 30.9 31.0 31.2 31.8 32.5 34.5


Table 2.29: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, MATHEMATICS
Grade Mean d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
3 18.0 26.0 19.6 18.0 17.0 16.6 16.3 16.1 16.0 16.1 17.8
4 18.1 27.8 20.5 18.4 17.0 16.5 16.3 16.0 15.5 15.4 16.6
5 21.4 34.8 27.4 24.5 22.1 20.1 18.4 17.2 16.1 16.1 17.2
6 23.9 40.9 29.4 25.6 23.8 21.9 20.8 20.2 19.3 18.2 18.6
7 25.5 44.7 32.9 28.6 26.3 24.1 22.3 20.9 18.8 18.0 18.1
8 30.3 46.4 37.6 34.3 32.0 29.7 27.8 26.0 24.2 22.5 22.6
HS 29.0 45.6 36.1 32.2 29.7 28.1 26.5 24.8 22.9 21.7 21.6

Table 2.30 and Table 2.31 show the average SEM near the achievement level cut scores. In these tables, N is the number of students near the cut, M is the mean, and SD is the standard deviation.

The average SEM reported for a given cut score is the average SEM among students within 10 scale score units of the cut score. In the column headings, “Cut1v2” is the lowest cut score, which defines the lower boundary of level 2; “Cut2v3” is the cut score defining the lower boundary of level 3; and “Cut3v4” is the cut score defining the lower boundary of level 4.

Table 2.30: CONDITIONAL SEM NEAR (±10 POINTS) ACHIEVEMENT LEVEL CUT SCORES, ELA/LITERACY
Grade Cut1v2_N Cut1v2_M Cut1v2_SD Cut2v3_N Cut2v3_M Cut2v3_SD Cut3v4_N Cut3v4_M Cut3v4_SD
3 12617 21.9 2.62 13253 21.8 2.62 11211 22.1 2.97
4 12942 23.7 3.31 13951 23.5 3.45 13359 23.4 3.33
5 13003 22.4 3.13 12944 22.9 3.31 12101 24.8 3.03
6 13507 23.6 2.96 13587 24.5 3.54 9738 25.5 3.83
7 12772 26.0 2.92 14175 26.0 3.07 8115 27.2 2.79
8 6062 28.6 3.23 6873 28.6 2.52 4915 29.5 2.27
HS 308 32.3 1.93 625 31.0 1.59 714 31.6 1.92


Table 2.31: CONDITIONAL SEM NEAR (±10 POINTS) OF ACHIEVEMENT LEVEL CUT SCORES, MATHEMATICS
Grade Cut1v2_N Cut1v2_M Cut1v2_SD Cut2v3_N Cut2v3_M Cut2v3_SD Cut3v4_N Cut3v4_M Cut3v4_SD
3 13725 17.1 2.01 15012 16.2 1.56 11395 16.0 1.40
4 14131 17.5 2.16 16537 16.2 1.26 11951 15.3 1.72
5 13465 21.3 2.64 14162 17.1 2.46 11508 15.9 2.23
6 13042 23.0 2.64 13606 20.1 1.86 9831 18.1 2.60
7 12525 25.6 4.19 11753 20.8 2.97 9331 17.9 2.52
8 6194 30.8 5.07 6101 25.9 3.40 4501 22.6 2.94
HS 608 29.6 1.67 626 25.1 1.77 402 21.4 1.73
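The “near cut” statistics in the tables above can be sketched as follows. The scores, SEMs, and cut value below are made up for illustration; only the selection and summary logic reflects the computation described in this section.

```python
import statistics

# Hypothetical student scale scores, their SEMs, and one cut score.
scores = [2410.0, 2418.0, 2425.0, 2431.0, 2460.0]
sems = [24.0, 23.5, 23.0, 22.8, 22.0]
cut = 2422.0

# Keep the SEMs of students within 10 scale score units of the cut.
near = [s for sc, s in zip(scores, sems) if abs(sc - cut) <= 10]

n = len(near)              # N: number of students near the cut
m = statistics.mean(near)  # M: mean SEM near the cut
sd = statistics.stdev(near)  # SD: sample standard deviation of those SEMs
```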

Figure 2.2 to Figure 2.15 are scatter plots of a random sample of 2,000 individual student SEMs as a function of scale score for the total test and claims/subscores, by grade within subject. These plots show the variability of SEMs among students with the same scale score, as well as the trend in SEM with student achievement (scale score). In comparison to the total score, a claim score has greater measurement error and greater variability among students because it is based on a smaller number of items. Among claims, those represented by fewer items will have higher measurement error and greater variability of measurement error than those represented by more items.

Dashed vertical lines in Figure 2.2 to Figure 2.15 represent the achievement level cut scores. The plots for the high school standard errors show cut scores for grades 9, 10, and 11 separately.

Figure 2.2: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 3

Figure 2.3: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 4

Figure 2.4: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 5

Figure 2.5: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 6

Figure 2.6: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 7

Figure 2.7: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 8

Figure 2.8: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy High School

Figure 2.9: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 3

Figure 2.10: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 4

Figure 2.11: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 5

Figure 2.12: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 6

Figure 2.13: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 7

Figure 2.14: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 8

Figure 2.15: Students’ Standard Error of Measurement by Scale Score, Mathematics High School

All of the tables and figures in this section, for every grade and subject, show higher measurement error for lower-achieving students. This trend reflects the fact that the item pool is difficult relative to overall student achievement. The computer adaptive test (CAT) algorithm still delivers easier items to lower-achieving students than they would typically receive on a non-adaptive, fixed form whose difficulty is similar to that of the item pool as a whole. But low-achieving students still tend to receive items that are relatively difficult for them, typically because the CAT algorithm has no easier items available within the blueprint constraints that must be met for all students.

References

Smarter Balanced. (2023b). Smarter Balanced Scoring Specifications for Summative and Interim Assessments. Retrieved from https://technicalreports.smarterbalanced.org/scoring_specs/_book/scoringspecs.html.