Chapter 2 Reliability, Precision, and Errors of Measurement

2.1 Introduction

This chapter addresses the technical quality of the operational test with regard to precision and reliability. Part of the test validity argument is that scores must be consistent and precise enough to be useful for their intended purposes. If scores are to be meaningful, tests should deliver the same results under repeated administrations to the same student or for students of the same ability. In addition, the range of uncertainty around the score should be small enough to support educational decisions. The reliability and precision of a test are examined through analysis of measurement error and other test properties in simulated and operational conditions. For example, the reliability of a test may be assessed in part by verifying that different test forms follow the same blueprint.

In computer adaptive testing (CAT), the same set of items cannot be expected to be administered to the same examinee more than once. Consequently, reliability is inferred from internal test properties, including test length and the information provided by item parameters. Measurement precision is enhanced when a student receives items that are well matched, in difficulty, to the student's overall performance level, and when the items a student receives work well together to measure the same general body of knowledge, skills, and abilities defined by the test blueprint.

Smarter Balanced uses an adaptive model because adaptive tests are customized to each student in terms of item difficulty. Smarter Balanced also uses item quality control procedures to ensure that test items measure the knowledge, skills, and abilities specified in the test blueprint and work well together in this respect. The expected outcome of these and other test administration and item quality control procedures is high reliability and low measurement error.

For the 2022-23 administration, all statistics in this chapter are based on the full blueprint. Measurement bias results from simulations conducted by Cambium Assessment are provided, along with reliability, classification accuracy, and standard errors of measurement based on student data provided by Michigan, Montana, Nevada, and South Dakota. Statistics for the paper/pencil forms are based on the items on the forms, not on the students who took the assessment in 2022-23.

2.2 Measurement Bias

Measurement bias is any systematic or non-random error that occurs in estimating a student’s achievement from the student’s scores on test items. Prior to the release of the 2022-23 item pool, simulation studies were carried out to ensure that the item pool, combined with the adaptive test administration algorithm, would produce satisfactory tests with regard to measurement bias and random measurement error as a function of student achievement, overall reliability, fulfillment of test blueprints, and item exposure.

Results for measurement bias with the full blueprint are provided in this section. Measurement bias is the one index of test performance that is clearly and preferentially assessed through simulation as opposed to the use of real data. With real data, true student achievement is unknown. In simulation, true student achievement can be assumed and used to generate item responses. The simulated item responses are used in turn to estimate achievement. Achievement estimates are then compared to the underlying assumed, true values of student achievement to assess whether the estimates contain systematic error (bias).

Simulations for the 2022-23 administration were carried out by Cambium Assessment. The simulations were performed for each grade within a subject area for the standard item pool (English) and for accommodation item pools of braille and Spanish for mathematics and braille for ELA/literacy. For the standard item pools, the number of simulees was 3,000 for grades 3-8 and 5,000 for grade 11. For the braille and Spanish pools, the number of simulees was 1,000 for grades 3-8 and 2,000 for grade 11. True student achievement values were sampled from a normal distribution for each grade and subject. The parameters for the normal distribution were based on students’ operational scores on the 2018–2019 Smarter Balanced summative tests.

Test events were created for the simulated examinees using the 2022-23 item pool. Estimated ability ( \(\hat{\theta}\) ) was calculated from the simulated tests using maximum likelihood estimation (MLE) as described in the Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2023b).
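The MLE scoring step can be illustrated with a small sketch. The grid search below estimates \(\hat{\theta}\) for dichotomous (2PL) items only; the operational algorithm in the scoring specifications handles all item types, and the item parameters here are hypothetical.

```python
import numpy as np

D = 1.7  # logistic scaling constant used throughout this chapter

def mle_theta(responses, a, b):
    """Grid-search maximum likelihood estimate of theta for 2PL item
    responses; a sketch of MLE scoring, not the operational algorithm."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    r = np.asarray(responses, float)
    grid = np.linspace(-4.0, 4.0, 801)          # candidate theta values
    z = D * a * (grid[:, None] - b)             # grid points x items
    p = 1.0 / (1.0 + np.exp(-z))                # 2PL response probabilities
    loglik = (r * np.log(p) + (1 - r) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

# Hypothetical 3-item test: correct, correct, incorrect.
theta_hat = mle_theta([1, 1, 0], [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0])
```

With all responses correct, the likelihood increases without bound in theta, so the estimate hits the edge of the grid; operational scoring handles such cases with truncation rules.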

Bias was computed as:

\[\begin{equation} bias = N^{-1}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}) \tag{2.1} \end{equation}\]

and the error variance of the estimated bias is:

\[\begin{equation} ErrorVar(bias) = \frac{1}{N(N-1)}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}-mean(\theta_{i}-\hat{\theta}_{i}))^{2} \tag{2.2} \end{equation}\]

where \(\theta_{i} - \hat{\theta}_{i}\) is the deviation score and \(N\) denotes the number of simulees in the condition. Statistical significance of the bias is tested using a z-test: \[\begin{equation} z = \frac{bias}{\sqrt{ErrorVar(bias)}} \tag{2.3} \end{equation}\]
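Equations (2.1) through (2.3) can be sketched with simulated data. The sample size, random seed, and constant SEM of 0.35 below are illustrative assumptions, not values from the operational simulations; the last two lines compute the kind of 95%/99% confidence-interval miss rates reported in Tables 2.1 and 2.2.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(7)

# Hypothetical simulation: true abilities and noisy ability estimates.
N = 3000
theta = rng.normal(0.0, 1.0, N)                # true achievement, theta_i
theta_hat = theta + rng.normal(0.0, 0.35, N)   # estimated achievement

dev = theta - theta_hat
bias = dev.mean()                                           # Equation (2.1)
err_var = np.sum((dev - dev.mean()) ** 2) / (N * (N - 1))   # Equation (2.2)
z = bias / sqrt(err_var)                                    # Equation (2.3)
p_value = erfc(abs(z) / sqrt(2.0))             # two-sided P(|Z| > |z|)

# CI miss rates: share of estimates outside 95%/99% intervals
# centered on the true theta (half-widths use the assumed SEM).
miss95 = np.mean(np.abs(dev) > 1.96 * 0.35)
miss99 = np.mean(np.abs(dev) > 2.576 * 0.35)
```

Because the estimates here are unbiased by construction, the computed bias should be near zero and the miss rates near their nominal 5% and 1% values.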

Table 2.1 and Table 2.2 show, for ELA/literacy and mathematics, respectively, the bias in estimates of student achievement based on the complete test assembled from the standard item pool and the accommodations pools included in the simulations. The standard error of bias is the denominator of the z-score in Equation (2.3). The p-value is the probability \(|Z| > |z|\), where \(Z\) is a standard normal variate and \(|z|\) is the absolute value of the \(z\) computed in Equation (2.3). Under the hypothesis of no bias, approximately 5% and 1% of the \(\hat{\theta}_{i}\) will fall outside, respectively, 95% and 99% confidence intervals centered on \(\theta_{i}\).

Mean bias was generally very small in practical terms, exceeding .02 in absolute value in no cases for ELA/literacy and in only six cases for mathematics. Mean bias tended to be statistically significantly different from 0, but this was due to the large sample sizes used for the simulation. In virtually all cases, the percentage of simulated examinees whose estimated achievement score fell outside the confidence intervals centered on their true score was close to expected values of 5% for the 95% confidence interval and 1% for the 99% confidence interval. Plots of bias by estimated theta in the full simulation report show that positive and statistically significant mean bias was due to thetas being underestimated in regions of student achievement far below the lowest cut score (separating achievement levels 1 and 2). The same plots show that estimation bias is negligible near all cut scores in all cases.

Table 2.1: BIAS OF THE ESTIMATED PROFICIENCIES: ENGLISH LANGUAGE ARTS/LITERACY
Pool Grade Mean Bias SE (Bias) P value MSE 95% CI Miss Rate 99% CI Miss Rate
Standard 3 0.00 0.01 0.51 0.11 4.90% 1.03%
4 0.00 0.01 0.68 0.13 4.97% 0.87%
5 -0.01 0.01 0.43 0.14 5.63% 0.70%
6 0.00 0.01 0.73 0.13 5.20% 1.17%
7 -0.01 0.01 0.41 0.14 4.73% 0.83%
8 -0.01 0.01 0.23 0.17 5.34% 0.77%
HS 0.00 0.01 0.58 0.19 5.26% 1.08%
Braille 3 0.02 0.01 0.11 0.12 5.40% 0.90%
4 -0.01 0.01 0.35 0.12 5.21% 0.80%
5 0.01 0.01 0.48 0.12 4.50% 0.90%
6 -0.02 0.01 0.07 0.14 5.80% 1.00%
7 0.00 0.01 0.82 0.15 4.50% 1.10%
8 -0.02 0.01 0.21 0.18 5.60% 1.10%
HS 0.00 0.01 0.99 0.19 4.40% 0.70%


Table 2.2: BIAS OF THE ESTIMATED PROFICIENCIES: MATHEMATICS
Pool Grade Mean Bias SE (Bias) P value MSE 95% CI Miss Rate 99% CI Miss Rate
Standard 3 0.00 0.00 0.41 0.07 4.63% 1.07%
4 0.00 0.00 0.6 0.07 4.97% 0.93%
5 0.01 0.01 0.01 0.09 4.33% 0.67%
6 0.00 0.01 0.53 0.12 4.27% 0.93%
7 0.02 0.01 0.01 0.13 4.81% 0.83%
8 0.02 0.01 0.01 0.18 4.90% 0.90%
HS 0.04 0.01 < 0.005 0.20 4.62% 0.86%
Braille 3 0.00 0.01 0.85 0.08 4.50% 0.90%
4 0.00 0.01 0.72 0.09 4.71% 1.20%
5 0.02 0.01 0.15 0.12 4.90% 1.00%
6 0.00 0.01 0.77 0.17 4.81% 0.70%
7 0.01 0.01 0.27 0.15 4.21% 0.80%
8 0.01 0.01 0.63 0.19 5.01% 1.30%
HS 0.03 0.01 0.01 0.21 4.75% 0.85%
Spanish 3 -0.01 0.01 0.54 0.07 5.20% 0.80%
4 0.00 0.01 0.73 0.08 4.70% 0.70%
5 0.03 0.01 0.01 0.12 3.81% 0.60%
6 0.02 0.01 0.1 0.14 4.41% 0.70%
7 0.01 0.01 0.3 0.14 5.61% 1.00%
8 0.02 0.01 0.22 0.19 4.80% 1.00%
HS 0.03 0.01 < 0.005 0.20 5.00% 1.00%

2.3 Reliability

Reliability estimates reported in this section are derived from internal, IRT-based estimates of the measurement error in the test scores of examinees (MSE) and the observed variance of examinees’ test scores on the \(\theta\)-scale \((var(\hat{\theta}))\). The formula for the reliability estimate (\(\rho\)) is:

\[\begin{equation} \hat{\rho} = 1 - \frac{MSE}{var(\hat{\theta})}. \tag{2.4} \end{equation}\]

According to the Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2023b), estimates of measurement error are obtained from the parameter estimates of the items taken by the examinees. This is done by computing the test information for each examinee \(j\), summing over the items \(i\) that the examinee answered:

\[\begin{equation} I({\theta}_{j}) = \sum_{i=1}^{n_j} 1.7^2 a_i^2 \left\{\sum_{v=0}^{m_i-1} v^2 p_{iv}(z_{ij} = v) - \left[\sum_{v=0}^{m_i-1} v p_{iv}(z_{ij} = v)\right]^2 \right\}, \tag{2.5} \end{equation}\]

where \(n_j\) is the number of items answered by examinee \(j\) and \(p_{iv}(z_{ij} = v)\) is the probability of responding in scoring category \(v\), where \(v=0,1,...,m_i-1\). When an item is based on the 2PL model, \(m_i=2\) and Equation (2.5) simplifies to

\[\begin{equation} I({\theta}_{j}) = \sum_{i=1}^{n_j} 1.7^2 a_i^2 \left[p_{i}(z_{ij} = 1) - p_{i}(z_{ij} = 1)^2 \right]. \tag{2.6} \end{equation}\]

The test information is computed using only the items answered by the examinee. The measurement error (SEM) for examinee \(i\) is then computed as:

\[\begin{equation} SEM(\hat{\theta_i}) = \frac{1}{\sqrt{I(\hat{\theta_i})}}. \tag{2.7} \end{equation}\]
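A minimal sketch of Equations (2.5) through (2.7) follows, assuming generalized-partial-credit-style category probabilities for polytomous items (which reduce to the 2PL form of Equation (2.6) when \(m_i=2\)). The item parameters are hypothetical, and the 2.5 truncation mirrors the rule described in the text.

```python
import numpy as np

D = 1.7  # scaling constant in Equations (2.5)-(2.7)

def gpc_probs(theta, a, b_steps):
    """Category probabilities P(v), v = 0..m-1, for an item with
    discrimination a and step difficulties b_steps (GPC-style model)."""
    logits = np.concatenate(
        ([0.0], np.cumsum(D * a * (theta - np.asarray(b_steps, float))))
    )
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def item_info(theta, a, b_steps):
    """Equation (2.5) for one item: D^2 a^2 times the variance of the
    score category v under the category probabilities."""
    p = gpc_probs(theta, a, b_steps)
    v = np.arange(len(p))
    return D**2 * a**2 * (np.sum(v**2 * p) - np.sum(v * p) ** 2)

def sem(theta, items):
    """Equation (2.7): SEM = 1/sqrt(total information), truncated at 2.5."""
    info = sum(item_info(theta, a, b) for a, b in items)
    return min(1.0 / np.sqrt(info), 2.5)

# Hypothetical 3-item test: two dichotomous items and one
# three-category item; parameters are illustrative, not operational.
items = [(0.8, [-0.5]), (1.1, [0.3]), (0.9, [-0.2, 0.6])]
print(round(sem(0.0, items), 3))
```

For a dichotomous item the variance term collapses to \(p(1-p)\), matching Equation (2.6).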

The upper bound of \(SEM(\hat{\theta_i})\) is set to 2.5. Any value larger than 2.5 is truncated at 2.5. The mean squared error for a group of \(N\) examinees is then:

\[\begin{equation} MSE = N^{-1}\sum_{i=1}^N SEM(\hat{\theta_i})^2 \tag{2.8} \end{equation}\]

and the variance of the achievement scores is: \[\begin{equation} var(\hat{\theta}) = N^{-1}\sum_{i=1}^N (\hat{\theta_i} - \overline{\hat{\theta}})^2 \tag{2.9} \end{equation}\]

where \(\overline{\hat{\theta}}\) is the average of the \(\hat{\theta_i}\).

The measurement error for a group of examinees is typically reported as the square root of \(MSE\) and is denoted \(RMSE\). Measurement error is computed with Equation (2.7) and Equation (2.8) on a scale where achievement has a standard deviation close to 1 among students at a given grade. Measurement error reported in the tables of this section is transformed to the reporting scale by multiplying the theta-scale measurement error by \(a\), where \(a\) is the slope used to convert estimates of student achievement on the \(\theta\)-scale to the reporting scale. The transformation equations for converting estimates of student achievement on the \(\theta\)-scale to the reporting scale are given in Chapter 5.
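Putting Equations (2.4), (2.8), and (2.9) together, the group-level reliability and reporting-scale RMSE can be sketched as follows. The slope value of 85.8 and the simulated scores are placeholders for illustration, not operational scaling constants.

```python
import numpy as np

def reliability_summary(theta_hat, sem, slope=85.8):
    """Group MSE (Eq. 2.8), score variance (Eq. 2.9), marginal
    reliability (Eq. 2.4), and RMSE on the reporting scale (theta-scale
    RMSE times the slope a; the default slope is illustrative only)."""
    theta_hat = np.asarray(theta_hat, float)
    sem = np.minimum(np.asarray(sem, float), 2.5)     # SEM truncated at 2.5
    mse = np.mean(sem ** 2)                           # Equation (2.8)
    var = np.mean((theta_hat - theta_hat.mean()) ** 2)  # Equation (2.9)
    rho = 1.0 - mse / var                             # Equation (2.4)
    return rho, slope * np.sqrt(mse)

# Hypothetical group of 5,000 examinees with SEMs between 0.25 and 0.45.
rng = np.random.default_rng(11)
theta_hat = rng.normal(0.0, 1.0, 5000)
sems = rng.uniform(0.25, 0.45, 5000)
rho, rmse_reporting = reliability_summary(theta_hat, sems)
```

With a theta-scale score variance near 1 and SEMs averaging about 0.35, the sketch yields a reliability in the high-.80s, comparable in form (not in value) to the entries in Tables 2.3 and 2.4.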

2.3.1 General Population

Reliability estimates in this section are based on real data and the full blueprint. In mathematics, claims 2 and 4 are reported together as a single subscore, so there are only three reporting categories for mathematics, but four claims. Table 2.3 and Table 2.4 show the reliability of the observed total scores and subscores for ELA/literacy and mathematics. Reliability estimates are high for the total score in both subjects. Reliability coefficients are high for the claim 1 score in mathematics, moderately high for the claim 1 and claim 2 scores in ELA/literacy, and moderately high to moderate for the remainder of the claim scores in both subjects. The lowest reliability coefficient in either subject is .554, which is the reliability of the claim 3 score in the grade 8 mathematics assessment.

Table 2.3: ELA/LITERACY SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade N Total score Claim 1 Claim 2 Claim 3 Claim 4
3 168,997 0.932 0.791 0.712 0.597 0.733
4 185,450 0.930 0.791 0.718 0.609 0.686
5 187,118 0.935 0.801 0.741 0.624 0.754
6 187,572 0.925 0.780 0.753 0.599 0.706
7 187,142 0.919 0.770 0.755 0.620 0.681
8 95,882 0.911 0.784 0.709 0.623 0.688
HS 9,445 0.904 0.753 0.737 0.588 0.693
Table 2.4: MATHEMATICS SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade N Total score Claim 1 Claim 2/4 Claim 3
3 168,914 0.956 0.925 0.705 0.753
4 185,210 0.956 0.924 0.754 0.746
5 186,870 0.944 0.905 0.660 0.726
6 187,051 0.943 0.906 0.717 0.642
7 186,403 0.937 0.897 0.654 0.693
8 95,094 0.922 0.893 0.617 0.624
HS 9,459 0.930 0.889 0.702 0.695

2.3.2 Demographic Groups

Reliability estimates in this section are based on real data and the full blueprint. During the 2021-22 administration year, most students and schools returned to normal testing. However, schools responded differently to post-pandemic test administration: many states maintained remote testing, and some states switched to an adjusted blueprint. Consequently, the demographic group results presented below should not be considered representative of the entire student population. Table 2.5 and Table 2.6 show the reliability of the test for students of different racial groups in ELA/literacy and mathematics who tested in 2022-23. Table 2.7 and Table 2.8 show the reliability of the test for students who tested in 2022-23, grouped by demographic characteristics typically associated with accommodations or accessibility tools. These groups include English learners (EL) and students covered by the Individuals with Disabilities Education Act (IDEA).

Because of the differences in average score across demographic groups and the relationship between measurement error and student achievement scores, which will be seen in the next section of this chapter, demographic groups with lower average scores tend to have lower reliability than the population as a whole. Nevertheless, the reliability coefficients for all demographic groups in these tables are moderately high to high.

Table 2.5: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - ELA/LITERACY
Grade Group N Var MSE Rho
3 Total 168,997 8100 551 0.932
American Indian or Alaska Native 4,421 6834 714 0.896
Asian 7,089 8161 570 0.930
Black/African American 26,487 6665 570 0.914
Native Hawaiian or Pacific Islander 945 7775 611 0.921
Hispanic/Latino Ethnicity 33,839 7327 602 0.918
White 101,528 7653 554 0.928
4 Total 185,450 9030 635 0.930
American Indian or Alaska Native 4,240 8388 842 0.900
Asian 7,990 8944 673 0.925
Black/African American 28,828 7544 651 0.914
Native Hawaiian or Pacific Islander 967 8862 709 0.920
Hispanic/Latino Ethnicity 35,790 8340 695 0.917
White 112,690 8386 639 0.924
5 Total 187,118 9479 613 0.935
American Indian or Alaska Native 4,130 8979 825 0.908
Asian 8,140 8783 670 0.924
Black/African American 28,840 7751 605 0.922
Native Hawaiian or Pacific Islander 1,009 9087 671 0.926
Hispanic/Latino Ethnicity 36,113 8770 653 0.926
White 113,733 8989 626 0.930
6 Total 187,572 9090 681 0.925
American Indian or Alaska Native 4,171 7896 926 0.883
Asian 7,808 8570 718 0.916
Black/African American 29,171 7338 694 0.905
Native Hawaiian or Pacific Islander 950 9095 804 0.912
Hispanic/Latino Ethnicity 36,610 8425 748 0.911
White 114,104 8600 680 0.921
7 Total 187,142 9985 804 0.919
American Indian or Alaska Native 4,062 9645 1173 0.878
Asian 7,904 9290 792 0.915
Black/African American 28,821 8526 857 0.900
Native Hawaiian or Pacific Islander 921 9512 844 0.911
Hispanic/Latino Ethnicity 36,429 9595 875 0.909
White 114,492 9526 807 0.915
8 Total 95,882 10902 969 0.911
American Indian or Alaska Native 3,560 9778 1357 0.861
Asian 4,916 10132 917 0.910
Black/African American 12,171 9869 1085 0.890
Native Hawaiian or Pacific Islander 925 9710 900 0.907
Hispanic/Latino Ethnicity 29,155 9888 963 0.903
White 55,677 10592 1003 0.905
HS Total 9,445 11515 1102 0.904
American Indian or Alaska Native 740 10538 1218 0.884
Asian 161 12832 1108 0.914
Black/African American 301 12575 1141 0.909
Native Hawaiian or Pacific Islander 14 14044 1223 0.913
Hispanic/Latino Ethnicity 621 13621 1158 0.915
White 7,173 10308 1085 0.895


Table 2.6: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - MATHEMATICS
Grade Group N Var MSE Rho
3 Total 168,914 7885 346 0.956
American Indian or Alaska Native 4,406 6437 462 0.928
Asian 7,096 7744 341 0.956
Black/African American 26,466 6837 417 0.939
Native Hawaiian or Pacific Islander 940 7344 375 0.949
Hispanic/Latino Ethnicity 33,828 6921 366 0.947
White 101,456 6985 329 0.953
4 Total 185,210 7956 353 0.956
American Indian or Alaska Native 4,233 7261 496 0.932
Asian 7,991 8017 346 0.957
Black/African American 28,732 6702 460 0.931
Native Hawaiian or Pacific Islander 968 8033 366 0.954
Hispanic/Latino Ethnicity 35,744 7319 382 0.948
White 112,596 6836 323 0.953
5 Total 186,870 8995 508 0.944
American Indian or Alaska Native 4,102 7708 738 0.904
Asian 8,143 8670 411 0.953
Black/African American 28,765 7028 720 0.898
Native Hawaiian or Pacific Islander 1,006 8242 522 0.937
Hispanic/Latino Ethnicity 36,058 7911 572 0.928
White 113,617 8048 453 0.944
6 Total 187,051 11191 643 0.943
American Indian or Alaska Native 4,148 10309 1104 0.893
Asian 7,804 10992 499 0.955
Black/African American 29,058 9649 901 0.907
Native Hawaiian or Pacific Islander 950 10336 690 0.933
Hispanic/Latino Ethnicity 36,437 9765 757 0.922
White 113,846 9937 574 0.942
7 Total 186,403 11924 746 0.937
American Indian or Alaska Native 4,027 9797 1196 0.878
Asian 7,894 12268 537 0.956
Black/African American 28,635 9486 1073 0.887
Native Hawaiian or Pacific Islander 907 10054 814 0.919
Hispanic/Latino Ethnicity 36,182 10213 901 0.912
White 114,175 11012 656 0.940
8 Total 95,094 12767 1002 0.922
American Indian or Alaska Native 3,494 10409 1503 0.856
Asian 4,912 13105 731 0.944
Black/African American 11,977 10513 1271 0.879
Native Hawaiian or Pacific Islander 915 9991 1043 0.896
Hispanic/Latino Ethnicity 28,846 10518 1180 0.888
White 55,338 12886 923 0.928
HS Total 9,459 12886 905 0.930
American Indian or Alaska Native 744 8391 1501 0.821
Asian 163 16476 872 0.947
Black/African American 304 12664 1163 0.908
Native Hawaiian or Pacific Islander 15 13074 1368 0.895
Hispanic/Latino Ethnicity 628 12023 1119 0.907
White 7,172 11495 811 0.929


Table 2.7: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - ELA/LITERACY
Grade Group N Var MSE Rho
3 Total 168,997 8100 551 0.932
EL Status 12,620 6210 686 0.890
IDEA Indicator 13,996 6999 748 0.893
Section 504 Status 2,432 8083 703 0.913
Economic Disadvantage Status 41,021 7197 652 0.909
4 Total 185,450 9030 635 0.930
EL Status 11,764 6755 803 0.881
IDEA Indicator 14,018 8020 851 0.894
Section 504 Status 2,983 9113 858 0.906
Economic Disadvantage Status 41,021 8175 763 0.907
5 Total 187,118 9479 613 0.935
EL Status 10,250 6074 763 0.874
IDEA Indicator 13,885 8248 819 0.901
Section 504 Status 3,414 10160 845 0.917
Economic Disadvantage Status 40,723 8835 726 0.918
6 Total 187,572 9090 681 0.925
EL Status 9,015 5106 909 0.822
IDEA Indicator 13,230 6919 937 0.865
Section 504 Status 5,313 10636 895 0.916
Economic Disadvantage Status 39,332 8407 838 0.900
7 Total 187,142 9985 804 0.919
EL Status 8,432 6057 1106 0.817
IDEA Indicator 13,052 8333 1147 0.862
Section 504 Status 5,761 11531 1003 0.913
Economic Disadvantage Status 38,936 9621 967 0.900
8 Total 95,882 10902 969 0.911
EL Status 8,095 5598 1180 0.789
IDEA Indicator 12,606 7946 1209 0.848
Section 504 Status 6,033 11970 1032 0.914
Economic Disadvantage Status 39,330 9626 1015 0.895
HS Total 9,445 11515 1102 0.904
EL Status 230 6936 1373 0.802
IDEA Indicator 859 8561 1297 0.849
Section 504 Status 470 12142 1100 0.909
Economic Disadvantage Status 2,809 11192 1111 0.901


Table 2.8: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - MATHEMATICS
Grade Group N Var MSE Rho
3 Total 168,914 7885 346 0.956
EL Status 12,655 6179 403 0.935
IDEA Indicator 14,010 7651 490 0.936
Section 504 Status 2,459 7554 404 0.947
Economic Disadvantage Status 40,950 6872 384 0.944
4 Total 185,210 7956 353 0.956
EL Status 11,770 6354 451 0.929
IDEA Indicator 13,980 8065 542 0.933
Section 504 Status 3,001 7476 380 0.949
Economic Disadvantage Status 40,939 7551 406 0.946
5 Total 186,870 8995 508 0.944
EL Status 10,251 5689 736 0.871
IDEA Indicator 13,839 7745 814 0.895
Section 504 Status 3,433 8354 556 0.933
Economic Disadvantage Status 40,574 7958 610 0.923
6 Total 187,051 11191 643 0.943
EL Status 8,997 6617 1121 0.831
IDEA Indicator 13,063 9540 1207 0.873
Section 504 Status 3,747 10184 690 0.932
Economic Disadvantage Status 39,075 9742 815 0.916
7 Total 186,403 11924 746 0.937
EL Status 8,384 6261 1308 0.791
IDEA Indicator 12,872 8829 1397 0.842
Section 504 Status 4,015 11180 770 0.931
Economic Disadvantage Status 38,519 9973 980 0.902
8 Total 95,094 12767 1002 0.922
EL Status 8,008 6015 1653 0.725
IDEA Indicator 12,374 9037 1566 0.827
Section 504 Status 4,364 12617 964 0.924
Economic Disadvantage Status 38,802 10521 1205 0.885
HS Total 9,459 12886 905 0.930
EL Status 245 5654 1607 0.716
IDEA Indicator 864 8360 1522 0.818
Section 504 Status 466 12789 950 0.926
Economic Disadvantage Status 2,809 11140 1077 0.903

2.3.3 Paper/Pencil Tests

Smarter Balanced supports fixed-form paper/pencil tests that adhere to the full blueprint for use in a variety of situations, including in schools that lack computer capacity and to address potential religious concerns associated with using technology for assessments. Scores on the paper/pencil tests are on the same reporting scale used for the online assessments. The forms used in the 2022-23 administration are collectively (for all grades) referred to as Form 5.

Table 2.9 and Table 2.10 show, for ELA/literacy and mathematics, respectively, statistical information pertaining to the items on Form 5 and to the measurement precision of the form. MSE estimates for the paper/pencil forms were based on Equation (2.5) through Equation (2.8), except that quadrature points and weights over a hypothetical theta distribution were used instead of observed scores (\(\hat{\theta}\) values). The hypothetical true score distribution used for quadrature was the student distribution from the 2021-22 operational administration. Reliability was then computed as in Equation (2.4), with the observed-score variance equal to the MSE plus the variance of the hypothetical true score distribution. Reliability was better for the full test than for the subscales and is inversely related to the SEM.
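The quadrature-based calculation described above can be sketched as follows for a fixed form of 2PL items. The normal true-score distribution, the number of quadrature points, and all item parameters below are illustrative assumptions, not the operational values.

```python
import numpy as np

D = 1.7

def info_2pl(theta, a, b):
    """2PL item information, as in Equation (2.6): D^2 a^2 p(1-p)."""
    p = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
    return D**2 * a**2 * p * (1 - p)

def fixed_form_reliability(item_params, mean=0.0, sd=1.0, n_q=61):
    """Quadrature sketch: theta points and normalized normal-density
    weights replace observed scores; rho = 1 - MSE/(var_true + MSE)."""
    theta = np.linspace(mean - 4 * sd, mean + 4 * sd, n_q)
    w = np.exp(-0.5 * ((theta - mean) / sd) ** 2)
    w /= w.sum()                                   # quadrature weights
    info = sum(info_2pl(theta, a, b) for a, b in item_params)
    sem = np.minimum(1.0 / np.sqrt(info), 2.5)     # SEM truncated at 2.5
    mse = np.sum(w * sem ** 2)
    return 1.0 - mse / (sd**2 + mse), np.sqrt(mse)

# Hypothetical 39-item form with moderate discriminations and a broad
# difficulty range, loosely in the spirit of Table 2.9 (values invented).
rng = np.random.default_rng(3)
params = list(zip(rng.uniform(0.5, 0.9, 39), rng.uniform(-2.5, 1.5, 39)))
rho, rmse = fixed_form_reliability(params)
```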

Table 2.9: RELIABILITY OF PAPER PENCIL TESTS, FORM 5 ENGLISH LANGUAGE ARTS/LITERACY
Grade Nitems Rho SEM Avg. b Avg. a C1 Rho C1 SEM C2 Rho C2 SEM C3 Rho C3 SEM C4 Rho C4 SEM
3 39 0.915 0.357 -1.134 0.670 0.767 0.643 0.755 0.667 0.642 0.873 0.708 0.751
4 39 0.917 0.360 -0.640 0.678 0.799 0.599 0.726 0.735 0.666 0.846 0.691 0.800
5 39 0.920 0.360 -0.174 0.663 0.782 0.648 0.766 0.678 0.664 0.871 0.717 0.770
6 39 0.904 0.390 0.361 0.555 0.772 0.650 0.746 0.699 0.556 1.069 0.594 0.989
7 39 0.914 0.392 0.868 0.564 0.782 0.677 0.761 0.717 0.623 0.996 0.670 0.899
8 42 0.923 0.376 1.050 0.597 0.786 0.682 0.765 0.723 0.693 0.868 0.687 0.881
11 40 0.930 0.407 1.008 0.580 0.815 0.704 0.781 0.783 0.680 1.014 0.738 0.882


Table 2.10: RELIABILITY OF PAPER PENCIL TEST, FORM 5 MATHEMATICS
Grade Nitems Rho SEM Avg. b Avg. a C1 Rho C1 SEM C2&4 Rho C2&4 SEM C3 Rho C3 SEM
3 36 0.913 0.370 -0.935 0.699 0.822 0.558 0.698 0.787 0.751 0.689
4 38 0.917 0.363 -0.305 0.768 0.844 0.519 0.720 0.751 0.694 0.800
5 39 0.916 0.389 -0.131 0.644 0.836 0.569 0.737 0.769 0.711 0.821
6 39 0.909 0.459 0.533 0.646 0.810 0.704 0.751 0.837 0.699 0.953
7 40 0.914 0.464 0.699 0.631 0.830 0.686 0.738 0.903 0.719 0.949
8 39 0.898 0.555 1.016 0.536 0.813 0.789 0.604 1.332 0.701 1.076
10 41 0.905 0.566 1.397 0.533 0.833 0.782 0.719 1.095 0.657 1.264
HS 42 0.887 0.626 1.805 0.490 0.816 0.831 0.617 1.378 0.616 1.381

2.4 Classification Accuracy

Information on classification accuracy is based on actual test results from the 2022-23 administration. Classification accuracy is a measure of how accurately test scores or subscores place students into reporting category levels. The likelihood of inaccurate placement depends on the amount of measurement error associated with scores, especially those nearest cut points, and on the distribution of student achievement. For this report, classification accuracy was calculated in the following manner. For each examinee, analysts used the estimated scale score and its standard error of measurement to obtain a normal approximation of the likelihood function over the range of scale scores. The normal approximation took the scale score estimate as its mean and the standard error of measurement as its standard deviation. The proportion of the area under the curve within each level was then calculated.

Figure 2.1 illustrates the approach for one examinee in grade 11 mathematics. In this example, the examinee’s overall scale score is 2606 (placing this student in level 2, based on the cut scores for this grade level), with a standard error of measurement of 31 points. Accordingly, a normal distribution with a mean of 2606 and a standard deviation of 31 was used to approximate the likelihood of the examinee’s true level, based on the observed test performance. The area under the curve was computed within each score range in order to estimate the probability that the examinee’s true score falls within that level (the red vertical lines identify the cut scores). For the student in Figure 2.1, the estimated probabilities were 2.1% for level 1, 74.0% for level 2, 23.9% for level 3, and 0.0% for level 4. Since the student’s assigned level was level 2, there is an estimated 74% chance the student was correctly classified and a 26% (2.1% + 23.9% + 0.0%) chance the student was misclassified.
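The calculation for this worked example can be sketched directly. The cut scores used below are illustrative values chosen so the sketch reproduces the probabilities quoted above; they are not asserted to be the operational cut scores.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def level_probabilities(scale_score, sem, cuts):
    """Area under a normal(scale_score, sem) density within each
    achievement level, where `cuts` are the cut scores between levels."""
    cdf = [0.0] + [norm_cdf((c - scale_score) / sem) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Worked example: scale score 2606, SEM 31, with three assumed cut scores.
probs = level_probabilities(2606, 31, [2543, 2628, 2718])
```

With these assumed cuts, the four areas come out near 2.1%, 74.0%, 23.9%, and 0.0%, matching the example.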


Figure 2.1: Illustrative Example of a Normal Distribution Used to Calculate Classification Accuracy

The same procedure was then applied to all students within the sample. Results are shown for 10 cases in Table 2.11 (student 6 is the case illustrated in Figure 2.1).

Table 2.11: ILLUSTRATIVE EXAMPLE OF CLASSIFICATION ACCURACY CALCULATION RESULTS
Student SS SEM Level P(L1) P(L2) P(L3) P(L4)
1 2751 23 4 0.000 0.000 0.076 0.924
2 2375 66 1 0.995 0.005 0.000 0.000
3 2482 42 1 0.927 0.073 0.000 0.000
4 2529 37 1 0.647 0.349 0.004 0.000
5 2524 36 1 0.701 0.297 0.002 0.000
6 2606 31 2 0.021 0.740 0.239 0.000
7 2474 42 1 0.950 0.050 0.000 0.000
8 2657 26 3 0.000 0.132 0.858 0.009
9 2600 31 2 0.033 0.784 0.183 0.000
10 2672 23 3 0.000 0.028 0.949 0.023

Table 2.12 presents a hypothetical set of results for the overall score and for a claim score (claim 3) for a population of students. The number (N) and proportion (P) of students classified into each achievement level is shown in the first three columns. These are counts and proportions of “observed” classifications in the population. Students are classified into the four achievement levels by their overall score. By claim scores, students are classified as “below,” “near,” or “above” standard, where the standard is the level 3 cut score. Rules for classifying students by their claim scores are detailed in Chapter 7.

The next four columns (“Freq L1,” etc.) show the number of students by “true level” among students at a given “observed level.” The last four columns convert the frequencies by true level into proportions. The sum of proportions in the last four columns of the “Overall” section of the table equals 1.0. Likewise, the sum of proportions in the last four columns of the “Claim 3” section of the table equals 1.0. For the overall test, the proportions of correct classifications for this hypothetical example are .404, .180, .145, and .098 for levels 1-4, respectively.

Table 2.12: EXAMPLE OF CROSS-CLASSIFYING TRUE ACHIEVEMENT LEVEL BY OBSERVED ACHIEVEMENT LEVEL
Score Observed Level N P Freq L1 Freq L2 Freq L3 Freq L4 Prop L1 Prop L2 Prop L3 Prop L4
Overall Level 1 251,896 0.451 225,454 26,172 263 8 0.404 0.047 0.000 0.000
Level 2 141,256 0.253 21,800 100,364 19,080 11 0.039 0.180 0.034 0.000
Level 3 104,125 0.186 161 14,223 81,089 8,652 0.000 0.025 0.145 0.015
Level 4 61,276 0.110 47 29 6,452 54,748 0.000 0.000 0.012 0.098
Claim 3 Below Standard 167,810 0.300 143,536 18,323 4,961 990 0.257 0.033 0.009 0.002
Near Standard 309,550 0.554 93,364 102,133 89,696 24,357 0.167 0.183 0.161 0.044
Above Standard 81,193 0.145 94 1,214 18,949 60,936 0.000 0.002 0.034 0.109

For claim scores, correct “below” classifications are represented in cells corresponding to the “below standard” row and the levels 1 and 2 columns. Both levels 1 and 2 are below the level 3 cut score, which is the standard. Similarly, correct “above” standard classifications are represented in cells corresponding to the “above standard” row and the levels 3 and 4 columns. Correct classifications for “near” standard are not computed. There is no absolute criterion or scale score range, such as is defined by cut scores, for determining whether a student is truly at or near the standard. That is, the standard (level 3 cut score) clearly defines whether a student is above or below standard, but there is no range centered on the standard for determining whether a student is “near.”

Table 2.13 shows more specifically how the proportion of correct classifications is computed for classifications based on students’ overall and claim scores. For each type of score (overall and claim), the proportion of correct classifications is computed overall and conditionally on each observed classification (except for the “near standard” claim score classification). The conditional proportion correct is the proportion correct within a row divided by the total proportion within a row. For the overall score, the overall proportion correct is the sum of the proportions correct within the overall table section.

Table 2.13: EXAMPLE OF CORRECT CLASSIFICATION RATES
Score Observed Level P Prop L1 Prop L2 Prop L3 Prop L4 Accuracy by level Accuracy overall
Overall Level 1 0.451 0.404 0.047 0.000 0.000 .404/.451=.895 (.404+.180+.145+.098)/1.000=.827
Level 2 0.253 0.039 0.180 0.034 0.000 .180/.253=.711
Level 3 0.186 0.000 0.025 0.145 0.015 .145/.186=.779
Level 4 0.110 0.000 0.000 0.012 0.098 .098/.110=.893
Claim 3 Below Standard 0.300 0.257 0.033 0.009 0.002 (.257+.033)/.300=.965 (.257+.033+.034+.109)/(.300+.145)=.971
Near Standard 0.554 0.167 0.183 0.161 0.044 NA
Above Standard 0.145 0.000 0.002 0.034 0.109 (.034+.109)/.145=.984

For the claim score, the overall classification accuracy rate is based only on students whose observed achievement is “below standard” or “above standard.” That is, the overall proportion correct for classifications by claim scores is the sum of the proportions correct in the claim section of the table, divided by the sum of all of the proportions in the “above standard” and “below standard” rows.
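The rates in Table 2.13 can be reproduced directly from the frequencies in Table 2.12; a short sketch:

```python
import numpy as np

# True-level frequencies by observed level for the overall score,
# taken from the hypothetical example in Table 2.12.
freq = np.array([
    [225454,  26172,   263,     8],   # observed Level 1
    [ 21800, 100364, 19080,    11],   # observed Level 2
    [   161,  14223, 81089,  8652],   # observed Level 3
    [    47,     29,  6452, 54748],   # observed Level 4
])

# Accuracy by level: correct (diagonal) count over the row total.
by_level = np.diag(freq) / freq.sum(axis=1)

# Overall accuracy: all correct classifications over all students.
overall = np.diag(freq).sum() / freq.sum()

# Claim 3 rows from Table 2.12: "below" is correct in the level 1-2
# columns, "above" is correct in the level 3-4 columns; the overall
# claim rate excludes the "near standard" row.
below = np.array([143536, 18323, 4961, 990])
above = np.array([94, 1214, 18949, 60936])
below_acc = below[:2].sum() / below.sum()
above_acc = above[2:].sum() / above.sum()
claim_overall = (below[:2].sum() + above[2:].sum()) / (below.sum() + above.sum())
```

Note that the conditional rates use the underlying frequencies rather than the rounded proportions, which is why, for example, the "below standard" rate is .965 rather than (.257 + .033)/.300 computed from three-decimal proportions.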

The following two sections show classification accuracy statistics for ELA/literacy and mathematics. There are seven tables in each section—one for each grade 3-8 and high school (HS). The statistics in these tables were computed as described above.

2.4.1 English Language Arts/Literacy

Results in this section are based on real data from students who took the full blueprint. Table 2.14 through Table 2.20 show ELA/literacy classification accuracy for each grade 3-8 and high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy for each category was high to moderately high for all ELA/literacy grades.

Table 2.14: GRADE 3 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 57,710 0.341 0.312 0.03 0 0 0.913 0.818
Level 2 40,787 0.241 0.033 0.177 0.032 0 0.734
Level 3 35,172 0.208 0 0.033 0.146 0.029 0.702
Level 4 35,328 0.209 0 0 0.026 0.183 0.874
Claim 1 Below 32,268 0.350 0.28 0.065 0.005 0 0.986 0.983
Near 38,106 0.413 0.049 0.168 0.15 0.046
Above 21,845 0.237 0 0.005 0.051 0.181 0.98
Claim 2 Below 17,858 0.395 0.285 0.088 0.019 0.003 0.946 0.944
Near 17,997 0.398 0.051 0.145 0.138 0.064
Above 9,402 0.208 0.001 0.011 0.052 0.144 0.943
Claim 3 Below 12,744 0.138 0.12 0.016 0.002 0 0.981 0.942
Near 60,506 0.656 0.207 0.204 0.153 0.092
Above 18,970 0.206 0.002 0.017 0.051 0.135 0.903
Claim 4 Below 15,459 0.342 0.274 0.061 0.006 0 0.98 0.973
Near 20,264 0.448 0.063 0.177 0.158 0.051
Above 9,534 0.211 0 0.007 0.045 0.159 0.967
Total: All Students 168,997 1.000
Table 2.15: GRADE 4 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 66,183 0.357 0.327 0.03 0 0 0.916 0.81
Level 2 35,707 0.193 0.032 0.129 0.031 0 0.671
Level 3 40,381 0.218 0 0.034 0.151 0.033 0.693
Level 4 43,179 0.233 0 0 0.03 0.203 0.872
Claim 1 Below 30,783 0.332 0.287 0.04 0.004 0 0.986 0.983
Near 39,955 0.431 0.068 0.146 0.158 0.059
Above 22,008 0.237 0 0.005 0.046 0.186 0.98
Claim 2 Below 19,172 0.419 0.315 0.073 0.026 0.004 0.929 0.937
Near 18,241 0.398 0.059 0.113 0.142 0.084
Above 8,386 0.183 0.001 0.009 0.041 0.132 0.945
Claim 3 Below 18,309 0.197 0.175 0.018 0.004 0 0.978 0.951
Near 53,895 0.581 0.177 0.159 0.154 0.091
Above 20,542 0.221 0.003 0.013 0.051 0.154 0.925
Claim 4 Below 14,979 0.327 0.27 0.047 0.01 0.001 0.967 0.967
Near 21,532 0.470 0.105 0.143 0.156 0.067
Above 9,288 0.203 0.001 0.006 0.044 0.152 0.967
Total: All Students 185,450 1.000
Table 2.16: GRADE 5 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 64,133 0.343 0.315 0.028 0 0 0.918 0.822
Level 2 37,523 0.201 0.031 0.141 0.029 0 0.702
Level 3 49,566 0.265 0 0.032 0.203 0.03 0.765
Level 4 35,896 0.192 0 0 0.028 0.164 0.852
Claim 1 Below 30,383 0.325 0.276 0.045 0.005 0 0.986 0.983
Near 38,834 0.416 0.057 0.146 0.176 0.037
Above 24,163 0.259 0 0.005 0.077 0.176 0.979
Claim 2 Below 17,937 0.390 0.291 0.074 0.024 0.002 0.934 0.941
Near 18,641 0.405 0.06 0.12 0.17 0.055
Above 9,411 0.205 0.001 0.01 0.068 0.126 0.947
Claim 3 Below 18,239 0.195 0.172 0.019 0.004 0 0.98 0.951
Near 54,006 0.578 0.158 0.162 0.183 0.075
Above 21,136 0.226 0.003 0.015 0.071 0.138 0.922
Claim 4 Below 7,553 0.164 0.147 0.015 0.002 0 0.985 0.981
Near 28,826 0.627 0.204 0.184 0.196 0.043
Above 9,610 0.209 0 0.005 0.065 0.139 0.976
Total: All Students 187,118 1.000
Table 2.17: GRADE 6 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 63,206 0.337 0.306 0.031 0 0 0.907 0.819
Level 2 48,279 0.257 0.034 0.191 0.033 0 0.741
Level 3 51,663 0.275 0 0.034 0.214 0.027 0.778
Level 4 24,424 0.130 0 0 0.022 0.109 0.834
Claim 1 Below 30,778 0.330 0.26 0.065 0.005 0 0.985 0.981
Near 42,702 0.458 0.052 0.183 0.193 0.03
Above 19,775 0.212 0 0.005 0.077 0.13 0.977
Claim 2 Below 20,163 0.441 0.303 0.11 0.027 0.001 0.937 0.936
Near 17,106 0.374 0.038 0.137 0.164 0.035
Above 8,433 0.185 0.001 0.011 0.071 0.102 0.934
Claim 3 Below 21,625 0.232 0.193 0.034 0.005 0 0.979 0.948
Near 54,631 0.586 0.118 0.205 0.2 0.063
Above 17,001 0.182 0.002 0.013 0.07 0.097 0.918
Claim 4 Below 13,809 0.302 0.245 0.051 0.005 0 0.982 0.957
Near 21,613 0.473 0.096 0.193 0.16 0.023
Above 10,280 0.225 0.001 0.014 0.096 0.114 0.932
Total: All Students 187,572 1.000
Table 2.18: GRADE 7 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 61,119 0.327 0.295 0.032 0 0 0.902 0.812
Level 2 48,443 0.259 0.036 0.187 0.036 0 0.721
Level 3 55,702 0.298 0 0.037 0.234 0.026 0.786
Level 4 21,878 0.117 0 0 0.02 0.097 0.833
Claim 1 Below 27,481 0.294 0.239 0.051 0.004 0 0.986 0.981
Near 44,081 0.472 0.059 0.187 0.204 0.022
Above 21,911 0.234 0 0.006 0.099 0.129 0.976
Claim 2 Below 14,325 0.312 0.235 0.066 0.011 0 0.964 0.949
Near 20,950 0.457 0.068 0.176 0.187 0.025
Above 10,606 0.231 0.001 0.014 0.107 0.11 0.934
Claim 3 Below 22,664 0.242 0.195 0.04 0.007 0 0.969 0.96
Near 55,729 0.596 0.103 0.196 0.24 0.057
Above 15,090 0.161 0.001 0.007 0.06 0.094 0.952
Claim 4 Below 11,546 0.252 0.198 0.047 0.007 0 0.973 0.967
Near 24,670 0.538 0.106 0.201 0.207 0.023
Above 9,665 0.211 0.001 0.008 0.091 0.111 0.96
Total: All Students 187,142 1.000
Table 2.19: GRADE 8 ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 27,932 0.291 0.259 0.032 0 0 0.889 0.8
Level 2 25,059 0.261 0.037 0.187 0.038 0 0.716
Level 3 30,068 0.314 0 0.039 0.244 0.03 0.777
Level 4 12,823 0.134 0 0 0.024 0.11 0.822
Claim 1 Below 30,118 0.315 0.249 0.063 0.004 0 0.988 0.982
Near 43,899 0.459 0.046 0.191 0.205 0.018
Above 21,558 0.226 0 0.005 0.098 0.122 0.976
Claim 2 Below 18,153 0.390 0.262 0.105 0.023 0 0.94 0.941
Near 21,140 0.454 0.048 0.162 0.209 0.035
Above 7,287 0.156 0.001 0.009 0.068 0.08 0.941
Claim 3 Below 22,065 0.231 0.181 0.043 0.007 0 0.971 0.953
Near 54,723 0.573 0.112 0.205 0.216 0.041
Above 18,792 0.197 0.001 0.012 0.084 0.1 0.935
Claim 4 Below 10,724 0.230 0.187 0.04 0.003 0 0.985 0.97
Near 26,738 0.574 0.123 0.227 0.201 0.022
Above 9,118 0.196 0 0.008 0.094 0.092 0.954
Total: All Students 95,882 1.000
Table 2.20: HIGH SCHOOL ELA/LITERACY CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 1,400 0.148 0.129 0.019 0 0 0.873 0.792
Level 2 1,950 0.206 0.024 0.15 0.032 0 0.725
Level 3 3,407 0.361 0 0.045 0.271 0.045 0.75
Level 4 2,688 0.285 0 0 0.043 0.242 0.85
Claim 1 Below 1,691 0.179 0.119 0.053 0.007 0 0.961 0.972
Near 4,510 0.478 0.034 0.155 0.237 0.051
Above 3,241 0.343 0 0.006 0.101 0.236 0.984
Claim 2 Below 1,528 0.162 0.112 0.042 0.008 0 0.95 0.961
Near 4,759 0.504 0.042 0.163 0.23 0.069
Above 3,155 0.334 0 0.009 0.107 0.218 0.973
Claim 3 Below 1,243 0.132 0.09 0.034 0.008 0 0.939 0.955
Near 6,017 0.637 0.064 0.173 0.276 0.124
Above 2,182 0.231 0 0.006 0.061 0.163 0.972
Claim 4 Below 1,373 0.145 0.107 0.035 0.004 0 0.975 0.976
Near 4,861 0.515 0.047 0.17 0.236 0.061
Above 3,208 0.340 0 0.008 0.106 0.226 0.977
Total: All Students 9,445 1.000

2.4.2 Mathematics

Results in this section are based on real data from students who took the full blueprint. Table 2.21 through Table 2.27 show the classification accuracy of the mathematics assessment for each grade 3-8 and high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy for each category was high to moderately high for all mathematics grades.

Table 2.21: GRADE 3 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 56,619 0.335 0.31 0.025 0 0 0.925 0.852
Level 2 38,293 0.227 0.027 0.173 0.027 0 0.761
Level 3 42,724 0.253 0 0.028 0.203 0.021 0.804
Level 4 31,278 0.185 0 0 0.019 0.166 0.896
Claim 1 Below 35,432 0.385 0.284 0.08 0.018 0.002 0.947 0.934
Near 26,767 0.291 0.037 0.12 0.12 0.014
Above 29,935 0.325 0.006 0.019 0.113 0.187 0.922
Claim 2/4 Below 16,177 0.281 0.226 0.051 0.005 0 0.984 0.981
Near 25,452 0.442 0.058 0.169 0.183 0.033
Above 15,915 0.277 0 0.006 0.079 0.191 0.978
Claim 3 Below 21,724 0.236 0.192 0.039 0.004 0 0.981 0.96
Near 46,324 0.503 0.132 0.166 0.168 0.036
Above 24,086 0.261 0.003 0.013 0.078 0.167 0.938
Total: All Students 168,914 1.000
Table 2.22: GRADE 4 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 55,133 0.298 0.273 0.025 0 0 0.916 0.858
Level 2 53,688 0.290 0.026 0.238 0.026 0 0.821
Level 3 44,717 0.241 0 0.027 0.194 0.02 0.805
Level 4 31,672 0.171 0 0 0.018 0.153 0.896
Claim 1 Below 39,109 0.422 0.275 0.123 0.021 0.004 0.94 0.932
Near 24,514 0.265 0.017 0.126 0.108 0.014
Above 28,985 0.313 0.005 0.019 0.11 0.179 0.923
Claim 2/4 Below 18,518 0.322 0.223 0.092 0.006 0 0.98 0.983
Near 24,811 0.431 0.027 0.181 0.185 0.038
Above 14,197 0.247 0 0.003 0.064 0.179 0.986
Claim 3 Below 19,742 0.213 0.152 0.058 0.003 0 0.983 0.963
Near 50,042 0.540 0.143 0.198 0.164 0.036
Above 22,824 0.246 0.002 0.012 0.073 0.16 0.943
Total: All Students 185,210 1.000
Table 2.23: GRADE 5 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 73,654 0.394 0.363 0.031 0 0 0.922 0.855
Level 2 48,213 0.258 0.028 0.205 0.025 0 0.794
Level 3 31,765 0.170 0 0.024 0.127 0.019 0.748
Level 4 33,238 0.178 0 0 0.018 0.16 0.897
Claim 1 Below 44,957 0.482 0.339 0.116 0.021 0.007 0.942 0.936
Near 24,404 0.262 0.023 0.124 0.09 0.025
Above 23,880 0.256 0.003 0.015 0.065 0.173 0.929
Claim 2/4 Below 20,870 0.361 0.283 0.072 0.005 0 0.985 0.982
Near 25,169 0.435 0.058 0.184 0.139 0.054
Above 11,848 0.205 0 0.004 0.037 0.164 0.979
Claim 3 Below 30,150 0.323 0.255 0.061 0.006 0.001 0.978 0.965
Near 46,325 0.497 0.109 0.185 0.136 0.066
Above 16,766 0.180 0.001 0.008 0.034 0.137 0.952
Total: All Students 186,870 1.000
Table 2.24: GRADE 6 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 73,086 0.391 0.361 0.029 0 0 0.925 0.848
Level 2 52,637 0.281 0.031 0.222 0.028 0 0.79
Level 3 33,688 0.180 0 0.027 0.133 0.02 0.74
Level 4 27,640 0.148 0 0 0.016 0.131 0.889
Claim 1 Below 45,320 0.488 0.344 0.121 0.019 0.004 0.953 0.942
Near 27,451 0.295 0.022 0.138 0.107 0.028
Above 20,156 0.217 0.002 0.013 0.056 0.146 0.932
Claim 2/4 Below 21,390 0.369 0.292 0.072 0.005 0 0.986 0.981
Near 25,697 0.443 0.05 0.196 0.148 0.049
Above 10,925 0.188 0 0.004 0.037 0.146 0.977
Claim 3 Below 25,002 0.269 0.213 0.051 0.005 0.001 0.979 0.966
Near 52,905 0.569 0.155 0.214 0.142 0.058
Above 15,020 0.162 0 0.007 0.035 0.119 0.954
Total: All Students 187,051 1.000
Table 2.25: GRADE 7 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 72,826 0.391 0.357 0.033 0 0 0.915 0.851
Level 2 48,942 0.263 0.031 0.205 0.026 0 0.783
Level 3 36,668 0.197 0 0.024 0.154 0.019 0.783
Level 4 27,967 0.150 0 0 0.016 0.134 0.896
Claim 1 Below 47,449 0.511 0.33 0.143 0.032 0.006 0.925 0.937
Near 23,518 0.253 0.013 0.117 0.105 0.019
Above 21,979 0.236 0.001 0.011 0.069 0.155 0.948
Claim 2/4 Below 19,918 0.343 0.279 0.06 0.004 0 0.987 0.983
Near 25,850 0.445 0.073 0.191 0.149 0.032
Above 12,282 0.212 0 0.004 0.049 0.159 0.979
Claim 3 Below 21,529 0.232 0.185 0.042 0.004 0 0.98 0.97
Near 55,239 0.594 0.158 0.222 0.16 0.054
Above 16,178 0.174 0 0.007 0.042 0.125 0.96
Total: All Students 186,403 1.000
Table 2.26: GRADE 8 MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 37,728 0.397 0.356 0.04 0 0 0.898 0.819
Level 2 24,304 0.256 0.036 0.186 0.033 0 0.727
Level 3 17,332 0.182 0 0.029 0.13 0.023 0.714
Level 4 15,730 0.165 0 0 0.018 0.147 0.889
Claim 1 Below 50,617 0.533 0.368 0.125 0.032 0.008 0.925 0.941
Near 25,586 0.270 0.023 0.122 0.099 0.025
Above 18,732 0.197 0.001 0.008 0.051 0.138 0.957
Claim 2/4 Below 20,250 0.341 0.287 0.05 0.004 0 0.989 0.98
Near 26,737 0.450 0.111 0.184 0.124 0.032
Above 12,422 0.209 0 0.006 0.046 0.157 0.97
Claim 3 Below 24,636 0.260 0.208 0.044 0.006 0.001 0.972 0.959
Near 55,821 0.588 0.183 0.203 0.143 0.058
Above 14,478 0.153 0.001 0.008 0.033 0.111 0.945
Total: All Students 95,094 1.000
Table 2.27: HIGH SCHOOL MATHEMATICS CLASSIFICATION ACCURACY
Score Observed Level N P True L1 True L2 True L3 True L4 Accuracy by Level Accuracy Overall
Overall Level 1 3,307 0.350 0.316 0.033 0 0 0.905 0.831
Level 2 2,660 0.281 0.037 0.209 0.035 0 0.745
Level 3 2,286 0.242 0 0.029 0.193 0.019 0.8
Level 4 1,206 0.127 0 0 0.016 0.112 0.876
Claim 1 Below 4,391 0.464 0.341 0.117 0.006 0 0.986 0.987
Near 2,976 0.315 0.012 0.152 0.145 0.006
Above 2,089 0.221 0 0.003 0.093 0.125 0.988
Claim 2/4 Below 2,591 0.274 0.232 0.04 0.002 0 0.992 0.963
Near 4,625 0.489 0.12 0.217 0.141 0.011
Above 2,240 0.237 0.001 0.015 0.101 0.12 0.934
Claim 3 Below 2,988 0.316 0.249 0.061 0.005 0 0.984 0.977
Near 4,854 0.513 0.104 0.205 0.179 0.026
Above 1,614 0.171 0 0.005 0.06 0.105 0.97
Total: All Students 9,459 1.000

2.5 Standard Errors of Measurement (SEMs)

The standard error of measurement (SEM) information in this section is based on student scores and associated SEMs included in the data Smarter Balanced received from members after the 2022-23 administration. Student scores and SEMs are not computed directly by Smarter Balanced; they are computed by the service providers who deliver the test, following the scoring specifications provided by Smarter Balanced (Smarter Balanced, 2023b). These specifications include Equation (2.7) in this chapter for computing SEMs. Because of the adaptive nature of the test, different students receive different items, so the amount of measurement error varies from student to student, even among students with the same estimate of achievement.

All of the SEM statistics reported in this chapter are based on the full blueprint and are expressed in the reporting scale metric. For member data that included SEMs in the theta metric only, the SEMs were transformed to the reporting metric using the multiplication factors in the theta-to-scale-score transformation given in Chapter 5. Note that the ELA/literacy and mathematics scales are not in the same metric. In addition, because schools responded differently to post-pandemic test administration in 2022-23, the data used for the SEM analyses should not be considered representative of the whole population.
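The theta-to-reporting-scale conversion mentioned above amounts to multiplying the theta-metric SEM by the slope of the linear theta-to-scale-score transformation (the additive intercept shifts the scale but does not stretch it, so it does not affect the SEM). A minimal sketch follows; the slope value used here is hypothetical, and the actual multiplication factors are those given in Chapter 5.

```python
def sem_to_scale(sem_theta, slope):
    """Rescale a theta-metric SEM to the reporting metric.

    If scale = slope * theta + intercept, then the SEM transforms
    by the slope alone: SEM_scale = slope * SEM_theta.
    """
    return slope * sem_theta

# Hypothetical example: slope of 85.8 (illustrative value only).
sem_scale = sem_to_scale(0.30, 85.8)  # about 25.7 scale score units
```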

Table 2.28 and Table 2.29 show the trend in the SEM by student decile for ELA/literacy and mathematics, respectively. Deciles were defined by ranking students from highest to lowest scale score and dividing the students into 10 equal-sized groups according to rank. Decile 1 contains the 10% of students with the lowest scale scores. Decile 10 contains the 10% of students with the highest scale scores. The SEM reported for a decile is the average SEM among examinees at that decile.
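The decile computation just described can be sketched as follows. The scores and SEMs below are synthetic (randomly generated for illustration), not real results; the splitting logic is the point.

```python
import random
import statistics

# Generate synthetic (scale score, SEM) pairs. Lower-scoring students
# are given larger SEMs, mimicking the pattern discussed in this section.
random.seed(0)
students = []
for _ in range(10_000):
    score = random.gauss(2500, 90)
    sem = 25 + (5 if score < 2450 else 0) + random.gauss(0, 2)
    students.append((score, sem))

# Rank students from lowest to highest scale score and split into 10
# equal-sized groups; decile 1 holds the lowest-scoring 10% of students.
students.sort(key=lambda s: s[0])
size = len(students) // 10
decile_means = [
    statistics.mean(sem for _, sem in students[i * size:(i + 1) * size])
    for i in range(10)
]
```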

Table 2.28: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, ELA/LITERACY
Grade Mean d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
3 23.0 28.9 22.9 22.2 21.9 21.9 21.8 21.8 22.0 22.4 24.2
4 24.8 30.1 25.0 23.8 23.7 23.7 23.5 23.4 23.4 23.7 26.8
5 24.3 28.9 23.3 22.3 22.4 22.7 22.9 23.3 24.4 25.2 27.4
6 25.6 31.5 25.9 24.0 23.7 24.0 24.3 24.5 24.4 25.2 27.9
7 27.8 36.2 28.8 26.8 25.9 25.6 25.8 26.3 26.7 27.1 28.4
8 30.4 40.0 30.6 28.9 28.5 28.6 28.6 28.4 28.6 29.3 31.5
HS 32.9 40.1 32.5 31.3 31.0 30.9 31.0 31.2 31.8 32.5 34.5


Table 2.29: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, MATHEMATICS
Grade Mean d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
3 18.0 26.0 19.6 18.0 17.0 16.6 16.3 16.1 16.0 16.1 17.8
4 18.1 27.8 20.5 18.4 17.0 16.5 16.3 16.0 15.5 15.4 16.6
5 21.4 34.8 27.4 24.5 22.1 20.1 18.4 17.2 16.1 16.1 17.2
6 23.9 40.9 29.4 25.6 23.8 21.9 20.8 20.2 19.3 18.2 18.6
7 25.5 44.7 32.9 28.6 26.3 24.1 22.3 20.9 18.8 18.0 18.1
8 30.3 46.4 37.6 34.3 32.0 29.7 27.8 26.0 24.2 22.5 22.6
HS 29.0 45.6 36.1 32.2 29.7 28.1 26.5 24.8 22.9 21.7 21.6

Table 2.30 and Table 2.31 show the average SEM near the achievement level cut scores. In these tables, N is the number of students near the cut, M is the mean, and SD is the standard deviation.

The average SEM reported for a given cut score is the average SEM among students within 10 scale score units of the cut score. In the column headings, “Cut1v2” is the lowest cut score, which defines the lower boundary of level 2; “Cut2v3” is the cut score defining the lower boundary of level 3; and “Cut3v4” is the cut score defining the lower boundary of level 4.

Table 2.30: CONDITIONAL SEM NEAR (±10 POINTS) ACHIEVEMENT LEVEL CUT SCORES, ELA/LITERACY
Grade Cut1v2_N Cut1v2_M Cut1v2_SD Cut2v3_N Cut2v3_M Cut2v3_SD Cut3v4_N Cut3v4_M Cut3v4_SD
3 12617 21.9 2.62 13253 21.8 2.62 11211 22.1 2.97
4 12942 23.7 3.31 13951 23.5 3.45 13359 23.4 3.33
5 13003 22.4 3.13 12944 22.9 3.31 12101 24.8 3.03
6 13507 23.6 2.96 13587 24.5 3.54 9738 25.5 3.83
7 12772 26.0 2.92 14175 26.0 3.07 8115 27.2 2.79
8 6062 28.6 3.23 6873 28.6 2.52 4915 29.5 2.27
HS 308 32.3 1.93 625 31.0 1.59 714 31.6 1.92


Table 2.31: CONDITIONAL SEM NEAR (±10 POINTS) OF ACHIEVEMENT LEVEL CUT SCORES, MATHEMATICS
Grade Cut1v2_N Cut1v2_M Cut1v2_SD Cut2v3_N Cut2v3_M Cut2v3_SD Cut3v4_N Cut3v4_M Cut3v4_SD
3 13725 17.1 2.01 15012 16.2 1.56 11395 16.0 1.40
4 14131 17.5 2.16 16537 16.2 1.26 11951 15.3 1.72
5 13465 21.3 2.64 14162 17.1 2.46 11508 15.9 2.23
6 13042 23.0 2.64 13606 20.1 1.86 9831 18.1 2.60
7 12525 25.6 4.19 11753 20.8 2.97 9331 17.9 2.52
8 6194 30.8 5.07 6101 25.9 3.40 4501 22.6 2.94
HS 608 29.6 1.67 626 25.1 1.77 402 21.4 1.73
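The “near cut” statistics in the tables above can be sketched as follows. The scores, SEMs, and cut value below are made up for illustration; only the selection and summary logic reflects the computation described in this section.

```python
import statistics

# Hypothetical student scale scores, their SEMs, and one cut score.
scores = [2410.0, 2418.0, 2425.0, 2431.0, 2460.0]
sems = [24.0, 23.5, 23.0, 22.8, 22.0]
cut = 2422.0

# Keep the SEMs of students within 10 scale score units of the cut.
near = [s for sc, s in zip(scores, sems) if abs(sc - cut) <= 10]

n = len(near)              # N: number of students near the cut
m = statistics.mean(near)  # M: mean SEM near the cut
sd = statistics.stdev(near)  # SD: sample standard deviation of those SEMs
```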

Figure 2.2 to Figure 2.15 are scatter plots of a random sample of 2,000 individual student SEMs as a function of scale score for the total test and claims/subscores, by grade within subject. These plots show the variability of SEMs among students with the same scale score, as well as the trend in SEM with student achievement (scale score). In comparison to the total score, a claim score has greater measurement error and greater variability among students because it is based on a smaller number of items. Among claims, those represented by fewer items will have higher measurement error and greater variability of measurement error than those represented by more items.

Dashed vertical lines in Figure 2.2 to Figure 2.15 represent the achievement level cut scores. The plots for the high school standard errors show cut scores for grades 9, 10, and 11 separately.

Figure 2.2: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 3

Figure 2.3: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 4

Figure 2.4: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 5

Figure 2.5: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 6

Figure 2.6: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 7

Figure 2.7: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 8

Figure 2.8: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy High School

Figure 2.9: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 3

Figure 2.10: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 4

Figure 2.11: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 5

Figure 2.12: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 6

Figure 2.13: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 7

Figure 2.14: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 8

Figure 2.15: Students’ Standard Error of Measurement by Scale Score, Mathematics High School

All of the tables and figures in this section, for every grade and subject, show higher measurement error for lower-achieving students. This trend reflects the fact that the item pool is difficult relative to overall student achievement. The computer adaptive test (CAT) algorithm still delivers easier items to lower-achieving students than they would typically receive on a non-adaptive, fixed form whose difficulty is similar to that of the item pool as a whole. But low-achieving students still tend to receive items that are relatively difficult for them, typically because the CAT algorithm has no easier items available within the blueprint constraints that must be met for all students.

References

Smarter Balanced. (2023b). Smarter Balanced Scoring Specifications for Summative and Interim Assessments. Retrieved from https://technicalreports.smarterbalanced.org/scoring_specs/_book/scoringspecs.html.