Chapter 2 Reliability, Precision, and Errors of Measurement

2.1 Introduction

This chapter addresses the technical quality of operational test functioning with regard to precision and reliability. Part of the test validity argument is that scores must be consistent and precise enough to be useful for intended purposes. If scores are to be meaningful, tests should deliver the same results under repeated administrations to the same student or for students of the same ability. In addition, the range of certainty around the score should be small enough to support educational decisions. The reliability and precision of a test are examined through analysis of measurement error and other test properties in simulated and operational conditions. For example, the reliability of a test may be assessed in part by verifying that different test forms follow the same blueprint.

In computer adaptive testing (CAT), one cannot expect the same set of items to be administered to the same examinee more than once. Consequently, reliability is inferred from internal test properties, including test length and the information provided by item parameters. Measurement precision is enhanced when the student receives items that are well matched, in terms of difficulty, to the overall performance level of the student. Measurement precision is also enhanced when the items a student receives work well together to measure the same general body of knowledge, skills, and abilities defined by the test blueprint.

Smarter Balanced uses an adaptive model because adaptive tests are customized to each student in terms of the difficulty of the items. Smarter Balanced also uses item quality control procedures that ensure test items measure the knowledge, skills, and abilities specified in the test blueprint and work well together in this respect. The expected outcome of these and other test administration and item quality control procedures is high reliability and low measurement error.

For the 2020-21 administration, all statistics in this chapter are based on the full blueprint. Measurement bias results from the simulations produced by Cambium Assessment are provided, along with reliability, classification accuracy, and standard errors of measurement based on student data provided by Montana, Nevada, South Dakota, and Vermont. Statistics about the paper/pencil forms are based on the items on the forms, not on the students who took the assessment in 2020-21.

2.2 Measurement Bias

Measurement bias is any systematic or non-random error that occurs in estimating a student’s achievement from the student’s scores on test items. Prior to the release of the 2020-21 item pool, simulation studies were carried out to ensure that the item pool, combined with the adaptive test administration algorithm, would produce satisfactory tests with regard to measurement bias and random measurement error as a function of student achievement, overall reliability, fulfillment of test blueprints, and item exposure.

Results for measurement bias with the full blueprint are provided in this section. Measurement bias is the one index of test performance that is clearly and preferentially assessed through simulation as opposed to the use of real data. With real data, true student achievement is unknown. In simulation, true student achievement can be assumed and used to generate item responses. The simulated item responses are used in turn to estimate achievement. Achievement estimates are then compared to the underlying assumed, true values of student achievement to assess whether the estimates contain systematic error (bias).

Simulations for the 2020-21 administration were carried out by Cambium Assessment. The simulations were performed for each grade within a subject area for the standard item pool (English) and for accommodation item pools of braille and Spanish for mathematics and braille for ELA/literacy. For the standard item pools, the number of simulees was 3,000 for grades 3-8 and 5,000 for grade 11. For the braille and Spanish pools, the number of simulees was 1,000 for grades 3-8 and 2,000 for grade 11. True student achievement values were sampled from a normal distribution for each grade and subject. The parameters for the normal distribution were based on students’ operational scores on the 2018–2019 Smarter Balanced summative tests.

Test events were created for the simulated examinees using the 2020-21 item pool. Estimated ability ( \(\hat{\theta}\) ) was calculated from the simulated tests using maximum likelihood estimation (MLE) as described in the Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2022d).
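The simulate-then-estimate loop described above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not part of the operational procedure: a fixed set of hypothetical dichotomous two-parameter items stands in for the adaptive algorithm and the operational item pool, the item parameter ranges and the estimation bounds of -4 and 4 are invented for the example, and the function names (`simulate_responses`, `mle_theta`, `run_simulation`) are hypothetical.

```python
import math
import random

D = 1.7  # scale factor, as used in the information function later in this chapter


def p_correct(theta, a, b):
    """Probability of a correct response to a hypothetical dichotomous 2PL item."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_responses(theta, items, rng):
    """Generate 0/1 item responses from an assumed true theta."""
    return [1 if rng.random() < p_correct(theta, a, b) else 0 for a, b in items]


def mle_theta(responses, items, lo=-4.0, hi=4.0, n_iter=25):
    """Maximum likelihood estimate of theta by Fisher scoring.

    The bounds lo/hi and the step damping are assumptions of this sketch.
    """
    theta = 0.0
    for _ in range(n_iter):
        score = 0.0  # first derivative of the log-likelihood
        info = 0.0   # Fisher information at the current theta
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            score += D * a * (u - p)
            info += (D * a) ** 2 * p * (1.0 - p)
        step = max(-1.0, min(1.0, score / info))  # damp large steps
        theta = max(lo, min(hi, theta + step))
    return theta


def run_simulation(n_simulees=500, n_items=40, seed=1):
    """Sample true thetas, simulate responses, and estimate theta for each simulee."""
    rng = random.Random(seed)
    # Hypothetical item parameters (discrimination a, difficulty b).
    items = [(rng.uniform(0.5, 1.0), rng.uniform(-2.0, 2.0)) for _ in range(n_items)]
    true_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_simulees)]
    estimates = [
        mle_theta(simulate_responses(t, items, rng), items) for t in true_thetas
    ]
    return true_thetas, estimates
```

Because the true values are known by construction, the returned pairs can be compared directly, which is what makes simulation the natural setting for assessing bias.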

Bias was computed as:

\[\begin{equation} bias = N^{-1}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}) \tag{2.1} \end{equation}\]

and the error variance of the estimated bias is:

\[\begin{equation} ErrorVar(bias) = \frac{1}{N(N-1)}\sum_{i = 1}^{N} (\theta_{i} - \hat{\theta}_{i}-mean(\theta_{i}-\hat{\theta}_{i}))^{2} \tag{2.2} \end{equation}\]

where \(\theta_{i} - \hat{\theta}_{i}\) is the deviation score for simulee \(i\), and \(N\) denotes the number of simulees in the condition (the values of \(N\) by pool and grade are given above). Statistical significance of the bias is tested using a z-test:

\[\begin{equation} z = \frac{bias}{\sqrt{ErrorVar(bias)}} \tag{2.3} \end{equation}\]

Table 2.1 and Table 2.2 show, for ELA/literacy and mathematics, respectively, the bias in estimates of student achievement based on the complete test assembled from the standard item pool and the accommodations pools included in the simulations. The standard error of bias is the denominator of the z-score in Equation (2.3). The p-value is the probability that \(|Z| > |z|\), where \(Z\) is a standard normal variate and \(|z|\) is the absolute value of the \(z\) computed in Equation (2.3). Under the hypothesis of no bias, approximately 5% and 1% of the \(\hat{\theta}_{i}\) will fall outside, respectively, 95% and 99% confidence intervals centered on \(\theta_{i}\).
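As a rough sketch of how Equations (2.1) through (2.3) and the confidence interval miss rates fit together, the function below computes them from lists of true values, estimates, and standard errors. The function name `bias_statistics` is hypothetical, and the sketch assumes that each interval is the true value plus or minus 1.96 (or 2.576) times that simulee's standard error of measurement; the text does not spell out which standard error the intervals use.

```python
import math


def bias_statistics(true_thetas, estimates, sems):
    """Mean bias, its standard error, z, two-sided p-value, and CI miss rates."""
    n = len(true_thetas)
    dev = [t - e for t, e in zip(true_thetas, estimates)]  # deviation scores
    bias = sum(dev) / n                                              # Eq. (2.1)
    error_var = sum((d - bias) ** 2 for d in dev) / (n * (n - 1))    # Eq. (2.2)
    se_bias = math.sqrt(error_var)
    z = bias / se_bias                                               # Eq. (2.3)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|), Z standard normal

    def miss_rate(critical_value):
        # Share of estimates outside an interval centered on the true value.
        # Using each simulee's own SEM as the interval width is an assumption.
        misses = sum(1 for d, s in zip(dev, sems) if abs(d) > critical_value * s)
        return misses / n

    return {
        "bias": bias,
        "se_bias": se_bias,
        "z": z,
        "p_value": p_value,
        "miss_95": miss_rate(1.96),
        "miss_99": miss_rate(2.576),
    }
```

For example, `bias_statistics([1, 1, 1, 1], [0.8, 1.0, 0.9, 0.9], [0.05] * 4)` gives a mean bias of about 0.1, positive because the estimates sit below the true values on average.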

Mean bias was generally very small in practical terms, exceeding .02 in absolute value in no cases for ELA/literacy and in only seven cases for mathematics. Mean bias tended to be statistically significantly different from 0, but this was due to the large sample sizes used for the simulation. In virtually all cases, the percentage of simulated examinees whose estimated achievement score fell outside the confidence intervals centered on their true score was close to expected values of 5% for the 95% confidence interval and 1% for the 99% confidence interval. Plots of bias by estimated theta in the full simulation report show that positive and statistically significant mean bias was due to thetas being underestimated in regions of student achievement far below the lowest cut score (separating achievement levels 1 and 2). The same plots show that estimation bias is negligible near all cut scores in all cases.

Table 2.1: BIAS OF THE ESTIMATED PROFICIENCIES: ENGLISH LANGUAGE ARTS/LITERACY
Pool	Grade	Mean Bias	SE (Bias)	P value	MSE	95% CI Miss Rate	99% CI Miss Rate
Standard	3	0.00	0.01	0.54	0.11	5.10%	0.90%
	4	-0.01	0.01	0.30	0.13	5.60%	1.40%
	5	0.00	0.01	0.58	0.13	5.90%	1.00%
	6	-0.01	0.01	0.40	0.13	5.20%	0.80%
	7	0.00	0.01	0.80	0.15	4.20%	0.80%
	8	-0.01	0.01	0.05	0.15	4.50%	0.80%
	11	0.01	0.01	0.40	0.18	4.70%	0.90%
Braille	3	-0.02	0.01	0.03	0.12	6.00%	1.10%
	4	0.01	0.01	0.52	0.13	4.50%	1.20%
	5	-0.01	0.01	0.24	0.14	5.80%	1.20%
	6	-0.02	0.01	0.17	0.14	5.00%	1.00%
	7	0.01	0.01	0.41	0.15	5.00%	0.70%
	8	0.00	0.01	0.93	0.16	5.30%	0.60%
	11	-0.01	0.01	0.39	0.20	4.20%	1.00%

Table 2.2: BIAS OF THE ESTIMATED PROFICIENCIES: MATHEMATICS
Pool	Grade	Mean Bias	SE (Bias)	P value	MSE	95% CI Miss Rate	99% CI Miss Rate
Standard	3	0.01	0.00	0.26	0.07	5.10%	1.20%
	4	0.01	0.00	0.23	0.07	4.90%	1.00%
	5	0.02	0.01	0.00	0.11	5.30%	1.00%
	6	0.02	0.01	0.00	0.14	4.80%	1.00%
	7	0.02	0.01	0.00	0.16	5.40%	1.20%
	8	0.02	0.01	0.01	0.18	5.20%	0.80%
	11	0.04	0.01	0.00	0.23	4.90%	1.10%
Braille	3	0.03	0.01	0.01	0.09	4.80%	0.60%
	4	0.00	0.01	0.78	0.08	4.60%	0.70%
	5	0.02	0.01	0.04	0.12	5.30%	1.00%
	6	0.00	0.01	0.78	0.16	4.00%	1.20%
	7	0.02	0.01	0.05	0.17	3.40%	0.10%
	8	0.03	0.02	0.05	0.23	6.10%	1.10%
	11	0.08	0.01	0.00	0.35	4.60%	0.70%
Spanish	3	0.01	0.01	0.16	0.08	4.60%	0.70%
	4	0.02	0.01	0.08	0.09	4.00%	0.90%
	5	0.02	0.01	0.07	0.15	3.60%	0.90%
	6	0.01	0.01	0.26	0.16	4.30%	0.90%
	7	0.03	0.01	0.02	0.18	4.00%	0.50%
	8	0.05	0.02	0.00	0.27	4.90%	1.00%
	11	0.07	0.01	0.00	0.32	4.00%	0.80%

2.3 Reliability

Reliability estimates reported in this section are derived from internal, IRT-based estimates of the measurement error in the test scores of examinees (MSE) and the observed variance of examinees’ test scores on the \(\theta\)-scale \((var(\hat{\theta}))\). The formula for the reliability estimate (\(\hat{\rho}\)) is:

\[\begin{equation} \hat{\rho} = 1 - \frac{MSE}{var(\hat{\theta})}. \tag{2.4} \end{equation}\]

According to Smarter Balanced Test Scoring Specifications (Smarter Balanced, 2022c), estimates of measurement error are obtained from the parameter estimates of the items taken by the examinees. This is done by computing the test information for each examinee \(i\) as:

\[\begin{equation} \begin{split} I(\hat{\theta}_{i}) = \sum_{j=1}^{I}D^2a_{j}^2 \Bigg(\frac{\sum_{l=1}^{m_{j}}l^2\exp\big(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk})\big)} {1+\sum_{l=1}^{m_{j}}\exp\big(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk})\big)} - \\ \bigg(\frac{\sum_{l=1}^{m_{j}}l\exp\big(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk})\big)} {1+\sum_{l=1}^{m_{j}}\exp\big(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk})\big)}\bigg)^2\Bigg) \end{split} \tag{2.5} \end{equation}\]

where \(I\) in the upper limit of the outer sum is the number of items administered to the examinee, \(m_j\) is the maximum possible score point (starting from 0) for the \(j\)th item, and \(D\) is the scale factor, 1.7. Values of \(a_j\) and \(b_{jk}\) are item parameters for item \(j\) and score level \(k\). The test information is computed using only the items answered by the examinee. The standard error of measurement (SEM) for examinee \(i\) is then computed as:

\[\begin{equation} SEM(\hat{\theta_i}) = \frac{1}{\sqrt{I(\hat{\theta_i})}}. \tag{2.6} \end{equation}\]

The upper bound of \(SEM(\hat{\theta_i})\) is set to 2.5. Any value larger than 2.5 is truncated at 2.5. The mean squared error for a group of \(N\) examinees is then:

\[\begin{equation} MSE = N^{-1}\sum_{i=1}^N SEM(\hat{\theta_i})^2 \tag{2.7} \end{equation}\]

and the variance of the achievement scores is:

\[\begin{equation} var(\hat{\theta}) = N^{-1}\sum_{i=1}^N (\hat{\theta_i} - \overline{\hat{\theta}})^2 \tag{2.8} \end{equation}\]

where \(\overline{\hat{\theta}}\) is the average of the \(\hat{\theta_i}\).
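The chain from item parameters to a reliability estimate in Equations (2.4) through (2.8) can be sketched as follows. The function names and the representation of an item as a pair `(a, b_steps)` are assumptions made for the illustration; this is not the operational scoring code.

```python
import math

D = 1.7        # scale factor from Eq. (2.5)
SEM_CAP = 2.5  # upper bound applied to the SEM


def item_information(theta, a, b_steps):
    """Information contributed by one item at theta (one term of Eq. 2.5).

    b_steps holds the step parameters b_j1..b_jm, so scores run from 0 to m.
    """
    terms = []  # exp(sum_{k<=l} D*a*(theta - b_k)) for l = 1..m
    cumulative = 0.0
    for b in b_steps:
        cumulative += D * a * (theta - b)
        terms.append(math.exp(cumulative))
    denom = 1.0 + sum(terms)  # the leading 1 is the score-0 category
    mean = sum(l * t for l, t in enumerate(terms, start=1)) / denom
    mean_sq = sum(l * l * t for l, t in enumerate(terms, start=1)) / denom
    return (D * a) ** 2 * (mean_sq - mean ** 2)


def sem(theta_hat, items):
    """Standard error of measurement (Eq. 2.6), truncated at SEM_CAP."""
    info = sum(item_information(theta_hat, a, b_steps) for a, b_steps in items)
    if info <= 0.0:
        return SEM_CAP
    return min(1.0 / math.sqrt(info), SEM_CAP)


def marginal_reliability(theta_hats, sems):
    """Reliability estimate (Eq. 2.4) from MSE (Eq. 2.7) and variance (Eq. 2.8)."""
    n = len(theta_hats)
    mse = sum(s * s for s in sems) / n
    mean = sum(theta_hats) / n
    var = sum((t - mean) ** 2 for t in theta_hats) / n
    return 1.0 - mse / var
```

For a dichotomous item (a single step parameter), `item_information` reduces to \(D^2a^2P(1-P)\), which is a convenient check on the general expression.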

The measurement error for a group of examinees is typically reported as the square root of \(MSE\) and is denoted \(RMSE\). Measurement error is computed with Equation (2.6) and Equation (2.7) on a scale where achievement has a standard deviation close to 1 among students at a given grade. Measurement error reported in the tables of this section is transformed to the reporting scale by multiplying the theta-scale measurement error by \(a\), where \(a\) is the slope used to convert estimates of student achievement on the \(\theta\)-scale to the reporting scale. The transformation equations for converting estimates of student achievement on the \(\theta\)-scale to the reporting scale are given in Chapter 5.

2.3.1 General Population

Reliability estimates in this section are based on real data and the full blueprint. In mathematics, claims 2 and 4 are reported together as a single subscore, so there are only three reporting categories for mathematics, but four claims. Table 2.3 and Table 2.4 show the reliability of the observed total scores and subscores for ELA/literacy and mathematics. Reliability estimates are high for the total score in both subjects. Reliability coefficients are high for the claim 1 score in mathematics, moderately high for the claim 1 and claim 2 scores in ELA/literacy, and moderately high to moderate for the remainder of the claim scores in both subjects. The lowest reliability coefficient in either subject is .479, which is the reliability of the claim 3 score in the grade 7 mathematics assessment.

Table 2.3: ELA/LITERACY SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade	N	Total score	Claim 1	Claim 2	Claim 3	Claim 4
3	52,928	0.916	0.774	0.695	0.576	0.691
4	53,380	0.915	0.782	0.695	0.599	0.690
5	53,732	0.924	0.782	0.736	0.625	0.760
6	49,180	0.907	0.744	0.722	0.579	0.676
7	49,613	0.907	0.765	0.731	0.594	0.658
8	48,753	0.908	0.751	0.718	0.596	0.700
HS	13,379	0.906	0.753	0.736	0.585	0.672

Table 2.4: MATHEMATICS SUMMATIVE SCALE SCORE MARGINAL RELIABILITY ESTIMATES
Grade	N	Total score	Claim 1	Claim 2/4	Claim 3
3	52,266	0.948	0.922	0.709	0.679
4	52,781	0.949	0.914	0.722	0.665
5	53,136	0.929	0.886	0.588	0.650
6	48,039	0.922	0.889	0.631	0.578
7	48,506	0.907	0.871	0.595	0.479
8	47,546	0.895	0.864	0.564	0.509
HS	13,541	0.910	0.858	0.623	0.597

2.3.2 Demographic Groups

Reliability estimates in this section are based on real data and the full blueprint. Whether students and schools tested during the 2020-21 administration year depended heavily on their response to the COVID-19 pandemic. Therefore, results presented here for demographic groups should not be considered representative of the entire student population. Table 2.5 and Table 2.6 show the reliability of the test for students of different racial and ethnic groups in ELA/literacy and mathematics who tested in 2020-21. Table 2.7 and Table 2.8 show the reliability of the test for students who tested in 2020-21, grouped by demographics typically requiring accommodations or accessibility tools.

Because of the differences in average score across demographic groups and the relationship between measurement error and student achievement scores, which will be seen in the next section of this chapter, demographic groups with lower average scores tend to have lower reliability than the population as a whole. Nevertheless, the reliability coefficients for all demographic groups in these tables are moderately high to high.
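As a worked check of Equation (2.4) against the tabled values, the grade 3 ELA/literacy row for all students in Table 2.5 (Var = 7269.3, MSE = 610.4) gives

\[ \hat{\rho} = 1 - \frac{610.4}{7269.3} \approx 1 - 0.084 = 0.916, \qquad RMSE = \sqrt{610.4} \approx 24.7, \]

and the grade 3 ELA/literacy row for LEP status in Table 2.7 (Var = 4235.1, MSE = 594.7) gives

\[ \hat{\rho} = 1 - \frac{594.7}{4235.1} \approx 1 - 0.140 = 0.860. \]

Both results match the Rho column. The RMSE here is in reporting-scale points, since Var and MSE in these tables are on the reporting scale.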

Table 2.5: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - ELA/LITERACY
Grade	Group	N	Var	MSE	Rho
3	All	52,928	7269.3	610.4	0.916
	American Indian or Alaska Native	1,192	6321.5	701.9	0.889
	Asian	1,773	7265.9	532.8	0.927
	Black or African American	3,393	6258.3	599.3	0.904
	Hispanic or Latino Ethnicity	12,831	5924.8	555.9	0.906
	White	20,768	7091.8	593.9	0.916
4	All	53,380	8220.3	696.0	0.915
	American Indian or Alaska Native	1,187	6530.0	764.7	0.883
	Asian	1,746	8085.1	605.3	0.925
	Black or African American	3,348	7080.3	666.0	0.906
	Hispanic or Latino Ethnicity	13,067	7216.1	621.5	0.914
	White	20,946	7644.0	666.2	0.913
5	All	53,732	8666.0	656.6	0.924
	American Indian or Alaska Native	1,207	7438.8	704.6	0.905
	Asian	1,861	8217.4	580.1	0.929
	Black or African American	3,352	7600.1	597.6	0.921
	Hispanic or Latino Ethnicity	13,134	7411.9	561.2	0.924
	White	21,316	8138.8	648.6	0.920
6	All	49,180	8044.5	747.5	0.907
	American Indian or Alaska Native	1,168	7356.5	786.9	0.893
	Asian	1,508	7564.1	626.2	0.917
	Black or African American	2,793	7319.5	742.2	0.899
	Hispanic or Latino Ethnicity	10,481	7030.0	680.7	0.903
	White	20,552	7443.6	700.0	0.906
7	All	49,613	8726.3	809.3	0.907
	American Indian or Alaska Native	1,163	8074.7	835.2	0.897
	Asian	1,482	7944.4	674.0	0.915
	Black or African American	2,563	8007.5	752.7	0.906
	Hispanic or Latino Ethnicity	10,611	7585.7	702.6	0.907
	White	20,853	7864.5	734.4	0.907
8	All	48,753	8985.0	828.2	0.908
	American Indian or Alaska Native	1,140	7682.8	876.9	0.886
	Asian	1,492	7889.4	725.7	0.908
	Black or African American	2,596	8204.6	828.1	0.899
	Hispanic or Latino Ethnicity	10,478	7877.6	768.8	0.902
	White	20,613	8465.5	780.2	0.908
HS	All	13,379	10876.5	1027.7	0.906
	American Indian or Alaska Native	629	9497.6	1047.4	0.890
	Asian	271	10871.5	1020.3	0.906
	Black or African American	428	10716.6	1102.9	0.897
	Hispanic or Latino Ethnicity	561	11688.5	1058.6	0.909
	White	11,326	10615.5	1018.0	0.904

Table 2.6: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY ETHNIC GROUP - MATHEMATICS
Grade	Group	N	Var	MSE	Rho
3	All	52,266	6621.8	345.2	0.948
	American Indian or Alaska Native	2,188	5847.0	471.9	0.919
	Asian	1,805	6561.3	281.0	0.957
	Black or African American	3,386	6033.0	404.1	0.933
	Hispanic or Latino Ethnicity	13,104	5739.7	342.4	0.940
	White	28,541	6013.9	334.0	0.944
4	All	52,781	7229.3	366.5	0.949
	American Indian or Alaska Native	2,222	5520.2	528.9	0.904
	Asian	1,753	7186.9	288.2	0.960
	Black or African American	3,329	6760.4	456.6	0.932
	Hispanic or Latino Ethnicity	13,370	6811.3	378.0	0.945
	White	28,982	6314.2	343.9	0.946
5	All	53,136	7962.6	568.6	0.929
	American Indian or Alaska Native	2,216	6711.0	854.2	0.873
	Asian	1,857	7871.5	400.1	0.949
	Black or African American	3,305	7339.7	760.9	0.896
	Hispanic or Latino Ethnicity	13,464	7189.3	623.1	0.913
	White	29,175	7345.3	507.7	0.931
6	All	48,039	8368.5	655.0	0.922
	American Indian or Alaska Native	2,139	8512.4	1062.9	0.875
	Asian	1,532	7906.3	473.9	0.940
	Black or African American	2,688	7574.4	848.9	0.888
	Hispanic or Latino Ethnicity	10,576	7147.8	694.2	0.903
	White	28,463	7810.2	601.2	0.923
7	All	48,506	9038.2	838.6	0.907
	American Indian or Alaska Native	2,163	8594.2	1344.9	0.844
	Asian	1,508	8688.9	559.7	0.936
	Black or African American	2,499	8412.6	1103.9	0.869
	Hispanic or Latino Ethnicity	10,644	7718.4	918.1	0.881
	White	28,986	8609.2	753.2	0.913
8	All	47,546	9968.1	1051.1	0.895
	American Indian or Alaska Native	2,089	8808.2	1434.9	0.837
	Asian	1,538	8466.7	767.8	0.909
	Black or African American	2,533	8690.9	1371.4	0.842
	Hispanic or Latino Ethnicity	10,513	8165.2	1185.8	0.855
	White	28,399	9893.8	956.0	0.903
HS	All	13,541	12465.0	1118.7	0.910
	American Indian or Alaska Native	638	9381.4	1357.6	0.855
	Asian	291	14236.1	1038.7	0.927
	Black or African American	422	10007.8	1536.3	0.846
	Hispanic or Latino Ethnicity	561	10803.8	1230.7	0.886
	White	11,489	12270.7	1097.5	0.911

Table 2.7: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - ELA/LITERACY
Grade	Group	N	Var	MSE	Rho
3	All	52,928	7269.3	610.4	0.916
	LEP Status	5,470	4235.1	594.7	0.860
	Section 504 Status	469	7266.2	683.5	0.906
	Economic Disadvantage Status	21,088	6276.8	560.5	0.911
	IDEA Indicator	5,958	6379.7	692.5	0.891
4	All	53,380	8220.3	696.0	0.915
	LEP Status	5,767	4921.1	674.4	0.863
	Section 504 Status	537	7596.7	738.9	0.903
	Economic Disadvantage Status	20,994	7464.9	626.2	0.916
	IDEA Indicator	6,098	6971.9	767.4	0.890
5	All	53,732	8666.0	656.6	0.924
	LEP Status	4,221	4064.1	622.5	0.847
	Section 504 Status	655	8294.9	703.8	0.915
	Economic Disadvantage Status	21,137	7951.3	565.9	0.929
	IDEA Indicator	6,080	6767.7	693.8	0.897
6	All	49,180	8044.5	747.5	0.907
	LEP Status	2,604	4194.8	858.8	0.795
	Section 504 Status	1,684	7732.1	858.1	0.889
	Economic Disadvantage Status	16,175	7330.8	685.1	0.907
	IDEA Indicator	5,014	5947.3	842.3	0.858
7	All	49,613	8726.3	809.3	0.907
	LEP Status	2,692	4923.4	866.9	0.824
	Section 504 Status	1,785	8792.6	876.8	0.900
	Economic Disadvantage Status	19,183	8040.3	726.8	0.910
	IDEA Indicator	4,875	6391.9	894.0	0.860
8	All	48,753	8985.0	828.2	0.908
	LEP Status	2,653	4523.2	947.8	0.790
	Section 504 Status	1,883	8507.1	946.5	0.889
	Economic Disadvantage Status	18,721	8265.9	790.1	0.904
	IDEA Indicator	4,669	6194.5	964.2	0.844
HS	All	13,379	10876.5	1027.7	0.906
	LEP Status	191	6418.2	1309.3	0.796
	Section 504 Status	688	11219.4	1055.4	0.906
	Economic Disadvantage Status	3,217	10838.0	1058.1	0.902
	IDEA Indicator	1,569	8332.3	1263.3	0.848

Table 2.8: MARGINAL RELIABILITY OF TOTAL SUMMATIVE SCORES BY GROUP - MATHEMATICS
Grade	Group	N	Var	MSE	Rho
3	All	52,266	6621.8	345.2	0.948
	LEP Status	5,645	4265.5	384.9	0.910
	Section 504 Status	605	6297.7	386.3	0.939
	Economic Disadvantage Status	24,715	6026.8	357.4	0.941
	IDEA Indicator	7,203	6442.8	469.0	0.927
4	All	52,781	7229.3	366.5	0.949
	LEP Status	5,835	4822.4	452.2	0.906
	Section 504 Status	710	6135.0	403.8	0.934
	Economic Disadvantage Status	24,686	6944.5	396.1	0.943
	IDEA Indicator	7,428	6593.8	535.9	0.919
5	All	53,136	7962.6	568.6	0.929
	LEP Status	4,413	4124.3	864.1	0.790
	Section 504 Status	886	6581.4	595.6	0.910
	Economic Disadvantage Status	24,677	7676.8	646.4	0.916
	IDEA Indicator	7,373	6609.8	897.5	0.864
6	All	48,039	8368.5	655.0	0.922
	LEP Status	2,764	4831.5	1038.8	0.785
	Section 504 Status	974	8267.5	756.5	0.909
	Economic Disadvantage Status	19,370	7851.7	739.2	0.906
	IDEA Indicator	6,125	7182.0	1124.1	0.843
7	All	48,506	9038.2	838.6	0.907
	LEP Status	2,842	5368.6	1511.3	0.718
	Section 504 Status	1,101	9467.9	917.1	0.903
	Economic Disadvantage Status	22,144	8581.9	974.6	0.886
	IDEA Indicator	6,054	7183.6	1509.0	0.790
8	All	47,546	9968.1	1051.1	0.895
	LEP Status	2,765	4911.2	1611.8	0.672
	Section 504 Status	1,109	9950.7	1101.6	0.889
	Economic Disadvantage Status	21,261	9069.9	1209.8	0.867
	IDEA Indicator	5,692	7278.2	1634.2	0.775
HS	All	13,541	12465.0	1118.7	0.910
	LEP Status	193	6998.2	1766.7	0.748
	Section 504 Status	688	12224.9	1311.9	0.893
	Economic Disadvantage Status	3,225	11316.0	1272.7	0.888
	IDEA Indicator	1,525	7479.5	2001.7	0.732

2.3.3 Paper/Pencil Tests

Smarter Balanced supports fixed-form paper/pencil tests that adhere to the full blueprint for use in a variety of situations, such as in schools that lack computer capacity or where there are potential religious concerns associated with using technology for assessments. Scores on the paper/pencil tests are on the same reporting scale that is used for the online assessments. The forms used in the 2020-21 administration are collectively (for all grades) referred to as Form 5.

Table 2.9 and Table 2.10 show, for ELA/literacy and mathematics, respectively, statistical information pertaining to the items on Form 5 and to the measurement precision of the form. MSE estimates for the paper/pencil forms were based on Equation (2.5) through Equation (2.7), except that quadrature points and weights over a hypothetical theta distribution were used instead of observed scores (\(\hat{\theta}\) values). The hypothetical true score distribution used for quadrature was the student distribution from the 2014–2015 operational administration. Reliability was then computed as in Equation (2.4) with the observed-score variance equal to the MSE plus the variance of the hypothetical true score distribution. Reliability was better for the full test than for subscales and was inversely related to the SEM.
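The quadrature-based computation for a fixed form might look like the sketch below. It assumes the form's test information is available as a function of theta (`info_fn`, a hypothetical callable that returns a positive value), that the 2.5 upper bound on the SEM also applies here, and that the weights need only be proportional to the density of the hypothetical distribution. The actual quadrature points and weights are not given in this chapter.

```python
import math

SEM_CAP = 2.5  # assumed to apply to the form-based computation as well


def form_reliability(points, weights, info_fn):
    """Reliability and MSE of a fixed form over a quadrature distribution.

    points  : quadrature points on the theta scale
    weights : nonnegative weights (normalized inside the function)
    info_fn : hypothetical callable returning test information at a theta value
    """
    total = sum(weights)
    w = [x / total for x in weights]
    # Mean and variance of the hypothetical true score distribution.
    mean = sum(wi * t for wi, t in zip(w, points))
    true_var = sum(wi * (t - mean) ** 2 for wi, t in zip(w, points))
    # Weighted average of squared SEMs replaces the average over examinees.
    mse = 0.0
    for wi, t in zip(w, points):
        sem = min(1.0 / math.sqrt(info_fn(t)), SEM_CAP)
        mse += wi * sem ** 2
    # Observed-score variance is taken as MSE plus true-score variance.
    rho = 1.0 - mse / (mse + true_var)
    return rho, mse
```

Written this way, the reliability equals the true-score variance divided by the sum of the true-score variance and the MSE, so for a fixed true score distribution a larger SEM necessarily lowers the reliability.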

Table 2.9: RELIABILITY OF PAPER PENCIL TESTS, FORM 5 ENGLISH LANGUAGE ARTS/LITERACY
Grade	Nitems	Rho	SEM	Avg. b	Avg. a	C1 Rho	C1 SEM	C2 Rho	C2 SEM	C3 Rho	C3 SEM	C4 Rho	C4 SEM
3	41	0.917	0.305	-0.692	0.784	0.793	0.519	0.754	0.579	0.586	0.853	0.708	0.651
4	41	0.915	0.327	-0.138	0.724	0.800	0.537	0.762	0.600	0.604	0.868	0.645	0.795
5	41	0.920	0.320	0.250	0.732	0.792	0.555	0.767	0.598	0.583	0.916	0.711	0.691
6	40	0.912	0.330	0.743	0.704	0.692	0.709	0.784	0.559	0.630	0.815	0.675	0.739
7	39	0.912	0.346	0.799	0.654	0.765	0.619	0.740	0.662	0.649	0.821	0.646	0.827
8	43	0.915	0.337	1.148	0.631	0.736	0.662	0.774	0.596	0.659	0.795	0.686	0.747
11	42	0.930	0.345	1.249	0.672	0.806	0.619	0.789	0.652	0.672	0.881	0.728	0.771

Table 2.10: RELIABILITY OF PAPER PENCIL TEST, FORM 5 MATHEMATICS
Grade	Nitems	Rho	SEM	Avg. b	Avg. a	C1 Rho	C1 SEM	C2&4 Rho	C2&4 SEM	C3 Rho	C3 SEM
3	40	0.915	0.304	-0.975	0.832	0.833	0.448	0.738	0.596	0.630	0.766
4	40	0.923	0.293	-0.549	0.861	0.851	0.427	0.747	0.593	0.715	0.643
5	39	0.916	0.340	0.058	0.796	0.829	0.510	0.748	0.653	0.744	0.660
6	38	0.914	0.391	0.517	0.744	0.843	0.550	0.602	1.036	0.765	0.706
7	40	0.918	0.406	0.756	0.743	0.858	0.552	0.688	0.914	0.721	0.844
8	38	0.907	0.465	1.192	0.668	0.854	0.601	0.596	1.194	0.572	1.254
10	39	0.895	0.535	1.762	0.521	0.826	0.715	0.602	1.266	0.661	1.118
11	41	0.894	0.536	1.859	0.551	0.835	0.693	0.581	1.325	0.644	1.160

2.4 Classification Accuracy

Information on classification accuracy is based on actual test results from the 2020-21 administration. Classification accuracy is a measure of how accurately test scores or subscores place students into reporting category levels. The likelihood of inaccurate placement depends on the amount of measurement error associated with scores, especially those nearest cut points, and on the distribution of student achievement. For this report, classification accuracy was calculated in the following manner. For each examinee, analysts used the estimated scale score and its standard error of measurement to obtain a normal approximation of the likelihood function over the range of scale scores. The normal approximation took the scale score estimate as its mean and the standard error of measurement as its standard deviation. The proportion of the area under the curve within each level was then calculated.

Figure 2.1 illustrates the approach for one examinee in grade 11 mathematics. In this example, the examinee’s overall scale score is 2606 (placing this student in level 2, based on the cut scores for this grade level), with a standard error of measurement of 31 points. Accordingly, a normal distribution with a mean of 2606 and a standard deviation of 31 was used to approximate the likelihood of the examinee’s true level, based on the observed test performance. The area under the curve was computed within each score range in order to estimate the probability that the examinee’s true score falls within that level (the red vertical lines identify the cut scores). For the student in Figure 2.1, the estimated probabilities were 2.1% for level 1, 74.0% for level 2, 23.9% for level 3, and 0.0% for level 4. Since the student’s assigned level was level 2, there is an estimated 74% chance the student was correctly classified and a 26% (2.1% + 23.9% + 0.0%) chance the student was misclassified.

Figure 2.1: Illustrative Example of a Normal Distribution Used to Calculate Classification Accuracy
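The calculation behind Figure 2.1 can be reproduced with a short sketch. The cut scores are not stated in this section; the values 2543, 2628, and 2718 used in the usage note below are assumed here for grade 11 mathematics, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def level_probabilities(scale_score, sem, cuts):
    """Area under the normal curve within each achievement level.

    cuts is the ascending list of cut scores; the result has one
    probability per level, from the lowest level to the highest.
    """
    cdfs = [0.0] + [normal_cdf(c, scale_score, sem) for c in cuts] + [1.0]
    return [cdfs[i + 1] - cdfs[i] for i in range(len(cuts) + 1)]


def observed_level(scale_score, cuts):
    """Level assigned to a scale score: 1 plus the number of cuts reached."""
    return 1 + sum(1 for c in cuts if scale_score >= c)
```

With the assumed cut scores, `level_probabilities(2606, 31, [2543, 2628, 2718])` returns values that round to 0.021, 0.740, 0.239, and 0.000, in line with the percentages quoted for the example student, and `observed_level(2606, [2543, 2628, 2718])` returns 2.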

The same procedure was then applied to all students within the sample. Results are shown for 10 cases in Table 2.11 (student 6 is the case illustrated in Figure 2.1).

Table 2.11: ILLUSTRATIVE EXAMPLE OF CLASSIFICATION ACCURACY CALCULATION RESULTS
Student	SS	SEM	Level	P(L1)	P(L2)	P(L3)	P(L4)
1	2751	23	4	0	0	0.076	0.924
2	2375	66	1	0.995	0.005	0	0
3	2482	42	1	0.927	0.073	0	0
4	2529	37	1	0.647	0.349	0.004	0
5	2524	36	1	0.701	0.297	0.002	0
6	2606	31	2	0.021	0.74	0.239	0
7	2474	42	1	0.95	0.05	0	0
8	2657	26	3	0	0.132	0.858	0.009
9	2600	31	2	0.033	0.784	0.183	0
10	2672	23	3	0	0.028	0.949	0.023
…	…	…	…	…	…	…	…

Table 2.12 presents a hypothetical set of results for the overall score and for a claim score (claim 3) for a population of students. The number (N) and proportion (P) of students classified into each achievement level are shown in the N and P columns. These are counts and proportions of “observed” classifications in the population. Students are classified into the four achievement levels by their overall score. By claim scores, students are classified as “below,” “near,” or “above” standard, where the standard is the level 3 cut score. Rules for classifying students by their claim scores are detailed in Chapter 7.

The next four columns (“Freq L1,” etc.) show the number of students by “true level” among students at a given “observed level.” The last four columns convert the frequencies by true level into proportions. The sum of proportions in the last four columns of the “Overall” section of the table equals 1.0. Likewise, the sum of proportions in the last four columns of the “Claim 3” section of the table equals 1.0. For the overall test, the proportions of correct classifications for this hypothetical example are .404, .180, .145, and .098 for levels 1-4, respectively.

Table 2.12: EXAMPLE OF CROSS-CLASSIFYING TRUE ACHIEVEMENT LEVEL BY OBSERVED ACHIEVEMENT LEVEL
Score	Observed Level	N	P	Freq L1	Freq L2	Freq L3	Freq L4	Prop L1	Prop L2	Prop L3	Prop L4
Overall	Level 1	251,896	0.451	225,454	26,172	263	8	0.404	0.047	0.000	0.000
	Level 2	141,256	0.253	21,800	100,364	19,080	11	0.039	0.180	0.034	0.000
	Level 3	104,125	0.186	161	14,223	81,089	8,652	0.000	0.025	0.145	0.015
	Level 4	61,276	0.110	47	29	6,452	54,748	0.000	0.000	0.012	0.098
Claim 3	Below Standard	167,810	0.300	143,536	18,323	4,961	990	0.257	0.033	0.009	0.002
	Near Standard	309,550	0.554	93,364	102,133	89,696	24,357	0.167	0.183	0.161	0.044
	Above Standard	81,193	0.145	94	1,214	18,949	60,936	0.000	0.002	0.034	0.109

For claim scores, correct “below” classifications are represented in cells corresponding to the “below standard” row and the levels 1 and 2 columns. Both levels 1 and 2 are below the level 3 cut score, which is the standard. Similarly, correct “above” standard classifications are represented in cells corresponding to the “above standard” row and the levels 3 and 4 columns. Correct classifications for “near” standard are not computed. There is no absolute criterion or scale score range, such as is defined by cut scores, for determining whether a student is truly at or near the standard. That is, the standard (level 3 cut score) clearly defines whether a student is above or below standard, but there is no range centered on the standard for determining whether a student is “near.”

Table 2.13 shows more specifically how the proportion of correct classifications is computed for classifications based on students’ overall and claim scores. For each type of score (overall and claim), the proportion of correct classifications is computed overall and conditionally on each observed classification (except for the “near standard” claim score classification). The conditional proportion correct is the proportion correct within a row divided by the total proportion within a row. For the overall score, the overall proportion correct is the sum of the proportions correct within the overall table section.

Table 2.13: EXAMPLE OF CORRECT CLASSIFICATION RATES
Score	Observed Level	P	Prop L1	Prop L2	Prop L3	Prop L4	Accuracy by level	Accuracy overall
Overall	Level 1	0.451	0.404	0.047	0.000	0.000	.404/.451=.895	(.404+.180+.145+.098)/1.000=.827
	Level 2	0.253	0.039	0.180	0.034	0.000	.180/.253=.711
	Level 3	0.186	0.000	0.025	0.145	0.015	.145/.186=.779
	Level 4	0.110	0.000	0.000	0.012	0.098	.098/.110=.893
Claim 3	Below Standard	0.300	0.257	0.033	0.009	0.002	(.257+.033)/.300=.965	(.257+.033+.034+.109)/(.300+.145)=.971
	Near Standard	0.554	0.167	0.183	0.161	0.044	NA
	Above Standard	0.145	0.000	0.002	0.034	0.109	(.034+.109)/.145=.984

For the claim score, the overall classification accuracy rate is based only on students whose observed achievement is “below standard” or “above standard.” That is, the overall proportion correct for classifications by claim scores is the sum of the proportions correct in the claim section of the table, divided by the sum of all of the proportions in the “above standard” and “below standard” rows.
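The rules above can be checked against the hypothetical frequencies in Table 2.12 with a short sketch. The function names are assumptions made for the illustration, and row totals are taken from the true-level frequencies, which can differ from the tabled N by a count or two because of rounding.

```python
def overall_score_accuracy(freq):
    """Accuracy by level and overall for the overall score.

    freq[i][j] is the frequency at observed level i+1 and true level j+1.
    """
    row_totals = [sum(row) for row in freq]
    by_level = [freq[i][i] / row_totals[i] for i in range(len(freq))]
    overall = sum(freq[i][i] for i in range(len(freq))) / sum(row_totals)
    return by_level, overall


def claim_score_accuracy(below, above):
    """Accuracy for "below" and "above" standard claim classifications.

    below and above are frequencies by true level (levels 1 to 4).
    "Near standard" is excluded because no correct rate is defined for it.
    """
    below_correct = below[0] + below[1]  # true levels 1 and 2
    above_correct = above[2] + above[3]  # true levels 3 and 4
    acc_below = below_correct / sum(below)
    acc_above = above_correct / sum(above)
    overall = (below_correct + above_correct) / (sum(below) + sum(above))
    return acc_below, acc_above, overall


# Frequencies copied from Table 2.12.
OVERALL_FREQ = [
    [225454, 26172, 263, 8],
    [21800, 100364, 19080, 11],
    [161, 14223, 81089, 8652],
    [47, 29, 6452, 54748],
]
CLAIM3_BELOW = [143536, 18323, 4961, 990]
CLAIM3_ABOVE = [94, 1214, 18949, 60936]
```

Calling `overall_score_accuracy(OVERALL_FREQ)` and `claim_score_accuracy(CLAIM3_BELOW, CLAIM3_ABOVE)` gives values that round to the rates shown in Table 2.13.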

The following two sections show classification accuracy statistics for ELA/literacy and mathematics. There are seven tables in each section—one for each grade 3-8 and high school (HS). The statistics in these tables were computed as described above.

2.4.1 English Language Arts/Literacy

Results in this section are based on real data from students who took the full blueprint. Table 2.14 through Table 2.20 show ELA/literacy classification accuracy for each grade 3-8 and high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy for each category was high to moderately high for all ELA/literacy grades.

Table 2.14: GRADE 3 ELA/LITERACY CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	17,859	0.337	0.302	0.035	0	0	0.895	0.8
	Level 2	13,655	0.258	0.036	0.187	0.035	0	0.724
	Level 3	11,459	0.217	0	0.036	0.15	0.03	0.693
	Level 4	9,954	0.188	0	0	0.026	0.162	0.86
Claim 1	Below	15,897	0.372	0.284	0.083	0.005	0	0.985	0.985
	Near	17,782	0.417	0.039	0.174	0.157	0.046
	Above	9,013	0.211	0	0.003	0.04	0.168	0.985
Claim 2	Below	16,743	0.392	0.316	0.066	0.009	0.001	0.974	0.972
	Near	18,685	0.438	0.07	0.163	0.14	0.065
	Above	7,264	0.170	0	0.006	0.031	0.134	0.966
Claim 3	Below	3,998	0.094	0.084	0.007	0.002	0.001	0.965	0.967
	Near	30,328	0.710	0.252	0.174	0.147	0.137
	Above	8,366	0.196	0.001	0.006	0.022	0.167	0.967
Claim 4	Below	15,336	0.359	0.307	0.045	0.006	0.001	0.981	0.982
	Near	19,981	0.468	0.09	0.17	0.142	0.066
	Above	7,375	0.173	0	0.003	0.025	0.144	0.983
Total:	All Students	52,927	1.000

Table 2.15: GRADE 4 ELA/LITERACY CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	19,307	0.362	0.327	0.034	0	0	0.904	0.792
	Level 2	11,005	0.206	0.036	0.135	0.035	0	0.652
	Level 3	12,100	0.227	0	0.038	0.154	0.034	0.681
	Level 4	10,968	0.205	0	0	0.029	0.176	0.859
Claim 1	Below	15,344	0.358	0.311	0.043	0.004	0	0.988	0.988
	Near	18,735	0.437	0.063	0.15	0.166	0.057
	Above	8,823	0.206	0	0.002	0.033	0.17	0.988
Claim 2	Below	16,604	0.387	0.325	0.051	0.01	0.001	0.971	0.968
	Near	18,485	0.431	0.081	0.133	0.138	0.079
	Above	7,813	0.182	0.001	0.006	0.029	0.146	0.963
Claim 3	Below	9,006	0.210	0.187	0.017	0.005	0.001	0.971	0.969
	Near	25,329	0.590	0.154	0.146	0.155	0.135
	Above	8,567	0.200	0.001	0.006	0.024	0.169	0.968
Claim 4	Below	13,840	0.323	0.283	0.032	0.006	0.001	0.977	0.978
	Near	20,959	0.489	0.118	0.146	0.144	0.081
	Above	8,103	0.189	0	0.004	0.026	0.159	0.98
Total:	All Students	53,380	1.000

Table 2.16: GRADE 5 ELA/LITERACY CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	17,978	0.335	0.304	0.03	0	0	0.908	0.807
	Level 2	11,340	0.211	0.032	0.145	0.034	0	0.687
	Level 3	15,041	0.280	0	0.035	0.212	0.032	0.758
	Level 4	9,373	0.174	0	0	0.029	0.146	0.835
Claim 1	Below	14,809	0.340	0.285	0.051	0.004	0	0.987	0.986
	Near	18,081	0.416	0.05	0.148	0.181	0.036
	Above	10,616	0.244	0	0.004	0.062	0.178	0.983
Claim 2	Below	17,168	0.395	0.339	0.048	0.007	0	0.982	0.978
	Near	18,747	0.431	0.078	0.14	0.163	0.05
	Above	7,591	0.174	0	0.005	0.042	0.127	0.97
Claim 3	Below	8,705	0.200	0.181	0.015	0.004	0	0.979	0.974
	Near	25,650	0.590	0.161	0.15	0.183	0.096
	Above	9,151	0.210	0.001	0.006	0.037	0.166	0.968
Claim 4	Below	7,580	0.174	0.159	0.013	0.002	0	0.987	0.987
	Near	26,443	0.608	0.192	0.177	0.192	0.047
	Above	9,483	0.218	0	0.003	0.048	0.167	0.987
Total:	All Students	53,732	1.000

Table 2.17: GRADE 6 ELA/LITERACY CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	13,739	0.279	0.246	0.033	0	0	0.881	0.794
	Level 2	14,390	0.293	0.039	0.214	0.04	0	0.731
	Level 3	14,861	0.302	0	0.042	0.231	0.03	0.764
	Level 4	6,190	0.126	0	0	0.023	0.103	0.82
Claim 1	Below	12,151	0.315	0.252	0.058	0.005	0	0.984	0.984
	Near	19,064	0.494	0.061	0.194	0.202	0.038
	Above	7,346	0.191	0	0.003	0.05	0.138	0.984
Claim 2	Below	14,387	0.373	0.292	0.075	0.006	0	0.984	0.981
	Near	18,251	0.473	0.045	0.202	0.196	0.031
	Above	5,923	0.154	0	0.004	0.051	0.098	0.973
Claim 3	Below	9,388	0.243	0.206	0.031	0.006	0.001	0.974	0.966
	Near	22,351	0.580	0.116	0.179	0.191	0.094
	Above	6,822	0.177	0.001	0.007	0.034	0.135	0.957
Claim 4	Below	9,549	0.248	0.207	0.035	0.006	0	0.976	0.978
	Near	20,150	0.523	0.094	0.178	0.198	0.052
	Above	8,862	0.230	0	0.005	0.067	0.158	0.98
Total:	All Students	49,180	1.000

Table 2.18: GRADE 7 ELA/LITERACY CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	12,276	0.247	0.218	0.03	0	0	0.88	0.795
	Level 2	13,313	0.268	0.036	0.192	0.041	0	0.715
	Level 3	17,665	0.356	0	0.043	0.281	0.032	0.788
	Level 4	6,359	0.128	0	0	0.023	0.105	0.819
Claim 1	Below	10,155	0.262	0.212	0.047	0.003	0	0.987	0.986
	Near	19,089	0.492	0.054	0.192	0.221	0.025
	Above	9,556	0.246	0	0.004	0.081	0.161	0.985
Claim 2	Below	9,752	0.251	0.207	0.039	0.004	0	0.982	0.978
	Near	20,501	0.528	0.064	0.199	0.232	0.033
	Above	8,547	0.220	0	0.006	0.081	0.133	0.973
Claim 3	Below	8,963	0.231	0.193	0.031	0.007	0	0.97	0.97
	Near	24,028	0.619	0.112	0.18	0.228	0.1
	Above	5,809	0.150	0.001	0.004	0.028	0.117	0.971
Claim 4	Below	8,196	0.211	0.184	0.022	0.005	0.001	0.975	0.978
	Near	21,915	0.565	0.126	0.179	0.216	0.044
	Above	8,689	0.224	0	0.004	0.071	0.149	0.981
Total:	All Students	49,613	1.000

Table 2.19: GRADE 8 ELA/LITERACY CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	12,119	0.249	0.217	0.031	0	0	0.875	0.797
	Level 2	13,615	0.279	0.035	0.204	0.04	0	0.731
	Level 3	17,044	0.350	0	0.042	0.277	0.03	0.793
	Level 4	5,975	0.123	0	0	0.024	0.099	0.805
Claim 1	Below	10,328	0.269	0.209	0.055	0.005	0	0.982	0.984
	Near	19,165	0.500	0.052	0.199	0.225	0.025
	Above	8,846	0.231	0	0.003	0.081	0.146	0.985
Claim 2	Below	12,232	0.319	0.243	0.067	0.008	0	0.974	0.973
	Near	19,160	0.500	0.047	0.192	0.226	0.035
	Above	6,947	0.181	0	0.005	0.061	0.115	0.973
Claim 3	Below	9,016	0.235	0.194	0.033	0.007	0	0.968	0.967
	Near	22,273	0.581	0.108	0.177	0.213	0.083
	Above	7,050	0.184	0.001	0.006	0.04	0.137	0.965
Claim 4	Below	7,690	0.201	0.172	0.025	0.003	0	0.983	0.983
	Near	22,334	0.583	0.112	0.201	0.228	0.041
	Above	8,315	0.217	0	0.004	0.071	0.142	0.983
Total:	All Students	48,753	1.000

Table 2.20: HIGH SCHOOL ELA/LITERACY CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	2,314	0.173	0.151	0.022	0	0	0.872	0.788
	Level 2	2,904	0.217	0.026	0.155	0.036	0	0.714
	Level 3	4,930	0.368	0	0.043	0.281	0.044	0.764
	Level 4	3,232	0.242	0	0	0.041	0.201	0.831
Claim 1	Below	2,729	0.205	0.161	0.039	0.004	0	0.98	0.982
	Near	6,302	0.472	0.05	0.174	0.211	0.037
	Above	4,307	0.323	0	0.005	0.094	0.223	0.984
Claim 2	Below	2,301	0.173	0.143	0.026	0.003	0	0.98	0.98
	Near	6,769	0.507	0.057	0.17	0.222	0.058
	Above	4,268	0.320	0	0.006	0.082	0.231	0.98
Claim 3	Below	1,811	0.136	0.115	0.016	0.004	0	0.97	0.97
	Near	8,623	0.646	0.122	0.176	0.214	0.134
	Above	2,904	0.218	0.001	0.006	0.033	0.179	0.971
Claim 4	Below	2,178	0.163	0.136	0.024	0.004	0	0.976	0.978
	Near	7,282	0.546	0.088	0.167	0.212	0.079
	Above	3,878	0.291	0	0.006	0.068	0.217	0.98
Total:	All Students	13,380	1.000

2.4.2 Mathematics

Results in this section are based on real data from students who took the full blueprint. Table 2.21 through Table 2.27 show the classification accuracy of the mathematics assessment for each of grades 3-8 and for high school (HS). Section 2.4 explains how the statistics in these tables were computed. Classification accuracy was moderately high to high in every grade: overall accuracy ranged from 0.768 to 0.833, accuracy by achievement level ranged from 0.622 to 0.944, and claim-level accuracy (by category and overall) ranged from 0.963 to 0.992.

Table 2.21: GRADE 3 MATHEMATICS CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	17,212	0.328	0.309	0.018	0	0	0.944	0.831
	Level 2	13,639	0.260	0.049	0.185	0.025	0	0.714
	Level 3	12,796	0.244	0	0.033	0.195	0.015	0.802
	Level 4	8,901	0.169	0	0	0.028	0.141	0.834
Claim 1	Below	9,015	0.358	0.273	0.081	0.003	0	0.991	0.99
	Near	8,921	0.354	0.019	0.163	0.164	0.008
	Above	7,263	0.288	0	0.003	0.097	0.188	0.989
Claim 2/4	Below	6,808	0.270	0.219	0.044	0.006	0.001	0.974	0.979
	Near	11,898	0.472	0.066	0.18	0.197	0.029
	Above	6,493	0.258	0	0.004	0.075	0.179	0.984
Claim 3	Below	6,200	0.246	0.208	0.031	0.007	0.001	0.97	0.977
	Near	13,315	0.528	0.113	0.18	0.193	0.042
	Above	5,684	0.226	0	0.003	0.059	0.163	0.985
Total:	All Students	52,548	1.000

Table 2.22: GRADE 4 MATHEMATICS CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	17,219	0.324	0.292	0.032	0	0	0.901	0.833
	Level 2	13,935	0.262	0.024	0.218	0.02	0	0.832
	Level 3	12,809	0.241	0	0.043	0.185	0.014	0.767
	Level 4	9,142	0.172	0	0	0.034	0.138	0.8
Claim 1	Below	10,038	0.392	0.251	0.138	0.003	0	0.992	0.991
	Near	8,738	0.341	0.004	0.171	0.158	0.007
	Above	6,839	0.267	0	0.003	0.096	0.168	0.989
Claim 2/4	Below	8,030	0.313	0.236	0.071	0.006	0	0.98	0.982
	Near	12,390	0.484	0.045	0.214	0.186	0.038
	Above	5,195	0.203	0	0.003	0.053	0.146	0.984
Claim 3	Below	8,207	0.320	0.24	0.073	0.007	0.001	0.976	0.979
	Near	12,056	0.471	0.048	0.21	0.18	0.033
	Above	5,352	0.209	0	0.003	0.058	0.148	0.985
Total:	All Students	53,105	1.000

Table 2.23: GRADE 5 MATHEMATICS CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	19,536	0.366	0.339	0.027	0	0	0.925	0.806
	Level 2	13,368	0.250	0.046	0.186	0.018	0	0.743
	Level 3	12,189	0.228	0	0.062	0.142	0.024	0.622
	Level 4	8,300	0.155	0	0	0.016	0.139	0.895
Claim 1	Below	11,966	0.467	0.355	0.108	0.004	0	0.991	0.99
	Near	8,534	0.333	0.016	0.168	0.125	0.024
	Above	5,119	0.200	0	0.002	0.043	0.155	0.988
Claim 2/4	Below	8,985	0.351	0.277	0.063	0.008	0.003	0.968	0.973
	Near	12,502	0.488	0.074	0.209	0.149	0.057
	Above	4,132	0.161	0	0.003	0.03	0.129	0.984
Claim 3	Below	9,248	0.361	0.294	0.058	0.007	0.001	0.975	0.977
	Near	12,676	0.495	0.082	0.203	0.141	0.068
	Above	3,695	0.144	0	0.003	0.021	0.12	0.982
Total:	All Students	53,393	1.000

Table 2.24: GRADE 6 MATHEMATICS CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	15,308	0.318	0.292	0.026	0	0	0.919	0.795
	Level 2	15,372	0.319	0.057	0.238	0.024	0	0.747
	Level 3	11,592	0.240	0	0.057	0.155	0.028	0.646
	Level 4	5,932	0.123	0	0	0.013	0.11	0.893
Claim 1	Below	11,455	0.445	0.327	0.113	0.004	0	0.991	0.99
	Near	9,752	0.378	0.015	0.193	0.145	0.026
	Above	4,559	0.177	0	0.002	0.041	0.134	0.986
Claim 2/4	Below	9,471	0.368	0.29	0.067	0.008	0.002	0.972	0.975
	Near	12,551	0.487	0.059	0.218	0.158	0.051
	Above	3,744	0.145	0	0.003	0.029	0.114	0.982
Claim 3	Below	8,346	0.324	0.264	0.05	0.008	0.002	0.971
	Near	13,600	0.528	0.105	0.209	0.149	0.064
	Above	3,820	0.148	0	0.002	0.026	0.119	0.983
Total:	All Students	48,204	1.000

Table 2.25: GRADE 7 MATHEMATICS CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	13,963	0.287	0.253	0.034	0	0	0.88	0.793
	Level 2	14,631	0.301	0.043	0.228	0.03	0	0.756
	Level 3	13,581	0.279	0	0.052	0.192	0.035	0.687
	Level 4	6,459	0.133	0	0	0.012	0.121	0.909
Claim 1	Below	10,673	0.407	0.302	0.1	0.004	0	0.989	0.989
	Near	9,880	0.376	0.018	0.191	0.153	0.015
	Above	5,691	0.217	0	0.002	0.064	0.15	0.989
Claim 2/4	Below	7,875	0.300	0.241	0.049	0.008	0.002	0.966	0.973
	Near	13,567	0.517	0.096	0.21	0.165	0.045
	Above	4,802	0.183	0	0.003	0.042	0.139	0.985
Claim 3	Below	5,933	0.226	0.193	0.025	0.006	0.002	0.966	0.972
	Near	16,553	0.631	0.174	0.212	0.168	0.077
	Above	3,758	0.143	0	0.002	0.025	0.115	0.982
Total:	All Students	48,634	1.000

Table 2.26: GRADE 8 MATHEMATICS CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	15,121	0.317	0.283	0.033	0	0	0.894	0.768
	Level 2	13,850	0.290	0.059	0.199	0.031	0	0.687
	Level 3	12,368	0.259	0	0.056	0.164	0.038	0.635
	Level 4	6,410	0.134	0	0	0.013	0.121	0.903
Claim 1	Below	11,027	0.428	0.34	0.083	0.005	0	0.989	0.989
	Near	9,703	0.376	0.034	0.179	0.138	0.026
	Above	5,044	0.196	0	0.002	0.042	0.152	0.989
Claim 2/4	Below	6,400	0.248	0.21	0.03	0.006	0.003	0.964	0.974
	Near	14,480	0.562	0.149	0.199	0.159	0.055
	Above	4,894	0.190	0	0.003	0.038	0.15	0.987
Claim 3	Below	7,001	0.272	0.233	0.03	0.006	0.002	0.969	0.975
	Near	14,845	0.576	0.166	0.186	0.145	0.079
	Above	3,928	0.152	0	0.002	0.024	0.126	0.985
Total:	All Students	47,749	1.000

Table 2.27: HIGH SCHOOL MATHEMATICS CLASSIFICATION ACCURACY
Score	Observed Level	N	P	True L1	True L2	True L3	True L4	Accuracy by Level	Accuracy Overall
Overall	Level 1	4,752	0.351	0.311	0.039	0	0	0.887	0.808
	Level 2	3,943	0.291	0.041	0.21	0.04	0	0.722
	Level 3	3,223	0.238	0	0.034	0.182	0.022	0.763
	Level 4	1,622	0.120	0	0	0.015	0.105	0.873
Claim 1	Below	6,249	0.462	0.355	0.103	0.005	0	0.99	0.99
	Near	4,791	0.354	0.025	0.178	0.141	0.011
	Above	2,478	0.183	0	0.002	0.07	0.112	0.99
Claim 2/4	Below	2,819	0.209	0.177	0.024	0.006	0.002	0.963	0.975
	Near	7,738	0.572	0.15	0.192	0.183	0.048
	Above	2,961	0.219	0	0.003	0.058	0.159	0.987
Claim 3	Below	3,131	0.232	0.2	0.024	0.006	0.001	0.97	0.975
	Near	8,344	0.617	0.18	0.188	0.175	0.075
	Above	2,043	0.151	0	0.003	0.032	0.116	0.983
Total:	All Students	13,540	1.000

2.5 Standard Errors of Measurement (SEMs)

The standard error of measurement (SEM) information in this section is based on the student scores and associated SEMs included in the data that Smarter Balanced received from members after the 2020-21 administration. Smarter Balanced does not compute student scores and SEMs directly. Service providers who deliver the test compute them according to the scoring specifications provided by Smarter Balanced, which include the use of Equation (2.6) in this chapter for computing SEMs. Because the test is adaptive, different students receive different items, and the SEM given by this equation depends on the items administered. The amount of measurement error therefore varies from student to student, even among students with the same estimate of achievement.

All of the SEM statistics reported in this chapter are based on the full blueprint and are in the reporting scale metric. For member data that include SEMs only in the theta metric, the SEMs are transformed to the reporting metric using the multiplication factors in the theta-to-scale-score transformation given in Chapter 5. Note that ELA/literacy and mathematics are not on the same metric, so SEM values are not directly comparable across subjects. Note also that only a small, non-representative sample of enrolled students tested in the 2020-21 administration year.
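
The conversion of an SEM from the theta metric to the reporting metric can be sketched as follows. This is a minimal illustration, assuming a linear transformation of the form scale = slope × theta + intercept. The function name and the slope value are hypothetical placeholders, not the factors given in Chapter 5.

```python
def sem_to_scale(sem_theta: float, slope: float) -> float:
    """Convert an SEM from the theta metric to the reporting scale metric.

    Under a linear transformation, the intercept shifts scores but does not
    affect their spread, so the SEM is multiplied by the slope only.
    """
    return abs(slope) * sem_theta


# Placeholder slope for illustration; the operational factors are in Chapter 5.
HYPOTHETICAL_SLOPE = 80.0

sem_scale = sem_to_scale(0.30, HYPOTHETICAL_SLOPE)
print(f"{sem_scale:.1f}")
```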

Table 2.28 and Table 2.29 show the trend in the SEM by student decile for ELA/literacy and mathematics, respectively. Deciles were defined by ranking students by scale score and dividing them into 10 equal-sized groups according to rank. Decile 1 contains the 10% of students with the lowest scale scores. Decile 10 contains the 10% of students with the highest scale scores. The SEM reported for a decile is the average SEM among examinees in that decile.
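
The decile calculation can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce the tables: the function name is hypothetical, the records are synthetic, and the handling of tied scores and of groups that cannot be made exactly equal in size is a simplification.

```python
def mean_sem_by_decile(records):
    """Return 10 mean SEMs, with decile 1 (lowest scale scores) first.

    records: list of (scale_score, sem) pairs, at least 10 of them.
    """
    if len(records) < 10:
        raise ValueError("need at least 10 students to form deciles")
    # Rank students from lowest to highest scale score.
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    means = []
    for d in range(10):
        # Integer boundaries split the ranked list into 10 near-equal groups.
        group = ordered[d * n // 10:(d + 1) * n // 10]
        means.append(sum(sem for _, sem in group) / len(group))
    return means


# Synthetic data: 20 students whose SEM falls as the scale score rises.
records = [(2300 + 10 * i, 30.0 - 0.5 * i) for i in range(20)]
decile_means = mean_sem_by_decile(records)
print([round(m, 2) for m in decile_means])
```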

Table 2.28: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, ELA/LITERACY
Subject	Grade	Mean SEM	Decile 1	Decile 2	Decile 3	Decile 4	Decile 5	Decile 6	Decile 7	Decile 8	Decile 9	Decile 10
ELA	3	24.4	31.6	25.7	24.1	23.3	22.9	22.6	22.6	22.7	23.1	25.2
	4	26.0	32.2	26.6	25.6	25.3	25.1	24.7	24.3	24.2	24.6	27.3
	5	25.3	30.1	24.8	23.8	23.6	23.8	24.0	24.3	24.9	25.7	28.3
	6	26.9	34.0	27.5	25.7	25.2	25.1	25.4	25.5	25.9	26.5	28.6
	7	27.9	36.3	28.8	26.8	26.1	26.1	26.2	26.2	26.6	26.9	28.9
	8	28.4	36.6	29.6	27.8	27.1	26.6	26.3	26.3	26.6	27.4	29.6
	HS	31.8	40.1	33.0	30.9	30.1	30.0	29.9	30.0	30.3	30.9	33.0

Table 2.29: MEAN OVERALL SEM AND CONDITIONAL SEMS BY DECILE, MATHEMATICS
Subject	Grade	Mean SEM	Decile 1	Decile 2	Decile 3	Decile 4	Decile 5	Decile 6	Decile 7	Decile 8	Decile 9	Decile 10
MATH	3	18.1	24.5	20.4	18.8	17.7	17.0	16.7	16.4	16.1	16.1	17.2
	4	18.6	26.4	21.7	19.4	18.4	17.6	17.0	16.5	16.1	15.9	16.4
	5	22.7	34.2	28.8	26.1	24.1	22.1	20.5	19.1	17.9	17.1	16.7
	6	24.5	38.4	29.9	26.5	24.5	23.2	22.2	21.2	20.2	19.4	19.1
	7	27.3	44.7	34.8	30.9	28.3	26.1	24.4	22.8	21.3	20.0	19.1
	8	31.2	44.9	38.2	35.5	33.4	31.6	29.7	27.6	25.5	23.2	21.7
	HS	32.0	52.9	41.0	36.1	33.0	30.8	29.0	27.2	25.3	23.1	21.2

Table 2.30 and Table 2.31 show the average SEM near the achievement level cut scores. In the tables, N is the number of students, Mn is the mean, and SD is the standard deviation.

The average SEM reported for a given cut score is the average SEM among students whose scale scores are within 10 units of the cut score. In the column headings, “Cut1v2” is the lowest cut score, which separates level 1 from level 2; “Cut2v3” separates level 2 from level 3; and “Cut3v4” separates level 3 from level 4.
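
The selection of students near a cut score can be sketched as follows. This is a minimal illustration with synthetic records and a hypothetical cut score. Treating the ±10 window as inclusive and using the sample (rather than population) standard deviation are assumptions, since the text does not specify either choice.

```python
from statistics import mean, stdev


def sem_near_cut(records, cut, window=10):
    """Return (N, mean SEM, SD of SEM) for students near a cut score.

    records: list of (scale_score, sem) pairs.
    Assumes the window is inclusive and at least two students fall inside it.
    """
    near = [sem for score, sem in records if abs(score - cut) <= window]
    return len(near), mean(near), stdev(near)


# Synthetic records and a hypothetical cut score, for illustration only.
records = [(2415, 25.0), (2424, 24.0), (2430, 23.0), (2441, 22.0), (2450, 21.0)]
n_near, mn, sd = sem_near_cut(records, cut=2432)
print(n_near, round(mn, 2), round(sd, 2))
```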

Table 2.30: CONDITIONAL SEM NEAR (±10 POINTS) ACHIEVEMENT LEVEL CUT SCORES, ELA/LITERACY
Grade	Cut1v2 N	Cut1v2 Mn	Cut1v2 SD	Cut2v3 N	Cut2v3 Mn	Cut2v3 SD	Cut3v4 N	Cut3v4 Mn	Cut3v4 SD
3	4,065	23.37	2.20	4,449	22.59	2.51	3,423	22.87	2.55
4	4,010	25.29	2.64	4,284	24.63	2.72	3,674	24.32	2.51
5	3,738	23.56	2.46	4,094	23.94	2.42	3,461	25.48	2.80
6	3,693	25.40	2.18	4,343	25.35	3.25	2,543	26.72	3.07
7	3,214	26.83	2.99	4,359	26.05	3.00	2,718	26.94	2.71
8	3,074	27.84	2.10	4,138	26.33	2.53	2,557	27.60	2.14
HS	558	32.31	2.55	955	30.05	1.51	1,010	30.40	1.68

Table 2.31: CONDITIONAL SEM NEAR (±10 POINTS) ACHIEVEMENT LEVEL CUT SCORES, MATHEMATICS
Grade	Cut1v2 N	Cut1v2 Mn	Cut1v2 SD	Cut2v3 N	Cut2v3 Mn	Cut2v3 SD	Cut3v4 N	Cut3v4 Mn	Cut3v4 SD
3	4,469	17.66	2.47	4,851	16.54	1.68	3,300	16.08	1.50
4	4,114	18.62	2.65	4,682	16.73	1.60	3,352	15.91	1.44
5	4,037	23.27	4.26	4,218	18.98	3.19	3,241	17.12	2.34
6	3,923	24.53	3.57	3,936	20.96	2.39	2,607	19.23	2.09
7	3,442	29.67	6.41	4,064	23.33	3.51	2,682	20.00	2.58
8	3,348	33.55	5.11	3,518	27.75	3.76	2,482	23.22	3.40
HS	898	32.92	3.69	994	27.41	3.08	536	23.20	3.36

Figure 2.2 through Figure 2.15 are scatter plots of a random sample of 2,000 individual student SEMs as a function of scale score for the total test and claims/subscores by grade within subject. These plots show the variability of SEMs among students with the same scale score as well as the trend in SEM with student achievement (scale score). In comparison to the total score, a claim score has greater measurement error and greater variability among students because it is based on fewer items. Among claims, those based on fewer items have higher measurement error and greater variability of measurement error than those based on more items.

Dashed vertical lines in Figure 2.2 through Figure 2.15 represent the achievement level cut scores. The high school plots show the cut scores for grades 9, 10, and 11 separately.

Figure 2.2: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 3

Figure 2.3: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 4

Figure 2.4: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 5

Figure 2.5: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 6

Figure 2.6: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 7

Figure 2.7: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy Grade 8

Figure 2.8: Students’ Standard Error of Measurement by Scale Score, ELA/Literacy High School

Figure 2.9: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 3

Figure 2.10: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 4

Figure 2.11: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 5

Figure 2.12: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 6

Figure 2.13: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 7

Figure 2.14: Students’ Standard Error of Measurement by Scale Score, Mathematics Grade 8

Figure 2.15: Students’ Standard Error of Measurement by Scale Score, Mathematics High School

All of the tables and figures in this section, for every grade and subject, show a trend of higher measurement error for lower-achieving students. This trend reflects the fact that the item pool is difficult relative to overall student achievement. The computer adaptive test (CAT) algorithm delivers easier items to lower-achieving students than they would typically receive in a non-adaptive test or in a fixed form whose difficulty is similar to that of the item pool as a whole. Even so, low-achieving students tend to receive items that are relatively difficult for them, typically because the CAT algorithm does not have easier items available within the blueprint constraints that must be met for all students.
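
The link between item difficulty and measurement error can be illustrated with a short sketch. It assumes a two-parameter logistic (2PL) item response model and a hypothetical set of item parameters, chosen only for illustration. It does not reproduce Equation (2.6) or the operational scoring models. Under this assumption, an item provides the most information when its difficulty is close to the student's ability, so a student whose ability lies well below every item in the set has a larger SEM than a student in the middle of the set's difficulty range.

```python
import math


def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta (illustrative model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def sem(theta, items):
    """SEM in the theta metric: inverse square root of total test information."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


# Hypothetical items: discrimination 1.0, difficulties from -1.0 to 1.0.
items = [(1.0, b / 10.0) for b in range(-10, 11)]

sem_matched = sem(0.0, items)   # ability in the middle of the item difficulties
sem_low = sem(-2.5, items)      # ability well below every item difficulty
print(f"matched={sem_matched:.3f} low={sem_low:.3f}")
```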

2020-21 Summative Technical Report