Chapter 2 Reliability, Precision, and Errors of Measurement

2.1 Introduction

This chapter addresses the technical quality of the interim assessments available in the 2020-21 school year and the functioning of these assessments in terms of expected precision, accuracy, and reliability. Part of the test validity argument is that scores must be consistent and precise enough to be useful for intended purposes. If scores are to be meaningful, tests should deliver the same results for students of the same ability regardless of the specific items each student takes. Interim assessments are fixed forms, but each form samples the same content as the summative assessments, which allows interim results to be placed on the same scale as summative results. In addition, the range of uncertainty around the score should be small enough to support educational decisions.

Because states do not routinely collect or report item responses on the interim assessments, estimates of precision and reliability are based on statistical attributes of the test items and test forms under the assumption that the interim assessments are administered to groups of students similar to those taking the 2020-21 summative assessment. It is also important to note that although a test’s reliability is sometimes characterized as yielding similar results over hypothetically repeated administrations, in practice, a student’s test performance may improve over repeated administrations due solely to the student becoming more familiar with the test, especially if the same items are used. For interim assessments, it is not possible to give the test more than once to the same student without the experience affecting their performance on the test. For this reason, only first-time test results from interim assessments are comparable among students. The results of repeated administration of the same interim assessment should be interpreted with caution with regard to measuring student growth. Also, interim results are comparable among only those students who are assessed at the same point in time relative to instruction on the knowledge and skill areas represented by the test—usually either just before or just after instruction.

2.2 Precision and Reliability

This section presents the methodology used to compute the precision and reliability of student scale scores on the interim assessments and summarizes results for overall scale scores on the ICAs. The methodology is also relevant to computing scale scores on IABs and FIABs and on reporting categories (claims) of the ICAs, which may be represented in student reports. But student performance on the IABs/FIABs and ICA claims is represented primarily by classifications into performance categories. Section 2.3 explains classifications into performance categories, presents the methodology for computing the accuracy and consistency of such classifications, and presents associated summaries for IABs, FIABs, and ICA claim scores.

A test’s precision is represented by its measurement error, which is called the standard error of measurement (SEM) for an individual student. The SEM for a given student depends on the student’s achievement score. This dependence gives rise to the notation SEM(\(\theta_{i}\)), which means “the SEM for a student whose achievement is represented by the quantity \(\theta_{i}\),” where \(i\) is a number representing the student. The \(\theta\)–scale is an item response theory (IRT) scale and generally ranges from -4 (extremely low achievement) to +4 (extremely high achievement) with a mean of zero. Ultimately, measures of achievement and SEMs on the \(\theta\) scale are transformed to the reporting scale as described in the Smarter Balanced Scoring Specifications (https://technicalreports.smarterbalanced.org/scoring_specs/_book/scoringspecs.html).

The formula for the SEM for student \(i\), whose achievement estimate is \(\hat{\theta}_{i}\), is:

\[\begin{equation} SEM(\hat{\theta}_{i}) = \frac{1}{\sqrt{I(\hat{\theta}_{i})}}, \tag{2.1} \end{equation}\]

where \(I(\hat{\theta}_{i})\) is the test information for student \(i\), which is based on the items taken by the student and calculated as:

\[\begin{equation} \begin{split} I(\hat{\theta}_{i}) = \sum_{j=1}^{n}D^2a_{j}^2 (\frac{\sum_{l=1}^{m_{j}}l^2\exp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk}))} {1+\sum_{l=1}^{m_{j}}\exp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk}))} - \\ (\frac{\sum_{l=1}^{m_{j}}l\exp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk}))} {1+\sum_{l=1}^{m_{j}}\exp(\sum_{k=1}^{l}Da_{j}(\hat{\theta}_{i}-b_{jk}))})^2), \end{split} \tag{2.2} \end{equation}\]

where \(n\) is the number of items taken by the student, \(m_j\) is the maximum possible score point (starting from 0) for the \(j\)th item, and \(D\) is the scale factor, 1.7. Values of \(a_j\) and \(b_{jk}\) are the item parameters for item \(j\) and score level \(k\).
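
To make the computation concrete, the sketch below evaluates equation (2.2) and then equation (2.1) in Python for a small set of generalized partial credit items. The item parameters shown are illustrative only; they are not operational Smarter Balanced parameters.

```python
import numpy as np

D = 1.7  # scale factor in equation (2.2)

def item_information(theta, a, b_steps):
    """Information contributed by one item at ability theta.
    a: discrimination a_j; b_steps: step parameters b_j1..b_jm (m = max score point)."""
    b_steps = np.asarray(b_steps, dtype=float)
    levels = np.arange(1, len(b_steps) + 1)        # score levels 1..m_j
    s = np.cumsum(D * a * (theta - b_steps))       # sum_{k<=l} D a_j (theta - b_jk)
    num = np.exp(s)
    denom = 1.0 + num.sum()                        # level 0 contributes exp(0) = 1
    ex = (levels * num).sum() / denom              # expected item score
    ex2 = (levels ** 2 * num).sum() / denom        # expected squared item score
    return D ** 2 * a ** 2 * (ex2 - ex ** 2)

def sem(theta, items):
    """Equation (2.1): SEM = 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / np.sqrt(info)

# Illustrative items: two dichotomous items (m_j = 1) and one two-point item (m_j = 2).
items = [(0.8, [-0.5]), (1.1, [0.3]), (0.9, [-0.2, 0.6])]
print(sem(0.0, items))
```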

Figure 2.1 to Figure 2.6 show the \(I(\theta_{i})\) and SEM(\(\theta_{i}\)) for ICAs in each content area and grade, conditional on student proficiency over the range of -4 to +4. The shading in these plots shows the distribution of \(\theta\) in the population of students for each content area and grade. The means and standard deviations of these population \(\theta\) distributions are shown in Table 2.1. The high school (HS) values are used for grades 9, 10, and 11.

Table 2.1: PROFICIENCY POPULATION PARAMETERS
Grade ELA/Literacy Mean ELA/Literacy SD Mathematics Mean Mathematics SD
3 -0.908 1.030 -1.067 1.115
4 -0.437 1.074 -0.557 1.162
5 -0.085 1.133 -0.177 1.226
6 0.123 1.207 0.048 1.306
7 0.362 1.285 0.307 1.391
8 0.534 1.363 0.494 1.475
HS 0.858 1.461 0.844 1.580
Figure 2.1: Test Information Functions and SEM For ELA/Literacy ICA, Grades 3, 4, and 5

Figure 2.2: Test Information Functions and SEM For ELA/Literacy ICA, Grades 6, 7, and 8

Figure 2.3: Test Information Functions and SEM For ELA/Literacy ICA, Grades 9, 10, and 11

Figure 2.4: Test Information Functions and SEM For Mathematics ICA, Grades 3, 4, and 5

Figure 2.5: Test Information Functions and SEM For Mathematics ICA, Grades 6, 7, and 8

Figure 2.6: Test Information Functions and SEM For Mathematics ICA, Grades 9, 10, and 11

The measurement precision of the ICAs for students having the \(\theta\) distributions represented in Table 2.1 (and illustrated in Figure 2.1 to Figure 2.6) is represented by the marginal reliability coefficient and the root mean squared error (RMSE). These indices are shown in Table 2.2 (ELA/literacy) and Table 2.3 (mathematics). The reliability coefficient is:

\[\begin{equation} \hat{\rho} = 1 - \frac{MSE}{var(\theta)}, \tag{2.3} \end{equation}\]

where \(var(\theta)\) is the population variance of true scores; the square of the SD in Table 2.1 was used for \(var(\theta)\). The MSE is explained below. The reliability of a test is partly a function of its precision and partly a function of true differences in ability among students. A reliability coefficient of 0 indicates that measured differences among students are completely unreliable; a reliability coefficient of 1 indicates that the measured differences among students are completely reliable.

The reliability coefficient, \(\hat{\rho}\), was calculated for the overall score and claim scores for the ICAs. Reliability and RMSE are not reported for the IABs or FIABs because scale scores are not reported. IAB and FIAB results are reported in terms of whether the student is below, near, or above standard, where the level 3 cut score is the standard. For computing the reliability of claim scores, student measures of true “claim” achievement were assumed to have the same \(\theta\) distribution as overall student achievement.

The mean squared error (MSE) is the average of \([SEM(\theta_{i})]^{2}\) for a given \(\theta\) distribution, and the RMSE is the square root of this average. Simulation was used to estimate the MSE. For each grade within each subject, true \(\theta\) values for 1,000 examinees were simulated from a normal distribution with mean and SD equal to the values shown in Table 2.1. Then, for each value of true \(\theta\), scores for the items on the test under study were generated using the test items’ IRT parameters. An estimate of \(\theta\) was then obtained from the generated score vector via maximum likelihood estimation. The data simulation and scoring were carried out with flexMIRT software (Cai, 2017). For a given test, the MSE was then estimated as:

\[\begin{equation} MSE = N^{-1}\sum_{i=1}^N (\hat{\theta}_{i}-\theta_i)^2, \tag{2.4} \end{equation}\]

where \(N\) is the number of simulated examinees (1,000).
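
The following sketch illustrates this simulation for a single grade and subject. It is a simplified stand-in: the operational work used flexMIRT for response generation and maximum likelihood scoring, whereas this sketch scores with a coarse grid search, and the four-item pool and its parameters are hypothetical (an operational ICA has roughly 40 items, so a pool this small would yield a much lower reliability).

```python
import numpy as np

rng = np.random.default_rng(1)
D = 1.7

def gpcm_probs(theta, a, b_steps):
    """Category probabilities for scores 0..m_j under the generalized partial credit model."""
    s = np.concatenate(([0.0], np.cumsum(D * a * (theta - np.asarray(b_steps, dtype=float)))))
    e = np.exp(s - s.max())
    return e / e.sum()

def simulate_responses(theta, items):
    return [rng.choice(len(b) + 1, p=gpcm_probs(theta, a, b)) for a, b in items]

def ml_theta(responses, items, grid=np.linspace(-4, 4, 401)):
    """Maximum likelihood estimate of theta via grid search (stand-in for flexMIRT scoring)."""
    loglik = np.zeros_like(grid)
    for x, (a, b) in zip(responses, items):
        loglik += np.log([gpcm_probs(t, a, b)[x] for t in grid])
    return grid[np.argmax(loglik)]

# Hypothetical item pool; grade 3 ELA/literacy population parameters from Table 2.1.
items = [(0.9, [-1.0]), (1.2, [0.0]), (0.8, [-0.5, 0.5]), (1.0, [0.2])]
mu, sd, N = -0.908, 1.030, 1000

true_theta = rng.normal(mu, sd, N)
theta_hat = np.array([ml_theta(simulate_responses(t, items), items) for t in true_theta])

mse = np.mean((theta_hat - true_theta) ** 2)   # equation (2.4)
rho = 1.0 - mse / sd ** 2                      # equation (2.3), var(theta) = SD^2 from Table 2.1
print(f"RMSE = {np.sqrt(mse):.3f}, marginal reliability = {rho:.3f}")
```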

Reliability and RMSE results for the 2020-21 school year are provided in Table 2.2 and Table 2.3. As expected, reliability coefficients for the ICA are high and the RMSEs are small and in the acceptable range for a large-scale test. Reliability estimates are lower and RMSE is higher for the ICA claim-level scores than for the overall scores. Claims with fewer items and fewer points exhibit the lowest reliability and the highest RMSE.

Table 2.2: RELIABILITY AND PRECISION FOR INTERIM COMPREHENSIVE ASSESSMENTS, ELA/LITERACY
Grade Full Test N Items Full Test Reliability Full Test RMSE Claim 1 Reliability Claim 1 RMSE Claim 2 Reliability Claim 2 RMSE Claim 3 Reliability Claim 3 RMSE Claim 4 Reliability Claim 4 RMSE
3 40 0.92 0.31 0.78 0.55 0.78 0.56 0.65 0.76 0.67 0.72
4 42 0.91 0.33 0.79 0.55 0.72 0.67 0.58 0.91 0.69 0.71
5 42 0.92 0.33 0.81 0.55 0.74 0.68 0.66 0.82 0.73 0.68
6 43 0.92 0.34 0.80 0.60 0.79 0.63 0.64 0.90 0.72 0.75
7 43 0.93 0.36 0.84 0.57 0.79 0.66 0.61 1.02 0.67 0.90
8 41 0.93 0.36 0.82 0.64 0.79 0.71 0.57 1.17 0.75 0.79
9 41 0.93 0.40 0.84 0.65 0.76 0.83 0.68 1.01 0.71 0.93
10 41 0.93 0.40 0.84 0.65 0.76 0.83 0.68 1.01 0.71 0.93
11 41 0.93 0.40 0.84 0.65 0.76 0.83 0.68 1.01 0.71 0.93
Table 2.3: RELIABILITY AND PRECISION FOR INTERIM COMPREHENSIVE ASSESSMENTS, MATHEMATICS
Grade Full Test N Items Full Test Reliability Full Test RMSE Claim 1 Reliability Claim 1 RMSE Claim 2 Reliability Claim 2 RMSE Claim 3 Reliability Claim 3 RMSE
3 37 0.93 0.31 0.87 0.44 0.69 0.76 0.72 0.70
4 36 0.92 0.33 0.86 0.46 0.74 0.70 0.60 0.94
5 37 0.92 0.37 0.84 0.54 0.69 0.82 0.73 0.74
6 36 0.92 0.39 0.86 0.54 0.65 0.96 0.67 0.91
7 37 0.91 0.43 0.85 0.59 0.72 0.87 0.67 0.98
8 37 0.92 0.45 0.85 0.62 0.75 0.85 0.67 1.04
9 37 0.92 0.48 0.86 0.64 0.69 1.07 0.67 1.11
10 37 0.92 0.48 0.86 0.64 0.69 1.07 0.67 1.11
11 38 0.91 0.49 0.84 0.69 0.73 0.97 0.64 1.19

2.3 Classification Accuracy

Classification accuracy is defined as the degree of consistency between the observed achievement level (from the observed scores) and the true achievement level (from the population distribution). To calculate classification accuracy, a simulation study was carried out using item-level information and information about the population parameters (mean and standard deviation). The simulation study allows us to understand classification accuracy without having student-level data at hand. First, true scores for 1,000 simulees were generated from normal distributions with the means and standard deviations shown in Table 2.1. Then, responses from the simulees to the items in the fixed forms (IABs, FIABs, and ICAs) were generated using the parameters and item response models used in the scoring of these items. From these simulated item responses, scale scores, standard errors, and achievement-level classifications were obtained according to the Smarter Balanced Scoring Specifications. Correct classification by level was computed as the proportion of students among those assigned to a particular level whose true achievement level (based on the simulated true score) and assigned achievement level (based on the estimated score) matched. The overall correct classification rate is the proportion of students among those assigned to any level who are correctly assigned. For the claim scores, IABs, and FIABs, we assume that the true claim or IAB/FIAB scores are equivalent to the true overall scores; therefore, we use the true overall score as the true claim or IAB/FIAB score in calculating correct classification rates.
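
A minimal sketch of the correct-classification computation is shown below. For brevity it replaces the full item-level simulation with estimated scores formed by adding normal error to the true scores (using the grade 3 ELA/literacy values from Table 2.1 and an error SD near the RMSE in Table 2.2); the cut scores are the grade 3 ELA/literacy values from Table 2.4.

```python
import numpy as np

rng = np.random.default_rng(2)

def to_level(theta, cuts):
    """Map theta values to achievement levels 1-4; scores at a cut fall in the higher level."""
    return np.searchsorted(cuts, theta, side="right") + 1

cuts = np.array([-1.646, -0.888, -0.212])             # grade 3 ELA/literacy cut scores (Table 2.4)

true_theta = rng.normal(-0.908, 1.030, 1000)          # simulated true scores (Table 2.1)
theta_hat = true_theta + rng.normal(0.0, 0.31, 1000)  # stand-in for estimated scores

true_lvl = to_level(true_theta, cuts)
est_lvl = to_level(theta_hat, cuts)

for lvl in range(1, 5):
    assigned = est_lvl == lvl
    prop_assigned = assigned.mean()
    prop_correct = (true_lvl[assigned] == lvl).mean()  # correct classification for this level
    print(f"L{lvl}: assigned {prop_assigned:.2f}, correct {prop_correct:.2f}")

print(f"overall correct: {(true_lvl == est_lvl).mean():.2f}")
```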

For overall scores, we used a weighted kappa to describe the accuracy of classifications into the four achievement levels. Claim and IAB/FIAB scores were evaluated with respect to the cut score between levels 2 and 3, which represents the minimum standard for proficiency for the subject and grade level. For each claim, students are classified as “above” or “below” the standard when the estimated score is at least 1.5 standard errors above or below the cut score; when the estimated score is within 1.5 standard errors of the cut score, the student is classified as “near” the standard. Claim or IAB/FIAB scores with larger average standard errors can thus be expected to have a greater proportion of students classified as “near” the standard. Because such classifications cannot be treated as misclassifications (“near” is defined only in terms of the standard error of measurement), the proportions correctly classified focus on those students who were classified as “above” or “below.”
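
The below/near/above rule can be written out directly, as in the sketch below; whether a score exactly 1.5 standard errors from the cut counts as “above”/“below” or “near” is not specified above, so the non-strict inequality here is an assumption.

```python
def near_standard_class(theta_hat, se, cut):
    """Classify an estimated score relative to the level 3 cut score.
    'above'/'below' only when the estimate is at least 1.5 SEs from the cut (non-strict assumed)."""
    if theta_hat - 1.5 * se >= cut:
        return "above"
    if theta_hat + 1.5 * se <= cut:
        return "below"
    return "near"

# Example with the grade 5 mathematics level 2 vs 3 cut score from Table 2.4.
cut = 0.165
print(near_standard_class(0.90, 0.45, cut))   # 'above'  (0.90 - 0.675 >= 0.165)
print(near_standard_class(0.40, 0.45, cut))   # 'near'
print(near_standard_class(-0.60, 0.45, cut))  # 'below'  (-0.60 + 0.675 <= 0.165)
```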

Table 2.4 shows the cut scores used for classifying examinees into achievement levels based on their overall test performance. The level 2 versus 3 cut score is also used to classify students by their performance on items specific to a claim—by their claim scores—and their performance on the IABs/FIABs.

Table 2.4: CUT SCORES FOR ACHIEVEMENT LEVELS
Grade Subject Level1v2 Level2v3 Level3v4
3 ELA/literacy -1.646 -0.888 -0.212
4 ELA/literacy -1.075 -0.410 0.289
5 ELA/literacy -0.772 -0.072 0.860
6 ELA/literacy -0.597 0.266 1.280
7 ELA/literacy -0.340 0.510 1.641
8 ELA/literacy -0.247 0.685 1.862
9 ELA/literacy -0.224 0.732 1.909
10 ELA/literacy -0.200 0.802 1.979
11 ELA/literacy -0.177 0.872 2.026
3 Mathematics -1.689 -0.995 -0.175
4 Mathematics -1.310 -0.377 0.430
5 Mathematics -0.755 0.165 0.808
6 Mathematics -0.528 0.468 1.199
7 Mathematics -0.390 0.657 1.515
8 Mathematics -0.137 0.897 1.741
9 Mathematics 0.026 1.086 2.032
10 Mathematics 0.228 1.250 2.296
11 Mathematics 0.354 1.426 2.561


Table 2.5 and Table 2.6 show the simulated classification accuracy for the IAB scores in ELA/literacy and mathematics for all grades. Each table shows the proportion of simulees assigned to each category with respect to the level 3 cut score standard and the proportion, among those assigned to each category and overall, who were correctly classified. For ELA/literacy, classifications were highly accurate except for classifications into ‘below standard’ for blocks with larger standard errors (Brief Writes and Performance Task), as is apparent from the high proportion of students assigned to the ‘near standard’ category. For mathematics, all simulated classifications were highly accurate.

Table 2.7 and Table 2.8 show the classification accuracy for the FIAB scores in ELA/literacy and mathematics for all grades. All of the simulated classifications for both subjects were highly accurate.

Table 2.5: OVERALL LEVEL CLASSIFICATION ACCURACY FOR INTERIM ASSESSMENT BLOCKS, ELA/LITERACY
Grade Block Name Prop Assigned Below Standard Prop Assigned Near Standard Prop Assigned Above Standard Prop Correctly Classified Below Standard Prop Correctly Classified Above Standard
3 Brief Writes 0.000 0.945 0.055 NA 1.000
Performance Task 0.000 0.966 0.034 NA 1.000
Read Informational Texts 0.230 0.521 0.249 0.983 0.996
Read Literary Texts 0.258 0.429 0.313 0.984 0.997
Research 0.257 0.478 0.265 0.969 0.996
Revision 0.244 0.485 0.271 0.963 0.996
4 Brief Writes 0.263 0.715 0.022 0.734 1.000
Performance Task 0.000 0.932 0.068 NA 1.000
Read Informational Texts 0.192 0.543 0.265 0.979 0.985
Read Literary Texts 0.276 0.473 0.251 0.975 0.992
Research 0.253 0.488 0.259 0.980 0.992
Revision 0.217 0.574 0.209 0.963 1.000
5 Brief Writes 0.241 0.759 0.000 0.714 NA
Performance Task 0.253 0.715 0.032 0.834 1.000
Read Informational Texts 0.153 0.614 0.233 0.961 0.991
Read Literary Texts 0.173 0.561 0.266 0.965 1.000
Research 0.274 0.429 0.297 0.978 0.993
Revision 0.244 0.506 0.250 0.975 0.992
6 Brief Writes 0.051 0.925 0.024 0.765 1.000
Performance Task 0.267 0.712 0.021 0.730 1.000
Read Informational Texts 0.288 0.484 0.228 0.983 0.982
Read Literary Texts 0.267 0.539 0.194 0.966 0.974
Research 0.288 0.423 0.289 0.990 0.979
Revision 0.253 0.557 0.190 0.988 0.984
7 Brief Writes 0.269 0.731 0.000 0.755 NA
Performance Task 0.232 0.734 0.034 0.707 1.000
Read Informational Texts 0.308 0.454 0.238 0.977 0.996
Read Literary Texts 0.306 0.446 0.248 0.990 0.988
Research 0.220 0.537 0.243 0.986 0.967
Revision 0.275 0.495 0.230 0.971 0.970
8 Brief Writes 0.224 0.776 0.000 0.754 NA
Edit/Revise 0.307 0.486 0.207 0.984 0.976
Performance Task 0.378 0.615 0.007 0.680 1.000
Read Informational Texts 0.227 0.508 0.265 0.978 0.985
Read Literary Texts 0.316 0.433 0.251 0.987 0.980
Research 0.272 0.469 0.259 0.982 0.985
11 Brief Writes 0.349 0.651 0.000 0.656 NA
Performance Task 0.155 0.845 0.000 0.755 NA
Read Informational Texts 0.227 0.527 0.246 0.987 0.996
Read Literary Texts 0.250 0.483 0.267 0.980 1.000
Research 0.235 0.470 0.295 0.987 0.997
Revision 0.254 0.483 0.263 0.969 0.996


Table 2.6: OVERALL LEVEL CLASSIFICATION ACCURACY FOR INTERIM ASSESSMENT BLOCKS, MATHEMATICS
Grade Block Name Prop Assigned Below Standard Prop Assigned Near Standard Prop Assigned Above Standard Prop Correctly Classified Below Standard Prop Correctly Classified Above Standard
3 Measurement and Data 0.314 0.418 0.268 0.984 1.000
Operations and Algebraic Thinking 0.326 0.469 0.205 0.979 0.980
Performance Task 0.000 0.827 0.173 NA 1.000
4 Measurement and Data 0.221 0.505 0.274 0.977 0.974
Number and Operations - Fractions 0.374 0.377 0.249 0.989 0.996
Number and Operations in Base Ten 0.334 0.460 0.206 0.979 0.995
Operations and Algebraic Thinking 0.337 0.447 0.216 0.994 0.981
Performance Task 0.000 0.931 0.069 NA 1.000
5 Measurement and Data 0.333 0.438 0.229 0.979 0.969
Number and Operations - Fractions 0.388 0.395 0.217 0.982 0.986
Number and Operations in Base Ten 0.343 0.445 0.212 0.983 0.962
Operations and Algebraic Thinking 0.310 0.515 0.175 0.984 0.971
Performance Task 0.000 0.878 0.122 NA 0.984
6 Expressions and Equations 0.389 0.399 0.212 0.992 0.986
Performance Task 0.000 0.879 0.121 NA 1.000
The Number System 0.359 0.436 0.205 0.983 0.985
7 Expressions and Equations 0.369 0.428 0.203 0.989 0.995
Geometry 0.070 0.679 0.251 0.971 0.984
Performance Task 0.008 0.832 0.160 0.875 0.988
8 Expressions and Equations I 0.361 0.479 0.160 0.967 0.994
Geometry 0.258 0.535 0.207 0.984 0.981
Performance Task 0.349 0.651 0.000 0.911 NA
11 Algebra and Functions I 0.442 0.459 0.099 0.984 1.000
Algebra and Functions II 0.232 0.572 0.196 1.000 0.969
Geometry Congruence 0.171 0.677 0.152 0.977 0.954
Geometry Measurement and Modeling 0.000 0.848 0.152 NA 0.941
Performance Task 0.000 0.930 0.070 NA 0.986


Table 2.7: OVERALL LEVEL CLASSIFICATION ACCURACY FOR FOCUSED INTERIM ASSESSMENT BLOCKS, ELA/LITERACY
Grade Block Name Prop Assigned Below Standard Prop Assigned Near Standard Prop Assigned Above Standard Prop Correctly Classified Below Standard Prop Correctly Classified Above Standard
3 Editing 0.238 0.527 0.235 0.979 0.979
Language and Vocabulary Use 0.241 0.516 0.243 0.979 1.000
Listen/Interpret 0.226 0.521 0.253 0.973 0.992
Research: Analyze Information 0.180 0.640 0.180 0.967 0.994
Research: Interpret and Integrate Information 0.215 0.613 0.172 0.991 0.988
Write and Revise Narratives 0.244 0.607 0.149 0.971 0.993
4 Editing 0.243 0.549 0.208 0.951 0.995
Language and Vocabulary Use 0.234 0.517 0.249 0.987 1.000
Listen/Interpret 0.224 0.579 0.197 0.964 0.985
Research: Analyze Information 0.207 0.627 0.166 0.976 0.988
Research: Interpret and Integrate Information 0.229 0.565 0.206 0.978 0.976
Write and Revise Narratives 0.249 0.667 0.084 0.956 1.000
5 Editing 0.220 0.519 0.261 0.968 0.989
Language and Vocabulary Use 0.247 0.526 0.227 0.972 0.996
Listen/Interpret 0.233 0.530 0.237 0.979 0.987
Research: Analyze Information 0.196 0.640 0.164 0.980 0.982
Research: Interpret and Integrate Information 0.208 0.545 0.247 0.981 0.996
Write and Revise Narratives 0.245 0.563 0.192 0.951 1.000
6 Editing 0.227 0.554 0.219 0.978 0.968
Language and Vocabulary Use 0.265 0.513 0.222 0.974 0.977
Listen/Interpret 0.263 0.502 0.235 0.989 0.991
Research Analyze and Integrate Information 0.009 0.763 0.228 1.000 0.991
Research: Evaluate Information and Sources 0.224 0.550 0.226 0.982 0.978
Write and Revise Narratives 0.252 0.583 0.165 0.960 0.994
7 Editing 0.109 0.724 0.167 0.982 0.958
Language and Vocabulary Use 0.250 0.506 0.244 0.976 0.975
Listen/Interpret 0.286 0.499 0.215 0.969 0.972
Research Analyze and Integrate Information 0.201 0.574 0.225 0.985 0.973
Research: Evaluate Information and Sources 0.252 0.579 0.169 0.980 0.988
Write and Revise Narratives 0.267 0.608 0.125 0.944 0.992
8 Listen/Interpret 0.288 0.521 0.191 0.976 0.963
Research Analyze and Integrate Information 0.254 0.565 0.181 0.996 0.978
Research: Evaluate Information and Sources 0.305 0.560 0.135 0.980 1.000
Write and Revise Narratives 0.303 0.582 0.115 0.947 1.000
11 Editing 0.217 0.520 0.263 0.959 1.000
Language and Vocabulary Use 0.282 0.464 0.254 0.986 0.996
Listen/Interpret 0.258 0.526 0.216 0.981 0.991
Research Analyze and Integrate Information 0.182 0.597 0.221 0.989 0.995
Research: Evaluate Information and Sources 0.189 0.571 0.240 0.968 1.000
Write and Revise Narratives 0.282 0.550 0.168 0.936 1.000


Table 2.8: OVERALL LEVEL CLASSIFICATION ACCURACY FOR FOCUSED INTERIM ASSESSMENT BLOCKS, MATHEMATICS
Grade Block Name Prop Assigned Below Standard Prop Assigned Near Standard Prop Assigned Above Standard Prop Correctly Classified Below Standard Prop Correctly Classified Above Standard
3 Geometry 0.223 0.649 0.128 0.973 0.961
Multiplication and Division Interpret Represent and Solve 0.277 0.511 0.212 0.986 1.000
Multiply and Divide within 100 0.312 0.472 0.216 0.978 0.995
Number and Operations - Fractions 0.245 0.500 0.255 0.984 0.992
Number and Operations in Base Ten 0.266 0.429 0.305 1.000 0.993
Properties of Multiplication and Division 0.228 0.563 0.209 0.978 0.981
4 Four Operations: Interpret, Represent, and Solve 0.339 0.436 0.225 0.991 0.996
Fraction Equivalence and Ordering 0.285 0.476 0.239 0.986 0.996
Fractions and Decimal Notation 0.264 0.495 0.241 0.985 0.975
Geometry 0.000 0.802 0.198 NA 0.955
5 Add and Subtract with Equivalent Fractions 0.382 0.419 0.199 0.987 0.990
Geometry 0.280 0.580 0.140 0.968 0.929
Numerical Expressions 0.315 0.489 0.196 0.975 0.974
Operations with Whole Numbers and Decimals 0.309 0.513 0.178 0.990 0.989
6 Dependent and Independent Variables 0.381 0.458 0.161 0.987 0.994
Divide Fractions by Fractions 0.393 0.413 0.194 0.990 0.990
Geometry 0.227 0.560 0.213 0.974 0.995
One-Variable Expressions and Equations 0.459 0.440 0.101 0.952 1.000
Ratios and Proportional Relationships 0.381 0.458 0.161 0.992 0.975
Statistics and Probability 0.217 0.636 0.147 0.972 0.952
7 Algebraic Expressions and Equations 0.301 0.482 0.217 0.990 0.991
Equivalent Expressions 0.238 0.567 0.195 0.992 0.990
Geometric Figures 0.235 0.521 0.244 0.983 0.988
Ratios and Proportional Relationships 0.258 0.495 0.247 0.977 0.980
Statistics and Probability 0.280 0.483 0.237 0.989 0.975
The Number System 0.288 0.508 0.204 0.976 0.990
8 Analyze and Solve Linear Equations 0.233 0.551 0.216 0.991 0.991
Congruence and Similarity 0.266 0.547 0.187 0.985 0.968
Expressions and Equations II 0.388 0.503 0.109 0.951 0.991
Functions 0.343 0.421 0.236 0.985 0.987
Proportional Relationships, Lines, and Linear Equations 0.197 0.574 0.229 0.990 0.983
The Number System 0.249 0.551 0.200 0.976 0.965
11 Equations and Reasoning 0.325 0.489 0.186 0.994 0.995
Geometry and Right Triangle Trigonometry 0.282 0.484 0.234 0.986 0.974
Interpreting Functions 0.308 0.475 0.217 0.994 0.959
Number and Quantity 0.312 0.507 0.181 0.990 0.961
Seeing Structure in Expressions/Polynomial Expressions 0.387 0.432 0.181 0.990 0.983
Solve Equations and Inequalities: Linear and Exponential 0.329 0.506 0.165 0.988 0.970
Solve Equations and Inequalities: Quadratic 0.000 0.781 0.219 NA 0.982
Statistics and Probability 0.287 0.605 0.108 0.983 0.898


Table 2.9 and Table 2.10 show the accuracy of the ICAs for classifying students into achievement levels (L1 to L4) based on students’ overall test performance. Each table shows the proportion of simulees assigned to each achievement level as well as the proportion who were correctly classified. For example, a proportion of 0.28, or 28%, of the simulated student cases for the grade 3 ELA/literacy ICA were assigned to achievement level 1 (L1). Of these, a proportion of 0.79, or 79%, were truly at achievement level 1 based on the true \(\theta\) values used in the simulation. Simulated classifications tended to be more accurate for levels 1 and 4 than for levels 2 and 3.

Table 2.9: OVERALL LEVEL CLASSIFICATION ACCURACY FOR INTERIM COMPREHENSIVE ASSESSMENTS, ELA/LITERACY
Grade Prop Assigned L1 Prop Assigned L2 Prop Assigned L3 Prop Assigned L4 Prop Correctly Classified L1 Prop Correctly Classified L2 Prop Correctly Classified L3 Prop Correctly Classified L4 Overall Correct Classification Weighted Kappa
3 0.28 0.28 0.24 0.20 0.79 0.62 0.60 0.94 0.73 0.89
4 0.32 0.24 0.24 0.20 0.83 0.59 0.62 0.95 0.74 0.90
5 0.33 0.22 0.30 0.15 0.81 0.60 0.72 0.91 0.75 0.90
6 0.35 0.29 0.28 0.07 0.76 0.54 0.58 0.99 0.66 0.84
7 0.40 0.27 0.27 0.06 0.74 0.43 0.54 0.97 0.62 0.82
8 0.34 0.27 0.30 0.08 0.83 0.59 0.65 0.99 0.72 0.87
9 0.39 0.13 0.33 0.15 0.83 0.40 0.63 0.96 0.72 0.89
10 0.29 0.26 0.33 0.13 0.78 0.60 0.62 0.98 0.70 0.87
11 0.30 0.27 0.31 0.12 0.78 0.59 0.61 0.97 0.70 0.87


Table 2.10: OVERALL LEVEL CLASSIFICATION ACCURACY FOR INTERIM COMPREHENSIVE ASSESSMENTS, MATHEMATICS
Grade Prop Assigned L1 Prop Assigned L2 Prop Assigned L3 Prop Assigned L4 Prop Correctly Classified L1 Prop Correctly Classified L2 Prop Correctly Classified L3 Prop Correctly Classified L4 Overall Correct Classification Weighted Kappa
3 0.31 0.25 0.30 0.14 0.88 0.63 0.64 0.96 0.76 0.90
4 0.29 0.28 0.26 0.17 0.84 0.76 0.66 0.90 0.78 0.91
5 0.35 0.28 0.20 0.17 0.87 0.71 0.54 0.92 0.77 0.90
6 0.36 0.28 0.20 0.17 0.87 0.77 0.63 0.94 0.81 0.92
7 0.33 0.26 0.22 0.19 0.88 0.73 0.67 0.90 0.80 0.92
8 0.39 0.27 0.20 0.14 0.82 0.64 0.54 0.96 0.73 0.89
9 0.34 0.25 0.24 0.18 0.84 0.63 0.57 0.95 0.74 0.90
10 0.38 0.26 0.24 0.13 0.86 0.66 0.62 0.96 0.76 0.90
11 0.41 0.28 0.22 0.08 0.85 0.64 0.60 0.96 0.74 0.88


Table 2.11 and Table 2.12 show the classification accuracy for ICA claim scores. These tables show the proportion of simulees assigned to each category of achievement with respect to the level 3 cut score (standard)—below, near, or above—and for each of the “above” and “below” categories, the proportion of those assigned to that category whose thetas were truly above or below the standard. Simulated accuracy was moderately high or high.

Table 2.11: CLAIM LEVEL CLASSIFICATION ACCURACY FOR INTERIM COMPREHENSIVE ASSESSMENTS, ELA/LITERACY
Claim Grade Prop Assigned Below Prop Assigned Near Prop Assigned Above Prop Correctly Classified Below Prop Correctly Classified Above
1 3 0.404 0.498 0.098 0.886 1.000
4 0.227 0.580 0.193 0.960 0.964
5 0.256 0.532 0.212 0.969 0.976
6 0.277 0.515 0.208 0.996 0.976
7 0.232 0.598 0.170 0.944 0.965
8 0.356 0.448 0.196 0.966 0.985
9 0.242 0.571 0.187 0.959 0.989
10 0.251 0.588 0.161 0.968 0.994
11 0.266 0.604 0.130 0.951 1.000
2 3 0.212 0.589 0.199 0.972 0.995
4 0.328 0.634 0.038 0.905 1.000
5 0.383 0.581 0.036 0.890 1.000
6 0.542 0.453 0.005 0.839 1.000
7 0.576 0.411 0.013 0.825 1.000
8 0.525 0.458 0.017 0.853 1.000
9 0.350 0.583 0.067 0.923 1.000
10 0.372 0.614 0.014 0.927 1.000
11 0.392 0.594 0.014 0.923 1.000
3 3 0.271 0.475 0.254 0.993 0.988
4 0.305 0.424 0.271 0.987 0.996
5 0.285 0.414 0.301 0.989 0.993
6 0.207 0.626 0.167 0.986 0.988
7 0.367 0.431 0.202 0.978 0.995
8 0.368 0.343 0.289 0.992 0.990
9 0.295 0.387 0.318 0.986 0.997
10 0.313 0.383 0.304 0.997 0.997
11 0.327 0.385 0.288 0.997 1.000
4 3 0.157 0.651 0.192 0.962 0.979
4 0.179 0.648 0.173 0.966 0.965
5 0.205 0.600 0.195 0.980 0.969
6 0.336 0.472 0.192 0.982 0.995
7 0.219 0.618 0.163 0.968 0.969
8 0.253 0.747 0.000 0.960 NA
9 0.216 0.610 0.174 0.977 0.983
10 0.222 0.608 0.170 0.982 0.982
11 0.232 0.602 0.166 0.974 0.982


Table 2.12: CLAIM LEVEL CLASSIFICATION ACCURACY FOR INTERIM COMPREHENSIVE ASSESSMENTS, MATHEMATICS
Claim Grade Prop Assigned Below Prop Assigned Near Prop Assigned Above Prop Correctly Classified Below Prop Correctly Classified Above
1 3 0.379 0.357 0.264 0.984 0.992
4 0.412 0.308 0.280 0.995 0.989
5 0.421 0.354 0.225 0.986 0.982
6 0.450 0.307 0.243 0.996 0.971
7 0.404 0.337 0.259 0.995 0.985
8 0.459 0.349 0.192 0.985 1.000
9 0.379 0.338 0.283 0.987 0.989
10 0.416 0.336 0.248 0.993 0.996
11 0.426 0.378 0.196 0.995 0.974
2&4 3 0.205 0.705 0.090 0.932 1.000
4 0.316 0.551 0.133 0.981 1.000
5 0.477 0.458 0.065 0.935 0.985
6 0.188 0.664 0.148 0.957 0.993
7 0.285 0.474 0.241 0.986 0.979
8 0.274 0.658 0.068 0.978 0.985
9 0.174 0.696 0.130 0.971 1.000
10 0.225 0.681 0.094 0.987 1.000
11 0.245 0.718 0.037 0.976 1.000
3 3 0.000 0.707 0.293 NA 0.983
4 0.000 0.821 0.179 NA 0.978
5 0.127 0.729 0.144 0.984 0.986
6 0.092 0.722 0.186 0.989 0.989
7 0.286 0.714 0.000 0.979 NA
8 0.453 0.444 0.103 0.960 0.990
9 0.326 0.591 0.083 0.933 1.000
10 0.347 0.581 0.072 0.939 1.000
11 0.467 0.474 0.059 0.949 1.000