
2018-19 Summative Technical Report

Chapter 8 Change in Test Scores from Previous Year

8.1 Introduction

This chapter reports only the differences between the 2017–18 and 2018–19 summative test administrations and results. The term “change” is used to describe the differences between the two administrations. The study of differences is confined to a two-year time frame in order to include as many states and as much data as possible without confounding change within a fixed set of states with changes in membership or member participation. Adding time points generally reduces the number of states, and the amount of data, that can be included if all time points are to represent the same members and the same level of participation.

Readers may be able to discern longer-range trends in student performance and other aspects of the Smarter Balanced assessment by studying separate but consecutive trend reports, each based on two or three annual time points. A trend report issued in 2018 included three time points: the 2015–16, 2016–17, and 2017–18 assessments. The present report and future technical reports will include results for two consecutive annual summative assessments.

States included in some or all of the analyses performed for this chapter are shown in Table 8.1. To be included in the analyses for a given grade, a state had to administer the test to students in that grade in both administrations. Some analyses, such as those for test duration, had stricter requirements, which are described in the sections containing the results of those analyses. In addition to participating in both administrations for a given grade, member jurisdictions are included only if they provided their student data to Smarter Balanced in both years and student performance was reported on, or could be transformed to, the Smarter Balanced reporting scale.

To protect confidentiality, results are never reported for a single state. Therefore, results that include Idaho are reported at the high school level, which will also include grade 11 students from the additional states indicated in Table 8.1.

Table 8.1: STATES INCLUDED IN ANALYSES OF STUDENT DATA
Grade  | ELA CAT                    | ELA PT                     | Math CAT                   | Math PT
3 to 7 | CA,DE,HI,ID,OR,SD,VI,VT,WA | CA,DE,HI,ID,OR,SD,VI,VT,WA | CA,DE,HI,ID,OR,SD,VI,VT,WA | CA,DE,ID,OR,SD,VI,VT,WA
8      | CA,DE,HI,ID,OR,SD,VT,WA    | CA,DE,HI,ID,OR,SD,VT,WA    | CA,DE,HI,ID,OR,SD,VT,WA    | CA,DE,ID,OR,SD,VT,WA
11     | CA,HI,ID,OR,SD,VI,WA       | CA,HI,ID,OR,SD,VI,WA       | CA,HI,ID,OR,SD,VI,WA       | CA,ID,OR,SD,VI,WA

8.2 Change in Student Performance

Table 8.2 shows mean scale scores and standard deviations for overall student performance, by grade, for the 2017–18 and 2018–19 administrations, along with the difference between years. Mean ELA/literacy scale scores increased at all grades. Mean mathematics scale scores increased at the lower grades and decreased slightly at grade 8 and high school. Effect sizes, defined here as the ratio of the change to the standard deviation of the scale score in the previous year (2017–18), were generally quite small. The standard deviation of scale scores tends to increase with grade in both subjects.

Table 8.3 shows the change in percent proficient, the percentage of students at or above the Level 3 cut score. Patterns of change in percent proficient are similar to those for change in scale scores: percent proficient increased at the lower grades in both subjects and increased only slightly or decreased at grade 8 and high school.
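For readers who wish to reproduce these summary statistics from their own data, the sketch below computes the year-to-year change in mean scale score, the effect size as defined above (change divided by the prior-year standard deviation), and percent proficient relative to a Level 3 cut score. All values in the example are illustrative, not taken from the tables.

```python
import numpy as np

# Illustrative values only; they are not taken from Table 8.2 or Table 8.3.
mean_2018, sd_2018 = 2500.0, 100.0   # prior-year (2017-18) mean and SD
mean_2019 = 2503.0                   # current-year (2018-19) mean

change = mean_2019 - mean_2018
effect_size = change / sd_2018       # effect size as defined in the text
print(f"change = {change:.1f}, effect size = {effect_size:.3f}")

# Percent proficient: the percentage of students at or above the Level 3 cut score.
rng = np.random.default_rng(0)
scores = rng.normal(mean_2019, sd_2018, size=10_000)   # simulated scale scores
level3_cut = 2520.0                                     # hypothetical cut score
pct_proficient = 100 * np.mean(scores >= level3_cut)
print(f"percent proficient = {pct_proficient:.1f}")
```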

Table 8.2: CHANGE IN STUDENT SCALE SCORES
Subject | Grade | N 2018 | Mean 2018 | SD 2018 | N 2019 | Mean 2019 | SD 2019 | Change | Effect Size
ELA/Lit. 3 760,057 2425 90.7 767,867 2426 91.4 1 1e-06
4 791,917 2466 97.0 764,422 2468 97.2 2 2e-06
5 801,703 2500 98.5 796,299 2504 98.6 4 5e-06
6 806,524 2521 98.2 801,488 2522 97.5 2 2e-06
7 789,642 2546 102.0 808,058 2549 103.5 3 4e-06
8 675,251 2564 103.0 683,061 2564 104.8 0 1e-06
HS 619,755 2598 117.0 621,955 2600 117.8 3 4e-06
Math 3 762,712 2433 84.8 770,706 2435 85.0 2 3e-06
4 794,458 2471 86.4 767,104 2474 86.7 3 4e-06
5 803,725 2495 94.7 798,483 2498 95.8 3 4e-06
6 808,325 2514 108.6 803,387 2515 109.3 1 1e-06
7 791,064 2528 115.2 809,585 2530 115.9 2 2e-06
8 675,393 2546 126.2 683,709 2544 127.2 -2 -3e-06
HS 647,424 2565 127.4 637,170 2564 129.8 -1 -1e-06
Table 8.3: CHANGE IN PERCENT PROFICIENT
Subject | Grade | Prof Pct 2018 | Prof Pct 2019 | Change
ELA/Lit. 3 48.6 48.9 0.2
4 49.5 50.0 0.6
5 51.0 52.5 1.5
6 48.3 48.9 0.6
7 50.8 51.7 0.9
8 51.3 51.0 -0.2
HS 58.5 59.2 0.7
Math 3 49.7 50.6 0.9
4 44.4 45.8 1.4
5 37.9 39.2 1.3
6 38.4 38.9 0.5
7 38.7 39.1 0.4
8 38.5 37.9 -0.6
HS 32.9 32.4 -0.5

8.3 Change in Student Demographics

Student demographics by year, and year-to-year change in demographics, are shown in Table 8.4 and Table 8.5. The years in the column headings of these tables represent the spring year of the administration; for example, the heading “2018” refers to the 2017–18 administration.

The numbers in Table 8.4 and Table 8.5 are weighted averages across grades and members. Differences among grades in how much, and in what direction, change occurred will not be evident in these tables. Where change appears to be zero for a given demographic group, it is possible that one or more grades show substantial change for that group, and different grades may show change in different directions. Despite these possibilities, the numbers in Table 8.4 and Table 8.5 generally hold up across grades.
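For readers working from student-level data, the sketch below shows one way to produce this kind of pooled summary; pooling student records across grades and members is equivalent to weighting grade- and member-level means by their student counts. The file name and column layout are hypothetical, and a single demographic grouping column is assumed for simplicity.

```python
import pandas as pd

# Hypothetical student-level file: one row per tested student, with columns
# grade, group (a single demographic grouping, for simplicity), scale_score,
# and proficient (1 if at or above the Level 3 cut score, else 0).
students = pd.read_csv("ela_students_2019.csv")

summary = students.groupby("group").agg(
    n=("scale_score", "size"),
    mean_ss=("scale_score", "mean"),
    prof_pct=("proficient", lambda s: 100 * s.mean()),
)
summary["total_pct"] = 100 * summary["n"] / summary["n"].sum()
print(summary.round(1))
```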

Readers of this report will most likely be interested primarily in results for a given state, in addition to overall and even grade-specific results. Table 8.4 and Table 8.5 are intended primarily to suggest directions for more specific investigations based on state-specific data.

Overall, Table 8.4 and Table 8.5 show no substantial changes in student demographics that would account for any substantial change in overall student performance. They do, however, indicate possible changes in the achievement of specific demographic groups that may be worth investigating at the state level using state-specific and grade-specific data.

Table 8.4: CHANGE IN ELA/LITERACY STUDENT DEMOGRAPHICS
Group | Total Pct 2018 | Total Pct 2019 | Mean SS 2018 | Mean SS 2019 | Prof Pct 2018 | Prof Pct 2019 | Change Total Pct | Change Mean SS | Change Prof Pct
Total 100.0 100.0 2514 2516 50.9 51.6 0.0 2.1 0.7
Female 48.9 48.9 2528 2529 56.1 56.5 0.0 1.2 0.4
Male 51.1 51.1 2501 2504 45.9 46.8 0.0 2.9 0.9
American Indian or Alaska Native 0.9 0.9 2469 2470 31.5 31.7 0.0 0.0 0.2
Asian 8.0 7.9 2577 2579 74.9 75.3 0.0 2.3 0.4
Black/African American 6.7 6.7 2459 2460 29.6 30.2 0.0 1.9 0.6
Native Hawaiian or Pacific Islander 1.0 1.0 2489 2488 38.8 38.1 0.0 -1.8 -0.7
Hispanic/Latino Ethnicity 39.3 39.5 2488 2492 39.2 40.6 0.1 3.7 1.4
White 36.1 35.7 2537 2539 61.8 62.0 -0.4 1.2 0.2
Two or More Races 4.4 4.6 2534 2536 60.4 61.3 0.2 2.8 1.0
Unidentified Race 3.8 4.1 2520 2518 52.9 51.5 0.3 -3.8 -1.4
LEP Status 15.2 14.4 2415 2418 14.0 14.0 -0.8 2.2 0.0
IDEA Indicator 11.4 11.8 2419 2423 15.6 16.5 0.4 3.4 0.8
Section 504 Status 2.0 2.0 2527 2537 50.1 53.8 0.0 9.1 3.6
Economic Disadvantage Status 55.6 56.0 2482 2485 37.7 38.7 0.4 2.6 1.0
Table 8.5: CHANGE IN MATHEMATICS STUDENT DEMOGRAPHICS
Group | Total Pct 2018 | Total Pct 2019 | Mean SS 2018 | Mean SS 2019 | Prof Pct 2018 | Prof Pct 2019 | Change Total Pct | Change Mean SS | Change Prof Pct
Total 100.0 100.0 2505 2507 40.2 40.8 0.0 1.2 0.5
Female 48.9 48.9 2507 2508 39.9 40.2 0.0 0.7 0.4
Male 51.1 51.1 2504 2506 40.6 41.3 0.0 1.8 0.7
American Indian or Alaska Native 0.9 0.9 2458 2458 21.7 21.6 0.0 -0.9 -0.1
Asian 8.0 8.0 2590 2593 71.3 72.0 0.0 3.3 0.7
Black/African American 6.7 6.7 2442 2444 18.8 19.2 0.0 0.9 0.5
Native Hawaiian or Pacific Islander 1.1 1.0 2477 2475 27.8 27.4 0.0 -1.9 -0.4
Hispanic/Latino Ethnicity 39.3 39.5 2474 2477 27.1 28.3 0.2 2.8 1.2
White 36.2 35.7 2531 2531 51.4 51.4 -0.5 0.2 0.1
Two or More Races 4.4 4.5 2525 2527 49.2 50.2 0.1 2.6 1.0
Unidentified Race 3.8 4.1 2506 2503 40.2 39.0 0.3 -4.0 -1.1
LEP Status 15.4 14.7 2419 2420 13.8 13.8 -0.7 0.7 0.0
IDEA Indicator 11.3 11.7 2407 2409 12.3 12.7 0.4 2.1 0.4
Section 504 Status 2.0 2.0 2523 2524 39.7 39.7 0.1 1.7 0.0
Economic Disadvantage Status 55.6 56.0 2471 2473 26.9 27.7 0.4 1.6 0.8

8.4 Change in Testing Times

Table 8.6 and Table 8.7 show changes in test start dates and test durations from 2017–18 to 2018–19. All member jurisdictions listed in Table 8.1 are included in Table 8.6. Table 8.7 excludes HI and OR because those members did not use the same blueprint in both administrations.

Table 8.6 shows that, on average, testing started slightly later in 2018–19 than in 2017–18 at all grades except high school. The changes were generally small (less than a day for grades 3 to 8) and may reflect year-to-year differences in school calendars or school closures due to weather and similar causes.

Table 8.7 shows that on average, students spent slightly more time taking the mathematics test and about the same amount of time taking the ELA/literacy test in 2018–19 compared to the previous administration. There was minor grade-to-grade variation in these changes.

Table 8.6: CHANGE IN TEST START DAY*
Subject | Grade | Mean 2018 | Min 2018 | Max 2018 | Mean 2019 | Min 2019 | Max 2019 | Diff. Between Means
ELA/Literacy 3 119.7 38 186 120.2 52 189 0.5
4 119.7 38 191 120.2 51 192 0.4
5 119.7 38 183 120.0 50 196 0.3
6 119.6 38 184 120.0 50 191 0.3
7 118.7 38 184 119.0 51 182 0.3
8 118.6 38 187 118.8 26 190 0.2
HS 111.8 -32 194 108.2 -34 180 -3.6
Mathematics 3 127.8 38 191 128.3 51 187 0.4
4 128.1 38 192 128.6 52 192 0.5
5 128.1 38 186 128.4 59 196 0.4
6 127.0 38 185 127.5 50 182 0.5
7 125.4 38 184 125.9 59 183 0.5
8 125.4 38 187 125.8 26 192 0.4
HS 118.7 -32 180 115.4 -34 180 -3.4
* Day = number of days before (-) or after December 31 of the preceding calendar year (December 31, 2017 for the 2017–18 administration; December 31, 2018 for the 2018–19 administration).
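For reference, the start-day metric can be reproduced with simple date arithmetic. The sketch below assumes the 2018–19 administration, whose reference date is December 31, 2018.

```python
from datetime import date

REFERENCE = date(2018, 12, 31)   # reference date for the 2018-19 administration

def start_day(test_start: date) -> int:
    # Days after (+) or before (-) the December 31 reference date.
    return (test_start - REFERENCE).days

print(start_day(date(2019, 4, 30)))   # 120: a typical spring start date
print(start_day(date(2018, 11, 29)))  # -32: a start date before the reference date
```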
Table 8.7: CHANGE IN TEST DURATION (IN MINUTES)
Subject | Grade | CAT 2018 | PT 2018 | Total 2018 | CAT 2019 | PT 2019 | Total 2019 | Change CAT | Change PT | Change Total
English Language Arts/Literacy 3 101.4 124.2 225.0 99.0 121.2 220.2 -2.4 -3.0 -4.8
4 106.2 132.6 238.8 103.2 131.4 234.6 -3.0 -1.2 -4.2
5 106.8 129.0 235.8 103.2 129.6 232.8 -3.6 0.6 -3.0
6 110.4 130.2 241.2 118.2 125.4 243.6 7.8 -4.8 2.4
7 102.0 114.6 216.6 102.0 117.6 219.6 0.0 3.0 3.0
8 98.4 114.0 212.4 103.2 112.8 216.0 4.8 -1.2 3.6
11 84.6 82.8 167.4 84.6 84.6 169.2 0.0 1.8 1.8
Average NA NA NA NA NA NA 0.5 -0.7 -0.2
Mathematics 3 85.8 45.6 131.4 87.0 46.8 133.8 1.2 1.2 2.4
4 88.8 45.0 133.2 94.2 47.4 141.0 5.4 2.4 7.8
5 91.2 67.8 158.4 97.8 67.2 165.0 6.6 -0.6 6.6
6 103.2 58.2 161.4 106.2 59.4 166.2 3.0 1.2 4.8
7 92.4 34.2 126.6 94.2 36.0 130.2 1.8 1.8 3.6
8 102.6 42.6 145.2 103.8 42.0 146.4 1.2 -0.6 1.2
11 75.6 30.6 106.2 76.2 34.8 111.0 0.6 4.2 4.8
Average NA NA NA NA NA NA 2.8 1.4 4.5

8.5 Change in the Item Pool

There are two important reasons why year-to-year changes in the item pool would not be expected to change measured student achievement on Smarter Balanced assessments. The first is that Smarter Balanced equates student measures across years through industry-standard scale construction and linking methods. These methods rely on item response theory (IRT) models, in which student achievement can be measured independently of item difficulty: apart from measurement error, a hard test and an easy test should produce the same measure for the same student.

The second reason is that, with computer adaptive testing (CAT), tests delivered to a student from two different item pools will have practically the same level of difficulty for that student, and therefore practically the same measurement precision. One would therefore not expect the average scale score or the achievement-level percentages for a population of students to vary with modest changes in the difficulty of the item pool.
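The role of item difficulty can be illustrated with a small simulation. The sketch below assumes a generic two-parameter logistic (2PL) item response function of the kind summarized by the IRTa and IRTb columns in Tables 8.8 and 8.9 (see Chapter 5 for the operational scoring models); it is a schematic illustration, not the Smarter Balanced scoring procedure. Maximum-likelihood ability estimates from an easier item set and a harder item set recover approximately the same ability for the same simulated student, differing only by measurement error.

```python
import numpy as np

def p_correct(theta, a, b):
    # Two-parameter logistic item response function: probability of a correct
    # response given ability theta, discrimination a (cf. IRTa), and difficulty b (cf. IRTb).
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def mle_theta(responses, a, b):
    # Brute-force maximum-likelihood estimate of theta over a grid.
    grid = np.linspace(-4, 4, 801)
    p = p_correct(grid[:, None], a, b)                  # (grid points, items)
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

rng = np.random.default_rng(7)
theta_true = 0.5                                         # one simulated student
for label, shift in [("easier item set", -0.5), ("harder item set", +0.5)]:
    a = np.full(40, 0.7)                                 # illustrative discriminations
    b = rng.normal(shift, 1.0, 40)                       # the two sets differ in difficulty
    responses = (rng.random(40) < p_correct(theta_true, a, b)).astype(float)
    print(label, "-> estimated theta:", round(float(mle_theta(responses, a, b)), 2))
```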

In sum, Smarter Balanced test users can be confident that year-to-year changes in the difficulty of the item pool will not substantially change measured student achievement. This confidence rests on the methods Smarter Balanced uses to construct and maintain the measurement scale (IRT) and to select items for students (CAT).

Nevertheless, year-to-year changes in the item pool may be of interest in their own right. The general public may not be prepared to accept the arguments given above for why changes in the item pool would not be expected to cause substantial changes in measures of student achievement. It may therefore be reassuring to policymakers and others to see evidence that the item pool is not changing drastically from year to year.

Table 8.8 and Table 8.9 show changes in the CAT and PT item pools from 2017–18 to 2018–19. Compared to the previous year (2017–18), there were fewer performance task (PT) items in ELA/literacy but more in mathematics, and slightly fewer CAT items in both subjects. In both subjects and all grades, the change in the average difficulty (IRTb) of the CAT item pool was extremely small. The difficulty of the PT item pools decreased substantially in most grades in both subjects. This change was intentional: PT pools have historically been somewhat too difficult relative to examinees’ achievement. Changes in the average item discrimination parameter (IRTa) were generally insubstantial, although the PT pools showed more change than the CAT pools, as one might expect given that the PT pools also showed larger changes in average item difficulty.

Table 8.10 and Table 8.11 show the overlap of the 2017–18 and 2018–19 item pools. With one exception (a single grade 4 mathematics item), there were no new CAT items in the 2018–19 pool; all other CAT items in the 2018–19 pool were also in the 2017–18 pool. Comparing Table 8.10 and Table 8.11 shows that the PT pools generally changed in two ways: (1) items from the 2018 pool were dropped, and (2) new items were added. The newly added items were generally easier than the average difficulty of the corresponding 2018 PT pool and, though not shown, easier than the dropped items. These contrasts explain the general tendency of the PT pools to be easier in 2019 than in 2018.
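As a sketch of how pool summaries of the kind shown in Tables 8.8 through 8.11 can be assembled from item metadata, the code below computes per-grade pool sizes, average IRT parameters, their year-to-year changes, and the split between carried-over and new items. The file names and column names are hypothetical.

```python
import pandas as pd

# Hypothetical item-metadata files: one row per item with columns
# item_id, subject, grade, component ("CAT" or "PT"), irt_a, irt_b.
pool_2018 = pd.read_csv("item_pool_2017_18.csv")
pool_2019 = pd.read_csv("item_pool_2018_19.csv")

def summarize(pool):
    # Pool size and average IRT parameters by subject, grade, and component,
    # the quantities summarized in Tables 8.8 and 8.9.
    return pool.groupby(["subject", "grade", "component"]).agg(
        n=("item_id", "size"), irt_a=("irt_a", "mean"), irt_b=("irt_b", "mean")
    )

change = summarize(pool_2019) - summarize(pool_2018)   # year-to-year differences

# Overlap of the two pools (Tables 8.10 and 8.11): carried-over vs. new items.
common_ids = set(pool_2018["item_id"]) & set(pool_2019["item_id"])
new_2019 = pool_2019[~pool_2019["item_id"].isin(common_ids)]
print(change.round(3).head())
print(f"{len(common_ids)} common items; {len(new_2019)} new items in 2018-19")
```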

Table 8.8: CHANGE IN THE CAT ITEM POOL
Subject | Grade | N 2018 | IRTa 2018 | IRTb 2018 | N 2019 | IRTa 2019 | IRTb 2019 | Change N | Change IRTa | Change IRTb
ELA 3 940 0.664 -0.434 867 0.662 -0.526 -73 -0.002 -0.092
4 898 0.593 0.123 823 0.586 0.044 -75 -0.007 -0.079
5 881 0.602 0.502 787 0.598 0.409 -94 -0.004 -0.093
6 819 0.555 0.969 811 0.556 0.957 -8 0.001 -0.012
7 744 0.536 1.250 735 0.535 1.252 -9 -0.001 0.002
8 819 0.537 1.278 815 0.537 1.286 -4 0.000 0.008
11 2,631 0.491 1.771 2,612 0.491 1.764 -19 0.000 -0.007
MATH 3 1,269 0.827 -0.771 1,234 0.827 -0.790 -35 0.000 -0.019
4 1,341 0.816 -0.089 1,325 0.818 -0.094 -16 0.002 -0.005
5 1,291 0.759 0.573 1,268 0.759 0.565 -23 0.000 -0.008
6 1,187 0.689 1.153 1,147 0.690 1.140 -40 0.001 -0.013
7 1,085 0.717 1.877 1,047 0.716 1.871 -38 -0.001 -0.006
8 972 0.581 2.315 915 0.574 2.284 -57 -0.007 -0.031
11 2,736 0.581 2.585 2,610 0.577 2.568 -126 -0.004 -0.017
Table 8.9: CHANGE IN THE PT ITEM POOL
Subject | Grade | N 2018 | IRTa 2018 | IRTb 2018 | N 2019 | IRTa 2019 | IRTb 2019 | Change N | Change IRTa | Change IRTb
ELA 3 48 0.702 0.394 38 0.668 -0.022 -10 -0.034 -0.416
4 63 0.635 0.586 44 0.624 0.160 -19 -0.011 -0.426
5 73 0.690 0.937 50 0.676 0.455 -23 -0.014 -0.482
6 47 0.839 1.131 38 0.796 0.724 -9 -0.043 -0.407
7 60 0.772 1.349 48 0.800 1.007 -12 0.028 -0.342
8 68 0.694 1.511 50 0.689 1.163 -18 -0.005 -0.348
11 80 0.593 1.972 56 0.597 1.518 -24 0.004 -0.454
MATH 3 80 0.890 -0.521 95 0.887 -0.660 15 -0.003 -0.139
4 94 0.856 -0.058 116 0.849 -0.128 22 -0.007 -0.070
5 85 0.758 1.012 105 0.744 0.784 20 -0.014 -0.228
6 71 0.734 0.789 92 0.705 0.779 21 -0.029 -0.010
7 85 0.893 1.563 92 0.843 1.444 7 -0.050 -0.119
8 58 0.878 1.809 79 0.780 1.567 21 -0.098 -0.242
11 61 0.662 2.674 70 0.629 2.662 9 -0.033 -0.012
Table 8.10: OVERLAP OF CAT ITEM POOLS
Subject | Grade | N Common | IRTa Common | IRTb Common | N New | IRTa New | IRTb New
ELA 3 867 0.662 -0.526 NA NA NA
4 823 0.586 0.044 NA NA NA
5 787 0.598 0.409 NA NA NA
6 811 0.556 0.957 NA NA NA
7 735 0.535 1.252 NA NA NA
8 815 0.537 1.286 NA NA NA
11 2,612 0.491 1.764 NA NA NA
MATH 3 1,234 0.827 -0.790 NA NA NA
4 1,324 0.817 -0.094 1 0.934 0.227
5 1,268 0.759 0.565 NA NA NA
6 1,147 0.690 1.140 NA NA NA
7 1,047 0.716 1.871 NA NA NA
8 915 0.574 2.284 NA NA NA
11 2,610 0.577 2.568 NA NA NA
Table 8.11: OVERLAP OF PT ITEM POOLS
Subject | Grade | N Common | IRTa Common | IRTb Common | N New | IRTa New | IRTb New
ELA 3 28 0.728 -0.011 10 0.500 -0.054
4 36 0.651 0.158 8 0.502 0.172
5 40 0.733 0.409 10 0.449 0.640
6 28 0.904 0.714 10 0.494 0.751
7 38 0.852 0.981 10 0.605 1.107
8 40 0.757 1.013 10 0.414 1.759
11 46 0.604 1.579 10 0.564 1.237
MATH 3 75 0.887 -0.546 20 0.883 -1.089
4 94 0.856 -0.058 22 0.818 -0.427
5 85 0.758 1.012 20 0.686 -0.188
6 70 0.733 0.802 22 0.615 0.704
7 77 0.879 1.519 15 0.658 1.057
8 57 0.874 1.819 22 0.536 0.914
11 54 0.641 2.773 16 0.592 2.288