2018-19 Summative Technical Report
2020-11-23
Introduction and Overview
Technical Report Approach
This report is intended to provide comprehensive and detailed evidence in support of the validity and reliability of the Smarter Balanced assessment program. It focuses on the summative assessment, which consists of a performance component and a computer adaptive component. Information about the overall system is included as well to provide context. At the outset, it should be recognized that demonstrating validity and reliability is an ongoing process. Validity and reliability evidence are provided here from the initial pilot and field test phases, along with evidence from more recent operational assessments.
Because the Consortium is composed of members who contract separately for test delivery and scoring and have varied practices for test administration, some evidence of validity comes from individual members rather than from the Consortium. This will be noted throughout this report. In some cases (e.g., the Online Test Administration Manual), the Consortium provides a customizable template or a guidance document that allows members to document their test administration practices.
The Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014), hereafter referred to as the Standards, served as the Consortium's foundation for determining the validity and reliability evidence considered necessary and sufficient. Also referenced is the U.S. Department of Education (U.S. DOE) Peer Review of State Assessment Systems Non-Regulatory Guidance for States for Meeting Requirements of the Elementary and Secondary Education Act of 1965 (2015), which stipulates the requirements that assessment programs must meet to receive federal approval under current ESEA (Elementary and Secondary Education Act) legislation. With respect to Smarter Balanced, this information is necessary for understanding the degree to which the Consortium is meeting its goals and, in some cases, what further reliability and validity evidence is needed to support the system as it evolves and improves operationally.
Peer Review Guidelines and Established Standards
Among the principles underlying the Smarter Balanced Theory of Action is adherence “to established professional standards” (Smarter Balanced, 2010, p. 33). In addition to adhering to the AERA et al. (2014) Standards, the Consortium will also meet selected requirements of the U.S. DOE peer review process for ESEA assessments. There is a great deal of overlap between the AERA et al. (2014) Standards and the U.S. DOE Peer Review Guidance; however, the Guidance stipulates a number of requirements specific to federal approval. In particular, to meet these requirements, the validity and reliability evidence and the ongoing research agenda should include
evidence concerning the purpose of an assessment system and studies that support the validity of using results from the assessment system based on their stated purpose and use;
strong correlations of test and item scores with relevant measures of academic achievement, and weak correlations with irrelevant characteristics, such as demographics (i.e., convergent and discriminant validity);
documentation of the definitions for cut scores and the rationale and procedures for establishing them;
evidence concerning the precision of the cut scores and consistency of student classification;
evidence of sufficient levels of reliability for the overall student population and for each targeted student population;
evidence of content alignment over time through quality control reviews;
evidence of comprehensive alignment and measurement of the full range of content standards, depth of knowledge, and cognitive complexity;
evidence that the assessment plan and test specifications describe how all content standards are assessed and how the domain is sampled in a fashion that supports valid inferences about student performance on the standards, both individually and aggregated;
scores that reflect the full range of achievement standards;
documentation that describes a coherent system of assessment across grades and subjects, including studies establishing vertical scales; and
evidence of how assessments provide information on the progress of students.
These requirements for reliability and validity evidence were given consideration in the development of the Smarter Balanced assessment system. The Theory of Action and primary purposes and goals of Smarter Balanced are briefly described below.
Overview and Background of the Smarter Balanced Theory of Action
The Smarter Balanced Assessment Consortium’s Theory of Action calls for increasing the number of students who are well prepared for college and careers through improved teaching and student learning. The Consortium supports improved teaching and student learning through the development of high-quality assessment and reporting systems. These systems lead to targeted and effective professional development and educational decision-making.
A quality assessment system strategically “balances” summative, interim, and formative components (Darling-Hammond & Pecheone, 2010). An assessment system must provide valid measurement across the full range of performance on common academic content, including assessment of deep disciplinary understanding and higher-order thinking skills increasingly demanded by a knowledge-based economy.
Six Principles Underlying the Smarter Balanced Theory of Action
The Smarter Balanced assessment is guided by a set of six principles shared by systems in high-achieving nations and in some high-achieving states in the U.S.
Assessments are grounded in a thoughtful, standards-based curriculum and are managed as part of an integrated system of standards, curriculum, assessment, instruction, and teacher development. Curriculum and assessments are organized around a well-defined set of learning progressions along multiple dimensions within subject areas. Formative assessment processes and tools and interim assessments are conceptualized in tandem with summative assessments, all of which are linked to the Common Core State Standards (CCSS) and supported by a unified technology platform.
Assessments produce evidence of student performance on challenging tasks that represent the CCSS. Instruction and assessments seek to teach and evaluate knowledge and skills that generalize and can transfer to higher education and multiple work domains. These assessments emphasize deep knowledge of core concepts and ideas within and across the disciplines—along with analysis, synthesis, problem-solving, communication, and critical thinking—thereby requiring a focus on complex performances and specific concepts, facts, and skills.
Teachers are integrally involved in the development and scoring of assessments. While many assessment components are efficiently scored with computer assistance, teachers must also be involved in the formative and summative assessment systems so that they understand and can teach in a manner consistent with the full intent of the standards while becoming more skilled in their own classroom assessment practices.
The development and implementation of the assessment system is a state-led effort with a transparent and inclusive governance structure. Assessments are structured to improve teaching and learning. Assessments are designed to develop an understanding of the learning standards, of what constitutes high-quality work, of the degree to which students are approaching college and career readiness, and of what is needed for further student learning.
Assessment, reporting, and accountability systems provide useful information on multiple measures that is educative for all stakeholders. Reporting of assessment results is timely and meaningful in order to guide curriculum and professional development decisions. Results can offer specific information about areas of performance so that teachers can follow up with targeted instruction, students can better target their own efforts, and administrators and policymakers can fully understand what students know and can do.
Design and implementation strategies adhere to established professional standards. The development of an integrated, balanced assessment system is an enormous undertaking, requiring commitment to established quality standards in order for the system to be credible, fair, and technically sound. Smarter Balanced continues to be committed to developing an assessment system that meets critical elements required by U.S. DOE Peer Review Guidance, relying heavily on the Standards as its core resource for quality design.
The primary rationale of the Smarter Balanced assessments is that these six principles can interact to improve the intended student outcomes (i.e., college and career readiness).
Purpose of Smarter Balanced Assessment System
The Smarter Balanced purpose statements are organized into three categories: (a) summative assessments, (b) interim assessments, and (c) formative assessment resources. This report provides technical information about the summative assessments. The purposes of interim assessments and formative resources are also stated in this section to provide context for summative assessments as a component of the assessment system.
Summative Assessments
The purposes of the Smarter Balanced summative assessments are to provide valid, reliable, and fair information about
students’ ELA/literacy and mathematics achievement with respect to the CCSS measured by the ELA/literacy and mathematics summative assessments in grades 3 to 8 and high school;
whether students prior to grade 11 have demonstrated sufficient academic proficiency in ELA/literacy and mathematics to be on track for achieving college readiness;
whether grade 11 students have sufficient academic proficiency in ELA/literacy and mathematics to be ready to take credit-bearing, transferable college courses after completing their high school coursework;
students’ annual progress toward college and career readiness in ELA/literacy and mathematics;
how instruction can be improved at the classroom, school, district, and state levels;
students’ ELA/literacy and mathematics proficiencies for federal accountability purposes and potentially for state and local accountability systems; and
student achievement in ELA/literacy and mathematics that is equitable for all students and targeted student groups.
Interim Assessments
The purposes of the Smarter Balanced interim assessments are to provide valid, reliable, and fair information about
student progress toward the mastery of the skills in ELA/literacy and mathematics measured by the summative assessment;
student performance at the level of a claim or a cluster of assessment targets, so that teachers and administrators can better gauge students’ performance against end-of-year expectations and adjust instruction accordingly;
individual and group (e.g., school, district) performance at the claim level in ELA/literacy and mathematics to determine whether teaching and learning are on target;
teacher-moderated scoring of student responses to constructed-response items as a professional development vehicle to enhance teacher capacity to evaluate student work aligned to the standards; and
student progress toward the mastery of skills measured in ELA/literacy and mathematics across all students and targeted student groups.
Formative Assessment Resources
The purposes of the Smarter Balanced formative assessment resources are to provide tools and resources to
improve teaching and learning;
help teachers monitor their students’ progress throughout the school year;
illustrate how teachers and other educators can use assessment data to engage students in monitoring their own learning;
help teachers and other educators align instruction, curricula, and assessments to the learning standards and end-of-year expectations;
assist teachers and other educators in using the summative and interim assessments to improve instruction at the individual student, classroom, and school levels; and
offer professional development and resources for how to use assessment information to improve instruction and decision-making in the classroom.
Overview of Report Chapters
The chapters in this technical report follow elements in the 2014 Standards:
| Chapter Number | Chapter Title |
| --- | --- |
| 1 | Validity |
| 2 | Reliability, Precision, and Errors of Measurement |
| 3 | Test Fairness |
| 4 | Test Design |
| 5 | Scores, Scales, and Norms |
| 6 | Test Administration |
| 7 | Reporting and Interpretation |
In addition to these chapters, this report includes a chapter on trends in test scores, Chapter 8. Brief synopses of these chapters are given below to direct further review. At the suggestion of our members, practical written descriptions of the purpose of the evidence in each chapter are presented to provide context for teachers, parents, and other stakeholders seeking information in this report.
Chapter 1: Validity
In a sense, all of the information in this technical report provides validity evidence. Chapter 1 is special in that it provides information about test purposes and the overall approach to showing how scores are appropriate for those purposes. The information in this chapter answers questions such as:
- For what purpose was the summative assessment designed to be used?
- What evidence shows that test scores are appropriate for these uses?
- What are the intended test score interpretations for specific uses?
Evidence bearing on these questions does not change with each administration or testing cycle. Therefore, the validity information presented in Chapter 1 repeats and supplements the validity information in Chapter 1 of previous technical reports.
Chapter 2: Reliability, Precision, and Errors of Measurement
The degree of accuracy and precision of scores contributes to evidence about appropriate interpretations and uses of test scores. Decisions must be made with full understanding of measurement error and reliability. Chapter 2 presents information about how the test performs in terms of measurement precision, reliability, classification consistency, and other technical criteria. The information is based on simulation studies and operational test data from the item pool and school year identified in the title of this report. Information presented in this chapter can answer questions such as:
- How accurate and reliable are Smarter Balanced summative test scores?
- Are Smarter Balanced test scores equally accurate and reliable for all students?
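For readers less familiar with these terms, classical test theory offers a compact illustration of how reliability and the standard error of measurement (SEM) are related. The expression below is a general textbook relationship, not a formula specific to Smarter Balanced scoring, and the numbers in the example are hypothetical.

```latex
% Classical test theory relationship (illustrative, not a Smarter Balanced formula):
% the standard error of measurement shrinks as reliability increases.
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}
```

Here $\sigma_X$ is the standard deviation of observed scores and $\rho_{XX'}$ is the score reliability. For example, with a hypothetical score standard deviation of 100 and a reliability of 0.91, the SEM would be $100\sqrt{0.09} = 30$ scale score points.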
Chapter 3: Test Fairness
Test fairness concerns whether test scores can be interpreted in the same way for all students regardless of race, gender, special needs, and other characteristics. Evidence for test fairness includes documentation of industry-standard procedures for item development and review, appropriate use of professional judgment (e.g., bias review of items), and statistical procedures for detecting potential bias in test items. Information presented in Chapter 3 can answer questions such as:
- How were test questions and tasks developed to ensure fairness to all students?
- How is the test administered so that each student can demonstrate their skills?
- How does one know that the test is fair to all students?
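One of the statistical procedures commonly used to detect potential item bias is differential item functioning (DIF) analysis. The sketch below illustrates the Mantel-Haenszel approach on hypothetical data; it is offered only to clarify the idea of comparing item performance for two groups matched on total score and is not the Consortium's operational implementation.

```python
# Illustrative Mantel-Haenszel DIF computation on hypothetical data.
# Groups are compared after matching students on total test score, so that
# group differences in overall proficiency are not mistaken for item bias.
import math
from collections import defaultdict

def mantel_haenszel_dif(records):
    """records: iterable of (total_score, group, correct) tuples, where
    group is 'reference' or 'focal' and correct is 0 or 1."""
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for score, group, correct in records:
        cell = strata[score]
        if group == "reference":
            cell["A" if correct else "B"] += 1  # A: reference correct, B: reference incorrect
        else:
            cell["C" if correct else "D"] += 1  # C: focal correct, D: focal incorrect

    num = den = 0.0
    for cell in strata.values():
        t = sum(cell.values())
        if t == 0:
            continue
        num += cell["A"] * cell["D"] / t
        den += cell["B"] * cell["C"] / t
    alpha = num / den                # common odds ratio across matched-score strata
    delta = -2.35 * math.log(alpha)  # ETS MH D-DIF metric; values near 0 indicate little DIF
    return alpha, delta
```

In practice, an item would be flagged for further review when the magnitude of the DIF statistic exceeds established criteria and the result is statistically significant.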
Chapter 4: Test Design
Chapter 4 is focused on the implementation of design principles and procedures that assure the validity and technical quality of the test. The content validity of the test is supported by the test structure (claims, targets), its relationship to the CCSS, and by appropriate item and task development processes and alignment studies. The test’s validity and technical quality are also supported by the field test design, item analysis and calibration procedures, and item quality control measures. Chapter 4 can answer questions such as:
- What’s on the test? Is it consistent with stated test purposes?
- Does each student get a set of questions that fully represents the content domain?
- Does each student receive a test at an appropriate level of difficulty?
- What item quality control measures are applied before and after field testing?
- What design principles are used in field testing?
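Because the summative test is computer adaptive, each student's questions are selected, in part, to match that student's estimated achievement level (the third question above). The sketch below shows a simple maximum-information selection rule for a hypothetical two-parameter logistic (2PL) item pool; it is a minimal illustration and omits the content-blueprint, exposure, and other constraints that an operational adaptive algorithm enforces.

```python
# Minimal sketch of maximum-information item selection for an adaptive test.
# The item pool and parameters are hypothetical; operational adaptive algorithms
# also enforce blueprint, content, and exposure constraints not shown here.
import math

def item_information(theta, a, b, D=1.7):
    """Fisher information of a 2PL item at ability estimate theta."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)

def pick_next_item(theta, pool, administered):
    """Return the index of the unadministered item that is most informative at theta."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, pool[i]["a"], pool[i]["b"]))

# Hypothetical pool: a = discrimination, b = difficulty (theta scale).
pool = [{"a": 0.8, "b": -1.0}, {"a": 1.2, "b": 0.2}, {"a": 1.0, "b": 1.5}]
print(pick_next_item(theta=0.3, pool=pool, administered={0}))
```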
Chapter 5: Scores, Scales, and Norms
Chapter 5 presents test results for the academic year named in the title of this technical report. To provide a context for these results, information is presented about how the scales used in reporting test results were established in pilot and field test stages and how cut scores that define levels of achievement were established. Test results by grade and subject include means, standard deviations, and achievement level percentages for demographic student groups and in total for members of the Consortium that provided data to Smarter Balanced. Results are presented for the overall scale score and for claim subscores. Chapter 5 can answer questions such as:
- How was the reporting scale defined?
- How were the achievement levels set?
- How well did students perform with respect to the achievement levels?
- How did students in one demographic student group perform compared to others?
Chapter 6: Test Administration
Part of test validity rests on the assumption that assessments are administered in a standard manner. Because Smarter Balanced tests are given on such a large scale in different policy and operational contexts, the Consortium provides a common administration template that members customize for specific use. Chapter 6 describes the customizable Smarter Balanced Online Test Administration Manual. The information in Chapter 6 can answer questions such as:
- What were the conditions for test administration to assure that every student was afforded the same chance for success?
- How was the test administered to allow for accessibility for all students?
- Was the test administration secure?
- Do test records show that the test was administered as intended?
Chapter 7: Reporting and Interpretation
Reports based on test scores are among the most public-facing features of an assessment program. They must be useful and accurate to support the decisions and purposes for which the assessment was designed while discouraging inappropriate conclusions and comparisons. Chapter 7 provides examples of the Smarter Balanced suite of reports and interpretive information and discusses intended uses of report information. Chapter 7 can answer questions such as:
- What information is contained in Smarter Balanced reports?
- What do scores mean?
- How can the reports best be used by teachers and parents?
Chapter 8: Trends in Test Scores
Large-scale student assessments are generally designed to allow comparisons of test scores over time to assess growth in individual students and groups and to detect overall trends in student achievement. Smarter Balanced supports test score comparability over time through the maintenance of vertical scales constructed with data from the 2013–14 field test. Maintenance of these scales requires new items to be developed and field tested according to specifications provided by Smarter Balanced and to be calibrated to the vertical scales with appropriate Item Response Theory models and methods.
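For context, the item response function below is an example of the kind of IRT model used to place dichotomously scored items on such a scale. It is shown only to illustrate the general modeling approach; the specific models and calibration procedures are documented in Chapter 4.

```latex
% Illustrative two-parameter logistic (2PL) item response function, one common
% IRT model for dichotomously scored items (D is a scaling constant, often 1.7).
P_i(\theta) = \frac{1}{1 + \exp\!\left[-D\, a_i\,(\theta - b_i)\right]}
```

Here $\theta$ is a student's location on the scale, and $a_i$ and $b_i$ are the discrimination and difficulty of item $i$. One common approach to maintaining an established scale is to hold the parameters of previously calibrated items fixed so that newly field-tested items are estimated on, and therefore land on, the existing scale.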
Chapter 8 presents grade-level information about changes in test scores from the previous year to the current year. The assessment of change is confined to two years, using only students in states that delivered the assessment to the same grade in both years. By confining the assessment of change to these students, differences in which states administered the assessment are eliminated as possible reasons for observed trends. Changes in demographics may still account for changes in achievement, however, since demographics for a given grade may change over time within a state. The information in Chapter 8 can answer questions such as:
- How did students in grade 5 perform this year compared to last year?
- Is the difference between last year’s and this year’s grade 5 students statistically significant?
- Are this year’s grade 5 students demographically similar to last year’s?
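As a minimal sketch of the comparison behind the second question above, the code below contrasts mean scale scores for the same grade in two consecutive years using a two-sample z test. All numbers are hypothetical placeholders, not Smarter Balanced results; the operational analyses in Chapter 8 are based on actual matched-state data.

```python
# Minimal sketch of a year-over-year comparison of mean scale scores for one grade.
# All numbers are hypothetical placeholders, not Smarter Balanced results.
import math

def mean_difference_z(mean_prev, sd_prev, n_prev, mean_curr, sd_curr, n_curr):
    """Difference in mean scale scores and its two-sample z statistic."""
    diff = mean_curr - mean_prev
    se = math.sqrt(sd_prev ** 2 / n_prev + sd_curr ** 2 / n_curr)
    return diff, diff / se

# Hypothetical grade 5 aggregates from states that tested the grade in both years.
diff, z = mean_difference_z(mean_prev=2480.0, sd_prev=95.0, n_prev=120_000,
                            mean_curr=2483.5, sd_curr=96.0, n_curr=118_500)
print(f"Mean change: {diff:.1f} scale score points (z = {z:.2f})")
```

Because matched-state samples are very large, even small score changes can be statistically significant, so effect size and demographic comparability (the third question above) also need to be considered when interpreting trends.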
Acknowledgments
Below is a partial list of committees that contributed time and expertise to the work of the Consortium.
Technical Advisory Committee
The Technical Advisory Committee (TAC) provides guidance on technical assessment matters pertaining to validity and reliability, accuracy, and fairness. Members of the TAC are highly regarded national experts who have been widely published in their fields. Areas of expertise include: assessment design, computer adaptive testing (CAT), assessment accommodations, uses of tests, mathematics, and English language arts/literacy. The following is a list of committee members as of January 1, 2019.
- Randy Bennett, Ph.D. - ETS
- Derek C. Briggs, Ph.D. - University of Colorado
- Susan M. Brookhart, Ph.D. - Duquesne University
- Gregory J. Cizek, Ph.D. - University of North Carolina
- Shelbi Cole, Ph.D. - Student Achievement Partners
- David T. Conley, Ph.D. - University of Oregon
- Brian Gong, Ph.D. - The Center for Assessment
- Edward Haertel, Ph.D. - Stanford University
- Gerunda Hughes, Ph.D. - Howard University
- G. Gage Kingsbury, Ph.D. - Psychometric Consultant
- James W. Pellegrino, Ph.D. - University of Illinois, Chicago
- Barbara Plake, Ph.D. - University of Nebraska, Lincoln
- W. James Popham, Ph.D. - UCLA, Emeritus
- Guillermo Solano-Flores, Ph.D. - Stanford University
- Martha Thurlow, Ph.D. - University of Minnesota/NCEO
- Sheila Valencia, Ph.D. - University of Washington
Short biographies of each TAC member can be found on the Smarter Balanced website: TAC Biographies.
Students with Disabilities Advisory Committee
The Students with Disabilities Advisory Committee is composed of national experts in learning disabilities, assistive technology, and accessibility and accommodations policy. This committee provides feedback to Smarter Balanced staff, workgroups, and contractors to ensure that the assessments provide valid, reliable, and fair measures of achievement and growth for students with disabilities. The following is a list of committee members.
- Donald D. Deshler, Ph.D.
- Barbara Ehren, Ed.D.
- Cheryl Kamei-Hannan, Ph.D.
- Jacqueline F. Kearns, Ed.D.
- Susan Rose, Ph.D.
- Jim Sandstrum
- Ann C. Schulte, Ph.D.
- Richard Simpson, Ed.D.
- Stephen W. Smith, Ph.D.
- Martha L. Thurlow, Ph.D.
Performance and Practice Committee
The Performance and Practice Committee is composed of nearly 20 educators from around the nation who were nominated by state chiefs. This committee assesses how well Smarter Balanced assessments meet their designed purpose and works to deepen overall stakeholder investment. The following is a list of committee members and their member affiliations.
- Kandi Greaves (Vermont)
- Mary Jo Faust (Delaware)
- Shannon Mashinchi (Oregon)
- Susan Green (California)
- Steve Seal (California)
- Tanya Golden (California)
- Crista Anderson (Montana)
- Melissa Speetjens (Hawaii)
- Mike Nelson (Idaho)
- Abby Olinger Quint (Connecticut)
- Michelle Center (California)
- Todd Bloomquist (Oregon)
- Jim O’Neill (Montana)
- Jen Paul (Michigan)
- Toni Wheeler (Washington)
- Joe Willhoft (Consultant)
- Susan Brookhart (Technical Advisory Committee)