Chapter 1 Validity

1.1 Introduction

Validity refers to the degree to which each interpretation or use of a test score is supported by the accumulated evidence (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014; ETS, 2002). It constitutes the central notion underlying the development, administration, and scoring of a test and the uses and interpretations of test scores.

Validation is the process of accumulating evidence to support each proposed score interpretation or use. The validation process does not rely on a single study or only one type of evidence. Rather, validation involves multiple investigations and different kinds of supporting evidence (AERA, APA, & NCME, 2014; Cronbach, 1971; ETS, 2002; Kane, 2006). It begins with test design and is implicit throughout the assessment process, which includes developing, field testing, and analyzing items; test scaling and linking; scoring; and reporting.

This chapter provides a framework for the validation of the Smarter Balanced summative assessment (Sireci, 2013). Following this introductory section, the validity argument is reviewed, including the intended purposes of the summative assessment, the types of evidence collected, and a high-level summary of the ways in which the collected evidence supports (or disconfirms) the intended uses and interpretations of the assessment. The main portion of that review is an evidentiary framework that, for each intended use, points the reader to supporting evidence elsewhere in this technical report and in other studies. Evidence is organized around the principles in the AERA, APA, and NCME Standards for Educational and Psychological Testing (2014), hereafter referred to as the Standards.

1.2 A Note on the Validity Evidence Presented in This Technical Report

The Standards note that “validation is the joint responsibility of the test developer and the test user” (AERA et al., 2014, p. 13). The Consortium develops the test. Consortium members deliver the test, score operational items, provide reports, and incorporate test scores into their unique accountability models. For complete validity evidence, please also see member documentation on specific test administration procedures, reporting, and use.

This report does not provide evidence related to the consequences of testing. Consortium members determine the ultimate uses of test scores: each member decides the purposes and interpretations of scores and crafts its own system of reporting and accountability. The Consortium provides information about test content and technical quality but does not interfere in, endorse, or critique member uses of scores.

While it is beyond the scope and purpose of a technical report to evaluate evidence pertaining to the consequences of testing, the breadth and depth of the supporting evidence demonstrate that the Smarter Balanced assessment system adheres to guidelines for fair and high-quality assessment. The Smarter Balanced summative assessments have been thoroughly evaluated through the U.S. Department of Education’s peer review process.

1.3 Purposes of the Summative Assessments

This section presents the intended purposes of the Smarter Balanced assessments, a brief discussion of the types of validity evidence collected to support those purposes, and a high-level overview of the validity argument. At the end of this section, an evidentiary framework is presented where available validity evidence is cited and described for each intended purpose.

The validity argument begins with a statement of the intended purposes of the summative assessments. The purposes of the Smarter Balanced summative assessments are to provide valid, reliable, and fair information about

  • students’ ELA/literacy and mathematics achievement with respect to those Common Core State Standards (CCSS) measured by the ELA/literacy and mathematics summative assessments in grades 3 to 8 and high school;

  • whether students prior to grade 11 have demonstrated sufficient academic proficiency in ELA/literacy and mathematics to be on track for achieving college readiness;

  • whether grade 11 students have sufficient academic proficiency in ELA/literacy and mathematics to be ready to take credit-bearing, transferable college courses after completing their high school coursework;

  • students’ annual progress toward college and career readiness in ELA/literacy and mathematics;

  • how instruction can be improved at the classroom, school, district, and state levels;

  • students’ ELA/literacy and mathematics proficiencies for federal accountability purposes and potentially for state and local accountability systems; and

  • students’ achievement in ELA/literacy and mathematics that is equitable for all students and targeted student groups.

1.4 Types of Validity Evidence

The intended purposes must be supported by evidence. The Standards describe a process of validation, often characterized as a validity argument (Kane, 1992), that consists of developing a sufficiently convincing, empirically based argument that the interpretations and actions based on test scores are sound.

A sound validity argument integrates various strands of evidence into a coherent account of the degree to which existing evidence and theory support the intended interpretation of test scores for specific uses. Ultimately, the validity of an intended interpretation of test scores relies on all the available evidence relevant to the technical quality of a testing system (AERA et al., 2014, pp. 21–22).

The sources of validity evidence described in the Standards (AERA et al., 2014, pp. 26–31) include:

A. Evidence Based on Test Content

Validity evidence based on test content refers to traditional forms of content validity evidence, such as the rating of test specifications and test items (Crocker, Miller, & Franks, 1989; Sireci, 1998), as well as “alignment” methods for educational tests that evaluate the interactions between curriculum frameworks, testing, and instruction (Rothman, Slattery, Vranek, & Resnick, 2002; Bhola, Impara, & Buckendahl, 2003; Martone & Sireci, 2009). Administration and scoring can be considered aspects of content-based evidence. In the case of computer adaptive test administration, confirmation that each test “event” administered to students conforms to the test blueprint can provide content-based evidence.
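To make this concrete, one way to verify blueprint conformance for an individual computer adaptive test event is to count the items actually administered within each blueprint cell and compare those counts with the blueprint’s minimum and maximum requirements. The sketch below is purely illustrative; the cell definitions, constraint values, and function name are hypothetical and are not the Consortium’s operational blueprint specifications.

    from collections import Counter

    # Hypothetical blueprint cells: (claim, depth of knowledge) -> (min items, max items).
    BLUEPRINT = {
        ("Claim 1", 2): (5, 8),
        ("Claim 2", 2): (3, 5),
        ("Claim 3", 3): (2, 4),
    }

    def blueprint_violations(event_items):
        """Return (claim, dok, observed, minimum, maximum) tuples for any blueprint
        cell whose item count in this test event falls outside the allowed range.
        event_items is a list of dicts with "claim" and "dok" keys."""
        counts = Counter((item["claim"], item["dok"]) for item in event_items)
        violations = []
        for (claim, dok), (low, high) in BLUEPRINT.items():
            observed = counts.get((claim, dok), 0)
            if not low <= observed <= high:
                violations.append((claim, dok, observed, low, high))
        return violations

A test event with an empty violation list conforms to the (hypothetical) blueprint; aggregating such checks over all events yields the kind of blueprint fidelity summary reported in Chapter 4.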

B. Evidence Based on Internal Structure

Validity evidence based on internal structure refers to statistical analyses of item and score subdomains to investigate the primary and any secondary dimensions measured by an assessment. Procedures for gathering such evidence include exploratory and confirmatory factor analysis and multidimensional IRT scaling. For a test with a vertical scale, a consistent primary dimension should be maintained across the levels of the test, or any construct shift across levels should be identified. Internal structure evidence also addresses the “strength” or “salience” of the major dimensions underlying an assessment, using indices of measurement precision such as test reliability, decision accuracy and consistency, generalizability coefficients, conditional and unconditional standard errors of measurement, and test information functions.
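As an illustration of one such index, the sketch below computes the test information function and the corresponding conditional standard error of measurement under a two-parameter logistic (2PL) IRT model. It is a minimal, hypothetical example: the item parameters are invented, and the 2PL form is used only for simplicity, not as a description of the Smarter Balanced operational scaling model.

    import math

    def item_information(theta, a, b):
        """Fisher information of a 2PL item at ability theta
        (a = discrimination, b = difficulty)."""
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def test_information(theta, items):
        """Test information is the sum of item informations; the conditional
        standard error of measurement is 1 / sqrt(test information)."""
        info = sum(item_information(theta, a, b) for a, b in items)
        return info, 1.0 / math.sqrt(info)

    # Hypothetical (discrimination, difficulty) pairs for a short form.
    items = [(1.2, -0.8), (0.9, -0.2), (1.5, 0.4), (1.1, 1.0)]
    info, csem = test_information(theta=0.0, items=items)

Higher test information at a given ability level corresponds to a smaller conditional standard error of measurement at that level, which is why these functions are reported alongside reliability and classification accuracy indices.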

C. Evidence Based on Response Processes

Validity evidence based on response processes refers to “evidence concerning the fit between the construct and the detailed nature of performance or responding actually engaged in by examinees” (AERA et al., 1999, p. 12). This type of evidence addresses whether an assessment measures the intended cognitive skills and whether students use those targeted skills to respond to the items.

D. Evidence Based on Relations to Other Variables

Evidence based on relations to other variables refers to traditional forms of criterion-related validity evidence, such as concurrent and predictive validity, and more comprehensive investigations of the relationships among test scores and other variables, such as multitrait-multimethod studies (Campbell & Fiske, 1959). These external variables can be used to evaluate hypothesized relationships between test scores and other measures of student achievement (e.g., test scores and teacher-assigned grades), the degree to which different tests actually measure different skills, and the utility of test scores for predicting specific criteria (e.g., college grades).
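As a simple illustration (a minimal sketch with invented data, not results from any study cited in this chapter), concurrent or predictive evidence of this kind is often summarized by the correlation between scale scores and an external criterion such as course grades:

    from statistics import correlation  # available in Python 3.10+

    # Hypothetical paired observations for the same students.
    scale_scores = [2480, 2510, 2555, 2600, 2640, 2690]
    course_grades = [2.1, 2.4, 2.6, 3.0, 3.2, 3.6]

    # Pearson r; values nearer 1.0 indicate a stronger positive linear
    # relationship between test scores and the external criterion.
    r = correlation(scale_scores, course_grades)

In practice, such coefficients are interpreted together with the reliability of both measures and the characteristics of the sample, rather than in isolation.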

E. Evidence Based on Consequences of Testing

As noted in Section 1.2, evidence related to the consequences of testing is beyond the scope of this technical report. Consortium members determine the ultimate uses and interpretations of test scores and craft their own systems of reporting and accountability; the Consortium provides information about test content and technical quality but does not interfere in, endorse, or critique member uses of scores.

These sources of validity evidence are intended to emphasize different aspects of validity; however, since validity is a unitary concept, they do not constitute distinct types of validity. Subsequent sections of this chapter provide descriptions and citations of studies and references to other sections of this technical report that contain such evidence.

1.5 Validity Evidence

Sireci (2012) proposed a comprehensive validity framework for the Smarter Balanced assessments in which the purposes of the assessments were cross-classified with the five sources of validity evidence from the Standards. Table 1.1 presents a similar cross-classification, with the source “Consequences of Testing” omitted for the reasons given above. For most cells in his table, Sireci described the kinds of validity studies that could be performed. Table 1.1 shows the combinations of purpose and evidentiary source for the evidence cited in this chapter. The supporting evidence is then presented for each purpose: a list of evidence contained in subsequent chapters of this technical report, followed by a list of studies and documents external to this report. Additional validity evidence will be added to future technical reports as it becomes available.

Table 1.1: SMARTER BALANCED ASSESSMENT PURPOSES CROSS-CLASSIFIED BY SOURCES OF VALIDITY EVIDENCE
Purpose 1. Report achievement with respect to the CCSS as measured by the ELA/literacy and mathematics summative assessments in grades 3 to 8 and high school.
  Sources of validity evidence: A. Test Content; B. Internal Structure; C. Response Processes; D. Relation to Other Variables
Purpose 2. Assess whether students prior to grade 11 have demonstrated sufficient academic proficiency in ELA/literacy and mathematics to be on track for achieving college and career readiness.
  Sources of validity evidence: A. Test Content; B. Internal Structure; C. Response Processes; D. Relation to Other Variables
Purpose 3. Assess whether grade 11 students have sufficient academic proficiency in ELA/literacy and mathematics to be ready to take credit-bearing, transferable college courses after completing their high school coursework.
  Sources of validity evidence: A. Test Content; B. Internal Structure; C. Response Processes; D. Relation to Other Variables
Purpose 4. Measure students’ annual progress toward college and career readiness in ELA/literacy and mathematics.
  Sources of validity evidence: A. Test Content; B. Internal Structure; C. Response Processes; D. Relation to Other Variables
Purpose 5. Inform how instruction can be improved at the classroom, school, district, and state levels.
  Sources of validity evidence: A. Test Content; B. Internal Structure; C. Response Processes
Purpose 6. Report students’ ELA/literacy and mathematics proficiency for federal accountability purposes and potentially for state and local accountability systems.
  Sources of validity evidence: A. Test Content; B. Internal Structure; C. Response Processes
Purpose 7. Assess students’ achievement in ELA/literacy and mathematics in a manner that is equitable for all students and targeted student groups.
  Sources of validity evidence: A. Test Content; B. Internal Structure; C. Response Processes

1.5.1 Validity Evidence Supporting Purpose 1

Purpose 1: Provide valid, reliable, and fair information about students’ ELA/literacy and mathematics achievement with respect to those Common Core State Standards (CCSS) measured by the ELA/literacy and mathematics summative assessments in grades 3 to 8 and high school.

Source A: Test Content

In This Report:

  • Chapter 4
    • Test blueprint, content specifications, and item specifications are aligned to the full breadth and depth of grade-level content, process skills, and associated cognitive complexity.
    • Blueprint fidelity studies are performed for each test administration, for both the general and accommodated populations, prior to administration by simulation and after administration using member data.
    • With very few exceptions, operational computer adaptive test events meet all blueprint constraints, both for the general student population and for students taking accommodated test forms.

List of Other Evidence Sources:

  • Smarter Balanced Content Specifications (Smarter Balanced Assessment Consortium, 2017a,b)
  • Evaluating the Content and Quality of Next Generation High School Assessments (Schultz, Michaels, Dvorak, & Wiley, 2016)
  • Evaluating the Content and Quality of Next Generation Assessments (Doorey & Polikoff, 2016)
  • Smarter Balanced Assessment Consortium: Alignment Study Report (HumRRO, November 2016)
  • Evaluation of the Alignment Between the Common Core State Standards and the Smarter Balanced Assessment Consortium Summative Assessments for Grades 3, 6, and 7 in English Language Arts/Literacy and Mathematics – Final Report (WestEd Standards, Assessment, and Accountability Services Program, 2017)
  • 2017–18 Smarter Balanced Summative CAT Simulation Results (American Institutes for Research, 2017)
  • Blueprint Fidelity of the 2017–18 Summative Assessment (Smarter Balanced Assessment Consortium, 2019)

Source B: Internal Structure

In This Report:

  • Chapter 2
    • Student measures and classifications have acceptable levels of precision and accuracy.

List of Other Evidence Sources:

  • 2013–14 Technical Report, Chapter 6: Pilot Test and Special Studies of Dimensionality (Smarter Balanced Assessment Consortium, 2016a)
  • Development Process (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2016)
  • Dimensionality of the SBAC: An Argument for Its Validity (Gaffney, 2015)
  • Pilot Test Data Analysis Results: Dimensionality Study and IRT Model Comparison (Educational Testing Service, 2014)

Source C: Response Processes

In This Report:

  • Chapter 3
    • Construct-irrelevant effects on student performance are minimized by applying principles of universal design in item development.
    • Item development and quality control process includes screening and reviewing field test items for potential construct-irrelevant difficulty due to bias against demographic groups.
  • Chapter 4
    • The item types and task models used in the assessment require response processes specified in the CCSS.

List of Other Evidence Sources:

  • Cognitive Laboratories Technical Report (American Institutes for Research, 2013)
  • Usability, Accessibility, and Accommodations Guidelines (Smarter Balanced Assessment Consortium, 2017c)
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • Development Process (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2016)
  • Smarter Balanced Assessment Consortium: Alignment Study Report (HumRRO, November 2016)

Source D: Relation to Other Variables

In This Report:

  • None

List of Other Evidence Sources:

  • External Validity: Analysis of Existing External Measures (National Center for Research on Evaluation, Standards, and Student Testing, 2016)
  • The Relationship Between Smarter Balanced Test Scores and Grades in ELA and Mathematics Courses (Washington Office of Superintendent of Public Instruction [OSPI], 2016)
  • Linking Study Between Smarter Balanced Mathematics Field Test and CSU Entry-Level Math Test (Educational Testing Service, 2015b)
  • Linking the Smarter Balanced English Language Arts/Literacy Summative Assessment with the Lexile Framework for Reading (MetaMetrics, 2016a)
  • Linking the Smarter Balanced Mathematics Summative Assessment with the Quantile Framework for Mathematics (MetaMetrics, 2016b)
  • Study of the Relationship Between the Early Assessment Program and the Smarter Balanced Field Tests (Educational Testing Service, 2015a)

1.5.2 Validity Evidence Supporting Purposes 2 and 3

Purpose 2: Provide valid, reliable, and fair information about whether students prior to grade 11 have demonstrated sufficient academic proficiency in ELA/literacy and mathematics to be on track for achieving college readiness.

Purpose 3: Provide valid, reliable, and fair information about whether grade 11 students have sufficient academic proficiency in ELA/literacy and mathematics to be ready to take credit-bearing, transferable college courses after completing their high school coursework.

Source A: Test Content

In This Report:

  • Chapter 4
    • Smarter Balanced tests are linked to college and career readiness by the incorporation of the CCSS into item and test development specifications.

List of Other Evidence Sources:

  • Smarter Balanced Content Specifications (Smarter Balanced Assessment Consortium, 2017a,b)
  • ALDs and College Content-Readiness Policy (Smarter Balanced Assessment Consortium, 2013b,c)
  • Reaching the Goal: The Applicability and Importance of the Common Core State Standards to College and Career Readiness (Conley, Drummond, de Gonzalez, Rooseboom, & Stout, 2011)
  • Technical Report: Initial Achievement Level Descriptors (Smarter Balanced Assessment Consortium, 2013a)
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • Evaluating the Content and Quality of Next Generation Assessments (Doorey & Polikoff, 2016)

Source B: Internal Structure

In This Report:

  • Chapter 2
    • Smarter Balanced tests measure reliably on a scale that is demarcated at every grade by achievement levels representing varying levels of college and career readiness, and they reliably classify students into those achievement levels.
  • Chapter 5
    • The measurement scale and achievement levels are vertically articulated to facilitate tracking and understanding of student growth over time.

List of Other Evidence Sources:

  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • Evaluating the Content and Quality of Next Generation High School Assessments (Schultz, Michaels, Dvorak, & Wiley, 2016)

Source C: Response Processes

In This Report:

  • Chapter 4
    • Test blueprint, content specifications, and item specifications are aligned to grade-level content, process skills, and associated cognitive complexity.
  • Chapter 5
    • The standard setting process relied on stakeholder judgments about college and career readiness based on student responses to, and the response processes elicited by, test items.

List of Other Evidence Sources:

  • Cognitive Laboratories Technical Report (American Institutes for Research, 2013)
  • Development Process (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2016)
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)

Source D: Relation to Other Variables

In This Report:

  • None

List of Other Evidence Sources:

  • Study of the Relationship Between the Early Assessment Program and the Smarter Balanced Field Tests (Educational Testing Service, 2015a)
  • Linking the Smarter Balanced English Language Arts/Literacy Summative Assessment with the Lexile Framework for Reading (MetaMetrics, 2016a)
  • Linking the Smarter Balanced Mathematics Summative Assessment with The Quantile Framework for Mathematics (MetaMetrics, 2016b)
  • Linking Study Between Smarter Balanced Mathematics Field Test and CSU Entry-Level Math Test (Educational Testing Service, 2015b)
  • Hawaii Smarter Balanced Assessments 2014–15 Technical Report (Addendum to the Smarter Balanced Technical Report, pp. 48–50) (American Institutes for Research, 2015)
  • Predicting College Success: How Do Different High School Assessments Measure Up? (Kurlaender, Kramer, & Jackson, 2018)
  • System Undergraduate Admissions (second reading) (South Dakota Board of Regents, 2017)

1.5.3 Validity Evidence Supporting Purpose 4

Purpose 4: Provide valid, reliable, and fair information about students’ annual progress toward college and career readiness in ELA/literacy and mathematics.

Source A: Test Content

In This Report:

  • Chapter 4
    • Item development and test construction processes define a continuum of increasing knowledge, skill, and ability with increasing grade level.
    • Due to the increasing knowledge, skills, and abilities represented by test items, the difficulty of the item pool, by grade, increases as expected in parallel with increases in student achievement by grade. The difficulty of grade 3 items for third-grade students is comparable to the difficulty of grade 8 items for eighth-grade students.

List of Other Evidence Sources:

  • Smarter Balanced Content Specifications (Smarter Balanced Assessment Consortium, 2017a,b)
  • ALDs and College Content-Readiness Policy (Smarter Balanced Assessment Consortium, 2013b,c)
  • Development Process (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2016)
  • Evaluating the Content and Quality of Next Generation Assessments (Doorey & Polikoff, 2016)
  • Evaluating the Content and Quality of Next Generation High School Assessments (Schultz, Michaels, Dvorak, & Wiley, 2016)

Source B: Internal Structure

In This Report:

  • Chapter 2
    • The precision of student measures at each grade is sufficient for measuring change in student achievement over time.
  • Chapter 5
    • The vertical measurement scale enables the tracking of student progress towards college and career readiness.
    • The vertical articulation of achievement levels provides benchmarks for tracking student progress towards college and career readiness.

List of Other Evidence Sources:

  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • 2013–14 Technical Report, Chapter 9: Field Test IRT Scaling and Linking Analyses (Smarter Balanced Assessment Consortium, 2016a)
  • Interpretation and Use of Scores and Achievement Levels (Smarter Balanced Assessment Consortium, 2017f)

Source C: Response Processes

In This Report:

  • Chapter 4
    • Test blueprints require, through specifications governing item types and depth of knowledge (DOK), increasing levels of knowledge, skill, and ability compatible with growth toward college and career readiness.

List of Other Evidence Sources:

  • Cognitive Laboratories Technical Report (American Institutes for Research, 2013)
  • Development Process (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2016)
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • Smarter Balanced Assessment Consortium: Alignment Study Report (HumRRO, November 2016)
  • Evaluation of the Alignment Between the Common Core State Standards and the Smarter Balanced Assessment Consortium Summative Assessments for Grades 3, 4, 6, and 7 in English Language Arts/Literacy and Mathematics (WestEd Standards, Assessment, and Accountability Services Program, 2017)

Source D: Relation to Other Variables

In This Report:

  • None

List of Other Evidence Sources:

  • Developing Connecticut’s Growth Model for the Smarter Balanced Summative Assessments in English Language Arts (ELA) and Mathematics (Connecticut State Department of Education, 2016)
  • Longitudinal Analysis of SBAC Achievement Data (2015 and 2016) (National Center for Research on Evaluation, Standards, and Student Testing, 2018)
  • The Relationship Between the Smarter Balanced Grade 8 Assessments and the PSAT 8/9 Assessments (Connecticut State Department of Education, 2017)

1.5.4 Validity Evidence Supporting Purpose 5

Purpose 5: Provide valid, reliable, and fair information about how instruction can be improved at the classroom, school, district, and state levels.

Source A: Test Content

In This Report:

  • Chapter 4
    • Test blueprint, content specifications, and item specifications are aligned to grade-level content, process skills, and associated cognitive complexity.
    • The blueprint was developed in consultation with educators.
    • Assessment claims align with the structure of the CCSS to support the interpretation of the assessment results.
  • Chapter 7
    • Test result subscores are based on educationally relevant claims.
    • Reports describe, in terms of test content, what students in each claim performance category can or cannot do.

List of Other Evidence Sources:

  • Smarter Balanced Content Specifications (Smarter Balanced Assessment Consortium, 2017a,b)
  • End of Grant Report (Smarter Balanced Assessment Consortium, 2015b, p. 28)
  • Evaluating the Content and Quality of Next Generation Assessments (Doorey & Polikoff, 2016)
  • Evaluating the Content and Quality of Next Generation High School Assessments (Schultz, Michaels, Dvorak, & Wiley, 2016)
  • Linking the Smarter Balanced English Language Arts/Literacy Summative Assessment with the Lexile Framework for Reading (MetaMetrics, 2016a)
  • Linking the Smarter Balanced Mathematics Summative Assessment with the Quantile Framework for Mathematics (MetaMetrics, 2016b)
  • Smarter Balanced Assessment Consortium: Alignment Study Report (HumRRO, 2016)
  • Evaluation of the Alignment Between the Common Core State Standards and the Smarter Balanced Assessment Consortium Summative Assessments for Grades 3, 4, 6, and 7 in English Language Arts/Literacy and Mathematics (WestEd Standards, Assessment, and Accountability Services Program, 2017)

Source B: Internal Structure

In This Report:

  • Chapter 2
    • Threshold, range, and policy achievement levels were developed in consultation with educators to provide information to educators.
    • Assessment claims align with the structure of the CCSS to support the interpretation of the assessment results.
  • Chapter 5
    • Reports of test results provide point estimates and error bands for individual and group total test performance.
    • Students are reliably classified into criterion-referenced achievement levels by overall score.

List of Other Evidence Sources:

  • Development Process (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2016)
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)

Source C: Response Processes

In This Report:

  • Chapter 4
    • Test blueprint, content specifications, and item specifications are aligned to grade-level content, process skills, and associated cognitive complexity.
  • Chapter 5
    • Threshold, range, and policy achievement levels were developed in consultation with educators to provide information to educators.

List of Other Evidence Sources:

  • Technical Report: Initial Achievement Level Descriptors (Smarter Balanced Assessment Consortium, 2013a)
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • Smarter Balanced Assessment Consortium: Alignment Study Report (HumRRO, November 2016)
  • Evaluation of the Alignment Between the Common Core State Standards and the Smarter Balanced Assessment Consortium Summative Assessments for Grades 3, 4, 6, and 7 in English Language Arts/Literacy and Mathematics (WestEd Standards, Assessment, and Accountability Services Program, 2017)

1.5.5 Validity Evidence Supporting Purpose 6

Purpose 6: Provide valid, reliable, and fair information about students’ ELA/literacy and mathematics proficiencies for federal accountability purposes and potentially for state and local accountability systems.

Source A: Test Content

In This Report:

  • Chapter 4
    • Achievement levels were set for the explicit purpose of reporting student achievement as part of federal accountability.
  • Chapter 6
    • Assessments are administered in a standardized manner sufficient to yield data that supports valid inferences.

List of Other Evidence Sources:

  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • Smarter Balanced Assessment Consortium: Online Summative Test Administration Manual (September, 2017)
  • Smarter Balanced assessments were approved in a federal peer review process that took test content into consideration

Source B: Internal Structure

In This Report:

  • Chapter 2
    • The assessment supports precise measurement and consistent classification to support analysis as part of state and local accountability systems.

List of Other Evidence Sources:

  • Smarter Balanced assessments were approved in a federal peer review process that took test internal structure into consideration
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)

Source C: Response Processes

In This Report:

  • Chapter 5
    • Achievement levels were set for the explicit purpose of reporting student achievement as part of federal accountability. Standard setting panelists based their cut score recommendations in part on student performance relative to the response processes required by test items.

List of Other Evidence Sources:

  • Cognitive Laboratories Technical Report (American Institutes for Research, 2013)
  • Achievement Level Setting Final Report (Smarter Balanced Assessment Consortium, 2017e)
  • Smarter Balanced assessments were approved in a federal peer review process that took response processes into consideration

1.5.6 Validity Evidence Supporting Purpose 7

Purpose 7: Provide valid, reliable, and fair information about students’ achievement in ELA/literacy and mathematics that is equitable for all students and targeted student groups.

Source A: Test Content

In This Report:

  • Chapter 3
    • Bias is minimized through universal design and accessibility resources.
  • Chapter 4
    • Computer adaptive assessments that meet blueprint constraints are consistently delivered to all students and targeted student groups.
  • Chapter 6
    • Assessments are administered in a standardized manner sufficient to yield data that supports valid inferences.

List of Other Evidence Sources:

  • Smarter Balanced Assessment Consortium: Accommodations for English Language Learners and Students with Disabilities: A Research-Based Decision Algorithm (Abedi, Ewers, & Davis, 2013)
  • General Accessibility Guidelines (Smarter Balanced Assessment Consortium, 2012a)
  • Smarter Balanced: Online Test Administration Manual (Smarter Balanced Assessment Consortium, September 2017)
  • Usability, Accessibility, and Accommodations Guidelines (Smarter Balanced Assessment Consortium, 2017c)
  • Guidelines for Accessibility for English Language Learners (Smarter Balanced Assessment Consortium, 2012b)

Source B: Internal Structure

In This Report:

  • Chapter 2
    • The assessment supports precise measurement and consistent classification for all students.
  • Chapter 3
    • Differential Item Functioning (DIF) analysis is performed for all items across all required targeted student groups.
  • Chapter 4
    • Multidisciplinary data reviews are conducted to resolve each observed instance of DIF.

List of Other Evidence Sources:

  • Cognitive Laboratories Technical Report (American Institutes for Research, 2013)
  • Usability, Accessibility, and Accommodations Guidelines (Smarter Balanced Assessment Consortium, 2017c)

Source C: Response Processes

In This Report:

  • Chapter 3
    • Principles of accessibility and universal design are used in item development.
  • Chapter 3 and Chapter 6
    • Accommodations, universal tools, and designated supports are provided to students.
    • Specialized tests and item pools are provided.

List of Other Evidence Sources:

  • Cognitive Laboratories Technical Report (American Institutes for Research, 2013)
  • Development Process (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2016)
  • Usability, Accessibility, and Accommodations Guidelines (Smarter Balanced Assessment Consortium, 2017c)

1.6 Conclusion for Summative Test Validity Results

Validation is a perpetual endeavor in which additional evidence can always be provided, but one can never absolutely assert that all interpretations or uses of an assessment are perfectly valid (Haertel, 1999). This is particularly true when a test is asked to serve many purposes. Nonetheless, at some point, decisions must be made regarding whether sufficient evidence exists to justify the use of a test for a particular purpose. Much of the information in this technical report supports the validity of the Smarter Balanced summative assessment for one or more of its purposes. Additional evidence in the studies and documents cited in this chapter provides further support.