The implications of testing flexibility on measuring student growth and learning trends

Formerly the South Dakota secretary of education, Melody Schopp is director of education industry consulting at analytics provider SAS. Angela Quick is vice president of education and workforce development at RTI International, where she researches STEM school design, coursework articulation and recruitment of underrepresented populations in STEM programming. Quick previously served as deputy chief academic officer at the North Carolina Department of Public Instruction.

The U.S. Department of Education has proposed flexibility in spring 2021 test administration, such as shortening tests, offering tests remotely or extending testing windows. The department has also offered waivers that exempt schools from the requirement that at least 95% of students participate in testing.

Administering some version of statewide assessments in spring 2021 will help parents and educators understand how much the pandemic has affected student learning across districts, schools and student groups. The data may be less complete than usual, but given the potentially significant impact on student learning, collecting it is critical to designing instructional supports and interventions.

It is important for education leaders and policymakers to understand the promise and limitations of the flexible options, particularly related to measuring growth and achievement trends during the pandemic.

Skip-year growth measurement is possible

Despite the fluctuating testing climate, sophisticated statistical approaches such as value-added models (VAMs) and student growth percentiles (SGPs) can measure growth across a skipped testing year. This is not uncharted territory: SAS has more than 20 years of experience supporting many states with a variety of growth models and testing challenges.

In 2016, Tennessee did not administer summative assessments in grades 3-8. The following year, the state measured growth over the two-year period from 2015 to 2017. To inform this decision, SAS used prior years’ data to compare growth measures calculated with and without a missing year of data. The simulations showed that the skip-year results were highly correlated with the actual results observed over the same two-year period.

Shorter tests

An assessment should differentiate the performance of both high-achieving and low-achieving students. If assessments are shortened too much, a floor or ceiling effect, in which many students cluster at the lowest or highest possible score, could prevent educators from understanding the true impact of the pandemic on those student groups.

Unless the assessment is computer adaptive, it must include enough items at varying levels of difficulty to make those distinctions. In our experience, 40 to 50 items on a non-adaptive assessment typically provide enough stretch in the scale to reliably measure student growth.
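Why test length matters for the ceiling can be illustrated with a small Python sketch. It assumes a Rasch (one-parameter logistic) item response model and a hypothetical test form whose item difficulties are spread evenly from -2 to 2 on the ability scale; neither assumption comes from any state assessment. It computes the chance that a high-achieving student answers every item correctly, which is the point at which the score stops showing how far above the hardest items that student’s ability lies. It does not reproduce the 40-to-50-item figure above, which comes from the authors’ experience.

```python
import math


def p_correct(theta, difficulty):
    """Rasch (one-parameter logistic) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_difficulties(n_items, easiest=-2.0, hardest=2.0):
    """Hypothetical test form: difficulties evenly spaced from easiest to hardest."""
    if n_items == 1:
        return [(easiest + hardest) / 2]
    step = (hardest - easiest) / (n_items - 1)
    return [easiest + i * step for i in range(n_items)]


def p_perfect_score(theta, n_items):
    """Chance that a student of ability theta answers every item correctly.

    A perfect score is a ceiling: it cannot show how far above the test's
    hardest items the student's ability actually lies.
    """
    return math.prod(p_correct(theta, b) for b in item_difficulties(n_items))


# A high-achieving student (theta = 3, above the hardest item at 2).
for n in (10, 20, 45):
    print(f"{n:>2} items: P(perfect score) = {p_perfect_score(3.0, n):.2f}")
```

The same reasoning applies in reverse at the floor: on a very short form, a low-achieving student has a much higher chance of answering nothing correctly.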

If a state assessment does not sufficiently differentiate student achievement, growth measures based on that assessment would not be available for the 2020-21 school year, or even for 2021-22, when 2020-21 scores would serve as the prior-year baseline. This could also limit research on the pandemic’s impact on different groups of students.

Remote testing

There are real concerns about whether remote testing environments are comparable to in-person conditions. State education agencies can compare students’ scores in a remote environment with previous scores. Depending on state policies and preferences, states can include, exclude or adjust student scores as needed when using them in growth models or analysis about the pandemic’s impact.

States should seriously consider capturing data about students’ learning environments to spot trends in program effectiveness. By comparing academic growth across students’ different learning experiences, states can gauge the impact of different virtual or hybrid learning programs.

Extended or multiple testing windows

Extending the testing window or offering multiple windows does not necessarily undermine reliable growth measures. In a typical year, a long testing window might advantage students who test later and have therefore received more instruction. However, given the differences in learning environments and modalities in 2020-21, the impact is less clear, and a long testing window adds another factor to consider when interpreting results. Analyses of this year’s assessment results can reveal trends related to assessment timing.

For many student growth models, a shift in the administration window will not skew the overall distribution of results, because expected growth is a relative measure calculated from results in the revised administration window.
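A toy Python example shows why a relative measure is insulated from a uniform shift. Real SGPs are estimated with quantile regression; the function below is a deliberately simplified, hypothetical stand-in that ranks each student’s current score against peers in the same prior-score band (the 10-point band width is an assumption). If a later window raised every student’s score by the same amount, every rank, and therefore every growth percentile, would stay the same.

```python
from bisect import bisect_left
from collections import defaultdict


def growth_percentiles(records, band_width=10):
    """Crude growth percentile: percent of same-band peers with a lower current score.

    records: list of (prior_score, current_score). Peers are students whose
    prior scores fall in the same band of width band_width.
    """
    bands = defaultdict(list)
    for prior, current in records:
        bands[prior // band_width].append(current)
    for scores in bands.values():
        scores.sort()
    percentiles = []
    for prior, current in records:
        peers = bands[prior // band_width]
        below = bisect_left(peers, current)  # peers scoring strictly lower
        percentiles.append(round(100 * below / len(peers)))
    return percentiles


records = [(480, 495), (482, 510), (487, 502), (491, 500), (495, 515), (498, 507)]
# Hypothetical later window that raises every current score by 8 points.
shifted = [(prior, current + 8) for prior, current in records]

print(growth_percentiles(records))   # [0, 67, 33, 0, 67, 33]
print(growth_percentiles(shifted))   # [0, 67, 33, 0, 67, 33]
```

The invariance holds only when the shift affects all students equally. If summer learning loss or summer programs move some students more than others, the ranks can change, which is the interpretation problem that arises when testing moves to the fall.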

However, a shift from spring 2021 to fall 2021 would likely affect research on the pandemic’s impact on student achievement over time. Such research could use pre-pandemic assessment data to determine students’ expected performance based on prior achievement. But if assessments shift to the fall, those expectations would still be based on past spring administrations.

As a result, the impact of the pandemic on student achievement might be conflated with summer learning loss or with gains from enhanced summer programs, complicating interpretation. To ensure comparability with prior and future student performance, states should consider keeping test administration in spring 2021, even if the windows are widened.

Fewer students taking tests

The waiver of the 95% student testing participation requirement acknowledges the reality of reduced participation in schools and likely testing opt-outs. Having fewer test-takers in the growth model will likely limit a state’s understanding of the pandemic’s impact on student learning. However, measurements from students who do participate in spring testing, though a subset of all students, will still offer valuable insights to educators, parents, researchers and policymakers.

While many states are waiving the use of testing data for accountability, growth and achievement data can still inform strategies to accelerate student learning. These results may need to be interpreted differently than in previous years, particularly if lower participation is concentrated in certain student groups.
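A simple first check for concentrated non-participation is to compute participation rates by student group before interpreting any growth results. The Python sketch below does that and flags groups below 95%, borrowing the threshold from the participation requirement discussed above purely as a diagnostic, not as an accountability rule. The group names and counts are hypothetical.

```python
def participation_by_group(counts, threshold=0.95):
    """Participation rate per student group, flagging groups below threshold.

    counts: {group: (enrolled, tested)}. Groups with no enrolled students
    are skipped. Returns {group: (rounded_rate, below_threshold)}.
    """
    report = {}
    for group, (enrolled, tested) in counts.items():
        if enrolled == 0:
            continue
        rate = tested / enrolled
        report[group] = (round(rate, 3), rate < threshold)
    return report


# Hypothetical enrollment and test-taker counts.
counts = {"group_a": (400, 388), "group_b": (250, 205), "group_c": (120, 119)}
for group, (rate, low) in participation_by_group(counts).items():
    print(f"{group}: {rate:.1%}" + ("  <- interpret growth with caution" if low else ""))
```

A flagged group is a signal that its growth and achievement results may describe only the students who chose or were able to test.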

Like 2020, this will be a year unlike any other. State and education leaders need to understand their options and consider how to accurately interpret and use the testing data we can collect. Uses of assessment and growth results in 2020-21 will be complicated and require careful collaboration across the testing and policy sectors.

In many ways, assessment data are more important than ever to guide responses and inform recovery. At the same time, using assessment results for accountability decisions will require careful consideration. Given the variability of instructional delivery and quality across states, education leaders must understand both the potential and the limitations of measurement as they tackle the next phase of COVID-19 recovery.

Schopp and Quick recently participated in an RTI International webinar on flexible testing options and measuring student learning loss, “Assessing Student Learning Amidst Standardized Testing Changes.”