Date of Award

Spring 1-1-2011

Document Type


Degree Name

Doctor of Philosophy (PhD)



First Advisor

Derek C. Briggs

Second Advisor

Lorrie Shepard

Third Advisor

Edward Wiley

Fourth Advisor

David Webb

Fifth Advisor

Donald Waldman


Vertical scales are typically developed for the purpose of quantifying achievement growth. In practice, it is commonly assumed that all of the scaled tests measure a single construct; however, in many instances there are strong theoretical and empirical reasons to suspect that the construct of interest is multidimensional. When such tests are modeled and scaled unidimensionally, interpretations of growth are likely to be distorted. As such, there may be value in developing multidimensional vertical scales to allow for examinations of growth on different dimensions. Empirical data from Colorado's large-scale math assessment are used to examine 1) sufficient conditions for establishing a multidimensional vertical scale and 2) the extent to which interpretations of growth are distorted when tests are vertically scaled unidimensionally rather than multidimensionally.

A super test linking design is developed that allows items to be pooled over multiple years to increase the number of common items available for establishing the multidimensional vertical scale. Based on this design, a resampling approach is used to examine linking error related to the number and format of common items and to the choice of item response model and linking method. The results indicate that at least 7 to 10 common items per dimension are needed to link the tests with acceptable amounts of error; that there is no appreciable difference in the linking when dichotomous items are modeled using the M2PL versus the M3PL; and that dichotomous common items, rather than mixed-format common items, combined with a variable dilation linking method should be preferred when creating a multidimensional scale.
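For readers unfamiliar with the M2PL and M3PL models named above, the following is a minimal sketch of the compensatory multidimensional item response functions as they are commonly written in the multidimensional IRT literature. The function names, argument names, and the slope-intercept parameterization are illustrative assumptions; the dissertation's own parameterization and estimation software may differ.

```python
import math
from typing import Sequence


def m3pl_probability(theta: Sequence[float], a: Sequence[float],
                     d: float, c: float = 0.0) -> float:
    """Probability of a correct response under a compensatory M3PL model.

    theta: examinee abilities, one per dimension (assumed name)
    a:     item discriminations, one per dimension (assumed name)
    d:     item intercept (assumed slope-intercept form)
    c:     lower asymptote ("guessing") parameter, 0 <= c < 1
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same number of dimensions")
    if not 0.0 <= c < 1.0:
        raise ValueError("c must lie in [0, 1)")

    # Compensatory form: dimensions combine through a weighted sum.
    logit = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d

    # Numerically stable logistic function.
    if logit >= 0:
        logistic = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        logistic = e / (1.0 + e)

    return c + (1.0 - c) * logistic


def m2pl_probability(theta: Sequence[float], a: Sequence[float],
                     d: float) -> float:
    """M2PL is the special case of the M3PL with the lower asymptote at zero."""
    return m3pl_probability(theta, a, d, c=0.0)
```

Under this sketch, the two models differ only in the lower asymptote `c`, so they give identical probabilities for any item whose `c` is zero.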

Between-grade differences on one unidimensional and two multidimensional vertical scales are compared to identify distortions related to unaccounted-for dimensionality. The comparisons are made for cross-sectional versus longitudinal cohorts and for scales created using a single-year design versus the super test design. The single-year design was found to be insufficient for establishing a multidimensional vertical scale. Differences between unidimensional and dimension-specific growth were generally small, although there were notable distortions related to each of the modeled dimensions in various grades. For both the cross-sectional and longitudinal cohorts, growth was consistently distorted for dimensions associated with number operations/computational techniques and algebra. The findings suggest that the development of multidimensional vertical scales should be seriously considered.


Sixth advisor: Joseph Martineau.