Validity

Posted August 20, 2008

For test developers and test users, validity is the most fundamental concept in psychological assessment. It is also a surprisingly vexed notion. A review of the literature on validity composed by the “great minds” (e.g., Lee Cronbach, Jane Loevinger, Paul Meehl) will give you a case of vertigo. The definition of validity found in the AERA Standards for Educational and Psychological Testing is a statement by a committee, in the same way that a camel is a horse designed by a committee.

The confusion about validity is the result of the way psychological measurement was conceptualized at the outset, beginning with Charles Spearman’s research on intelligence. Spearman taught at a private boys’ school; he noticed that his students’ scores on their various academic examinations were correlated. Based on this, he drew two conclusions. First, he proposed that there was a single, general factor underlying performance on all the exams. Second, he proposed that this factor was (or reflected) “intelligence”.

Spearman set the framework and the terms of the discussion for all subsequent assessment research. The framework consists of two assumptions that follow from his two conclusions. Since Spearman’s time, virtually all psychological assessment has been based on these assumptions, neither of which is necessary or necessarily true. The first assumption is that individual differences in human performance depend on, are related to, or reflect individual differences in the strength or magnitude of a corresponding underlying trait or propensity. The second assumption is that the goal of assessment is to measure individual differences in the strength or magnitude of that underlying trait. In this view, the goal of assessment is to measure traits. This view is never questioned, and it has implications for understanding validity. In this standard model, a test or measure is valid to the degree that it accurately measures the underlying trait.

But this view of validity makes it impossible ever to determine validity. The problem is that the actual existence of traits is questionable on genetic and neuropsychological grounds. There are no anatomical or neurological structures corresponding to any of the many traits that have been proposed. Consequently, it is by definition impossible to determine whether a test accurately measures an underlying trait, because the existence of traits is itself in doubt.

The confusion about the meaning of validity can be resolved fairly easily with a simplifying assumption. If we assume that the goal of assessment is to predict outcomes, then validity can be defined in terms of the ability to predict those outcomes. A measure of sales potential is valid if it predicts sales performance; a measure of customer service potential is valid if it predicts ratings of customer service, and so on. This view makes no assumptions about the existence of underlying traits, the strength of which causes performance. It simply stipulates that assessment has a job to do, and that job is to predict non-test outcomes. This definition of validity satisfies the requirement of Occam’s razor, which states that one ought not multiply causal entities unnecessarily; the definition also satisfies the aesthetic of the Bauhaus movement, which states that less is more.
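
To make the predictive definition concrete, here is a minimal sketch of how such validity might be quantified. It assumes, as one common convention and not something stated above, that "ability to predict" is summarized by the Pearson correlation between test scores and a later non-test outcome. The function name `pearson_r` and every number in the example are hypothetical and purely illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data (invented for illustration): scores on a sales-potential
# measure at hiring, and units sold by the same eight people a year later.
test_scores = [52, 61, 47, 70, 58, 66, 43, 75]
units_sold = [31, 38, 29, 45, 33, 41, 30, 44]

validity = pearson_r(test_scores, units_sold)
print(f"Predictive validity (correlation with outcome): {validity:.2f}")
```

Nothing in this calculation refers to an underlying trait. The only inputs are the test scores and the outcome the test is supposed to predict, which is the point of the simpler definition.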