Understanding Testing: The case of the Rorschach

Posted August 18, 2009

The talented and charismatic Swiss psychiatrist Hermann Rorschach published Psychodiagnostik in 1921. Rorschach’s inkblots soon attracted a cult-like following and became the most widely used projective test in the world. The theory behind projective tests is that, when people are asked to describe an ambiguous stimulus, their descriptions will reveal their private thoughts and fantasies—a theory that seems plausible on its face.

In the summer of 2009, Wikipedia published Rorschach’s ten inkblots and the most common responses to them. In the July 30th, 2009 issue of Newsweek, Wray Herbert describes the firestorm that resulted. His article raises a number of issues that are worth additional comment; I will mention four.

First, the article confuses personality measurement with the assessment of psychopathology. This is a common mistake because, from the beginning of personality measurement in the late 19th century until after World War II, every major measure of personality was also a measure of psychopathology; these measures included the Rorschach and the Minnesota Multiphasic Personality Inventory (MMPI)—the most widely used objective measure of personality in the world. Research on performance in combat during the war showed that the absence of psychopathology does not predict effective performance—many people with problematic MMPI profiles perform well under pressure and many people with normal profiles perform poorly. Realizing that psychopathology is not necessarily related to effectiveness, pioneers such as Harrison Gough (author of the California Psychological Inventory in 1954) developed measures of normal personality to predict competent and effective performance. The point is, it is possible to assess personality without assessing psychopathology; and it is necessary to do so if one wants to predict effectiveness.

Second, like many professional psychologists, the Newsweek article misrepresents the concept of test reliability. The reliability of any measure is a key issue in science. In the physical sciences, the reliability of a score is estimated by taking the same measure two or more times and comparing the scores. In contrast, many psychologists think that reliability should be estimated by how closely the items on a test cohere statistically, but this has nothing to do with reliability as defined in the physical sciences. The Newsweek article defines reliability as the degree to which two people who score the same responses on the same test get the same results. This definition is also mistaken, because it concerns the reliability of the scoring method rather than of the test scores, but it is still closer to the scientific meaning of reliability than the definition used in academic psychology.
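To make the three notions concrete, here is a minimal Python sketch. All of the numbers are invented for illustration, and the function names (`pearson`, `cronbach_alpha`) are my own. I assume that "taking the same measure twice" can be summarized by a Pearson correlation between two administrations, that item coherence is the statistic usually reported as Cronbach's alpha, and that agreement between scorers can likewise be summarized by a correlation.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(rows):
    """Internal consistency; rows holds one list of item scores per person."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: five people, all values invented.
time_1 = [10, 12, 15, 18, 20]    # total scores, first administration
time_2 = [11, 12, 14, 19, 21]    # same people, same test, some weeks later
scorer_a = [10, 12, 15, 18, 20]  # scorer A's totals for one set of responses
scorer_b = [9, 13, 15, 17, 20]   # scorer B's totals for the same responses
items = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 4, 5]]

test_retest = pearson(time_1, time_2)        # the physical-science sense
inter_scorer = pearson(scorer_a, scorer_b)   # the Newsweek definition
internal_consistency = cronbach_alpha(items) # the academic definition

print(f"test-retest:          {test_retest:.2f}")
print(f"inter-scorer:         {inter_scorer:.2f}")
print(f"internal consistency: {internal_consistency:.2f}")
```

The three statistics answer different questions and are computed from different data, so a high value on one says nothing about the others. A test can have tightly coherent items and still yield unstable scores from one administration to the next.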

Third, there is nothing wrong, in principle, with the Rorschach. Like any test, it is a collection of test stimuli (ten inkblots, in this case), which by themselves mean nothing. The utility of any test depends on its scoring key. More specifically, the utility of a test depends on validity: the degree to which scores on the test predict real world outcomes. It is possible to develop scoring keys for the Rorschach that predict outcomes, but first it is necessary to understand what the purpose of assessment is. Assessment has a job to do, and that is to predict significant non-test behavior.
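The dependence on the scoring key can be illustrated with a small sketch. Everything here is hypothetical: the response categories, the two keys, and the outcome ratings are invented, and nothing is drawn from any actual Rorschach scoring system. I also assume that validity can be summarized as the correlation between key scores and an outcome.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: for each of five people, how many of their responses
# fell into two made-up categories, plus a made-up real-world outcome
# (say, a supervisor's rating of job performance).
responses = [
    {"category_a": 1, "category_b": 3},
    {"category_a": 2, "category_b": 1},
    {"category_a": 3, "category_b": 4},
    {"category_a": 4, "category_b": 1},
    {"category_a": 5, "category_b": 5},
]
outcome = [2, 4, 5, 8, 9]

# Two hypothetical scoring keys applied to the very same responses.
key_1 = {"category_a": 1, "category_b": 0}
key_2 = {"category_a": 0, "category_b": 1}

def score(person, key):
    """Weighted sum of a person's category counts under a scoring key."""
    return sum(key[cat] * count for cat, count in person.items())

validity_key_1 = pearson([score(p, key_1) for p in responses], outcome)
validity_key_2 = pearson([score(p, key_2) for p in responses], outcome)

print(f"validity under key 1: {validity_key_1:.2f}")
print(f"validity under key 2: {validity_key_2:.2f}")
```

The stimuli and the responses are identical in both cases; only the key changes, and with it the degree to which the scores track the outcome.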

Finally, unlike many psychologists, Wray Herbert (the Newsweek writer) understands the importance of validity. At the close of his essay he notes that “This dust-up over the Rorschach could be just the beginning of a major intellectual housecleaning in a field that has drifted from its scientific roots.” As this comment indicates, validity is the scientific raison d’être for assessment, but it is something that many test publishers ignore. This fact is a public scandal and one that will ultimately come back to haunt the entire test publishing enterprise.