A Quick and Dirty Guide to Validity & Reliability

Choosing the right assessment for selecting or developing employees can make or break the success of a talent initiative. Why bother using assessments that don’t predict performance, or that fail to resonate with your business leaders?

When deciding on the right assessment for your valuable talent, pay attention to the scientific rigor with which the instruments have been tested. Any good tool should have concrete data demonstrating its validity and reliability. Validity and reliability can tell you two general things: 1) that the assessment is measuring what you want it to, and 2) that it will reliably assess the same thing each time — ensuring that the results you get aren’t a one-off.

An easy way to think about this concept is with a bullseye metaphor: The very center of the bullseye is exactly what you want to assess.

Reliable but not valid means that you are consistently testing the same thing over and over again, but it’s not testing what you want to test.

Valid but not reliable means that the average scores align with the goals of the test, but individual scores are inconsistent.

Both reliable and valid means that the test will consistently measure what it is supposed to over a period of time; it’s consistently hitting the bullseye.

What is Validity?

Validity refers to the accuracy of the assessment. In essence, does it measure what it is supposed to measure? While there are several types of validity to pay attention to, the most important for our purposes is predictive validity.

Predictive validity tells us how accurate a tool is at predicting a certain outcome. In the case of personality assessments, a good tool will be able to predict how well someone will perform their job. Validity is typically measured with a coefficient between -1 and 1 (called the Pearson correlation coefficient). The closer to 1, the higher the predictive power of the test. The predictive validity of the Hogan Personality Inventory (HPI) is .29 for predicting performance across job families. However, when the HPI is combined with the Hogan Development Survey (HDS) and Motives, Values, and Preferences Inventory (MVPI), that number jumps to .54. While these numbers may not seem very high, it helps to compare them with validity coefficients from other domains.

For example, the predictive validity of ibuprofen for pain reduction is only .14. For another, more closely-related example, the correlation between structured job interviews and job performance is .18. There are many ways of measuring validity, some more useful than others. Any assessment provider worth their salt should be able to provide you with evidence of validity. If they don’t, it’s worth considering why not.
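To make the validity coefficient described above concrete, here is a minimal Python sketch of how a Pearson correlation between assessment scores and a later performance measure could be computed. The function name `pearson_r` and the two data lists are hypothetical illustrations, not Hogan data or Hogan's actual procedure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: assessment scores and later supervisor ratings
assessment_scores = [52, 61, 47, 70, 58, 66, 43, 75]
performance_ratings = [3.1, 3.4, 3.3, 3.9, 2.8, 3.6, 3.0, 3.5]

validity = pearson_r(assessment_scores, performance_ratings)
print(f"predictive validity (r) = {validity:.2f}")
```

The same function could be applied to scores from two administrations of one assessment, which is the idea behind test-retest reliability.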

What is Reliability?

Reliability, on the other hand, refers to the consistency of the test. The reliability of an assessment can be evaluated in two broad ways: 1) internal consistency and 2) test-retest reliability.

Test-retest reliability is a measure of consistency of responses over time. In other words, are people responding to questions the same way each time they take the test? Inconsistent responses can indicate that assessment results are not actually measuring personality, which should be relatively stable over time. Test-retest reliability uses a correlation of scores (again, using the Pearson coefficient) from a first assessment and a second assessment sometime later. For Hogan, the short-term test-retest reliability is .81 for the HPI, .70 for the HDS, and .79 for the MVPI.

Internal consistency relates to the questions that are used in each assessment. Test takers will notice that many questions appear to measure the same thing. This is on purpose. Asking a question in a few different ways helps us to ensure that we are getting an accurate measurement of the concept. Internal consistency scores are measured between 0 and 1 (this time with a coefficient called Cronbach’s alpha; see note i). The closer to one, the higher the internal consistency reliability. The average internal consistency for the HPI scales is .76, .71 for the HDS, and .76 for the MVPI.
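As a rough illustration of the internal consistency coefficient mentioned above, the sketch below computes Cronbach's alpha with its standard formula: k / (k - 1) times (1 minus the sum of the item variances divided by the variance of the total scores). The function name `cronbach_alpha` and the response data are hypothetical and are not drawn from any Hogan scale.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(item_scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose rows (respondents) into columns (items)
    items = list(zip(*item_scores))
    item_variance_sum = sum(variance(col) for col in items)
    # Variance of each respondent's total score across all items
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 5 respondents answering 4 items on a 1-5 scale
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

When respondents answer similarly worded items in a similar way, the item variances are small relative to the variance of the totals, and alpha moves toward 1.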

The important thing to note is that there is no one right way to measure reliability or validity. In fact, assessment publishers should constantly be monitoring their products to ensure they are as effective as they claim. Hogan Assessments are far above industry standards with continual evaluation of our own assessments. We are partial though, and we encourage you to seek out this information with any assessment system you choose.

Hogan Assessments have appeared in over 400 peer-reviewed publications to ensure that our tests are hitting the bullseye. We invite you to contact us for more information on the validity and reliability of Hogan Assessments at info@hoganassessments.com or +1 918 749 0632.

Note

i. Absolute value. Scores between -1 and 0 indicate a negative correlation.

Rethinking Self-Awareness: Freud versus Socrates

Most people who are interested in helping others to improve their careers would agree that individual differences in self-awareness impact career outcomes. This talk builds on that agreement by analyzing three topics: (1) How to define self-awareness? (2) Does self-awareness matter? And (3) How to increase self-awareness?

5 Ways to Manage Creativity and Drive Innovation

In a society that craves novelty and new technology, staying on the cutting edge is paramount to an organization’s survival. What better way to stay one step ahead in the product line than to have a strong creative team tinkering away behind the scenes?

Why Personality Matters

Why does personality matter? To answer this question, we need to resolve two prior issues:

What is personality?
Who wants to know why personality matters?

The answer to the question, “What is personality?” is that there are two answers. There is what we call “personality from the inside” and there is what we call “personality from the outside”. Personality from the inside concerns your view of you, it concerns the person you think you are—it concerns your hopes, your dreams, your values, your goals, your aspirations, your fears, and the things you think you need to do to realize your goals and avoid your fears. We refer to personality from the inside as your identity.

Personality from the outside concerns our view of you, the person we think you are, and we refer to this as your reputation. It concerns the things we need to know in order to be able to deal with you effectively. So, there is the you that you know, personality from the inside, or your identity. Then there is the you that we know, personality from the outside, or your reputation.

These two forms of personality are different in very important ways. Consider the you that you know—your identity. Freud would say that it is hardly worth knowing—because you made it up. Everyone has to be someone, and you are the hero or heroine in your own life’s drama, but that doesn’t mean that your identity is necessarily closely related to reality. The way people think about and describe themselves is only modestly related to how others describe them—people don’t really know themselves all that well. Even worse, about 100 years of research on identity shows that it is very hard—almost impossible—to study in a rigorous and empirical way. As a result, we psychologists don’t know very much about identity that is interesting or useful.

Consider the you that we know—your reputation. Reputation is quite interesting for several reasons. First, the best predictor of future behavior is past behavior; your reputation reflects your past behavior, therefore your reputation is the best information we have regarding what you are likely to do in the future. Second, reputations are easy to study—we need only ask other people to describe you. And third, there is a well-defined and widely accepted taxonomy of reputations that has been used to study occupational performance, and as a result, we psychologists know a lot about the kinds of people who do well or poorly in different kinds of jobs. That is, we know a lot about the links between reputation and occupational performance.

As for the question of who wants to know why personality matters, it matters to two categories of people: (a) people who are interested in their own career development; and (b) potential employers. People who are interested in their own career development need to know about their own strengths and shortcomings relative to the demands of various occupations. More precisely, people who want to approach the topic of career development in a strategic manner will want to know: (1) how their strengths match the demands of various careers; and (2) how other people will perceive them during job interviews and while working.

Personality matters to potential employers in at least three ways. First, they need to know what kind of employee you will be—will you be cranky, difficult, and hard to manage or will you be a world-class organizational citizen? Second, they need to know if your personality fits the demands of the job for which you are applying—do you have the drive to succeed in sales, the social skills to succeed in customer service, the good judgment to succeed as a manager? And third, they need to know if your values (your identity) are consistent with the corporate culture—it doesn’t matter how talented you are, if your values are inconsistent with the corporate culture, you will not succeed in that organization.

The bottom line is that personality matters to individuals because self-understanding allows a person to be strategic about his/her career choices and career development. Personality matters to employers because knowledge about a job applicant’s personality allows them to be strategic about the hiring process.

How Faking Impacts Personality Assessment Results

People constantly ask us how faking affects the results of personality assessment. We believe that faking doesn’t matter, and we say this for two reasons. First, the data show that people can’t or don’t fake their answers on personality questionnaires. Second, it is very hard to define faking. Let’s take these two points in order.

1. Do People Fake their Responses on Personality Inventories?

With regard to people’s ability to fake their scores on, for example, the Hogan Personality Inventory (HPI), consider the following. Hogan, Barrett, and Hogan (2007) tested over 5,000 job applicants using the HPI; these people were subsequently denied employment. Six months later, they reapplied for the same job and completed the HPI a second time. It seems reasonable to assume that they would try to improve their scores the second time; however, virtually none of their scores changed beyond the standard error of measurement. The data are quite clear—even when motivated to fake, people’s scores on the HPI don’t change.
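The comparison against the standard error of measurement can be sketched in a few lines. The example below uses the common textbook formula SEM = SD × sqrt(1 − reliability). The source does not say how Hogan, Barrett, and Hogan (2007) computed their threshold, so the formula choice, the function names, the standard deviation of 10, and the reliability of .81 are all assumptions made for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Textbook SEM: scale standard deviation times sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)

def changed_beyond_sem(first_score, second_score, sd, reliability):
    """True if the change between two administrations exceeds one SEM."""
    sem = standard_error_of_measurement(sd, reliability)
    return abs(second_score - first_score) > sem

# Hypothetical scale: SD of 10 points, reliability of .81 (assumed values)
sem = standard_error_of_measurement(sd=10.0, reliability=0.81)
print(f"SEM = {sem:.2f} points")

# Hypothetical applicant scores at first application and at reapplication
print(changed_beyond_sem(50, 53, sd=10.0, reliability=0.81))
print(changed_beyond_sem(50, 56, sd=10.0, reliability=0.81))
```

Under these assumed values, a change smaller than the SEM is indistinguishable from ordinary measurement noise, which is the sense in which scores "did not change".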

We believe that concerns about faking reflect a misunderstanding of what people do when they answer items on a personality inventory. There are two views about this: (a) self-report; and (b) impression management. These views have very different implications for understanding faking.

The Self-Report View of Faking. This view assumes that people answer items on personality measures by providing factual self reports, and faking involves lying. This view assumes that memory is like a videotape so that, when people read an item on an inventory (“I read ten books a year”), they compare the item with the memory tape, and then “report”; that is, they offer factual accounts of how an item matches their memory tape. Faking involves providing false reports about the match between the content of an item and the content of memory.

However, memory researchers from Bartlett (1937) to the present agree that memories are not factual recordings of past events; they are self-serving reconstructions designed to create a particular impression on others. People construct their memories and use them for strategic purposes.

Impression Management Theory. Our view is that during social interaction, people try to maximize social acceptance and status and minimize rejection and the loss of status (i.e., they are engaged in trying to get along and get ahead; cf. Hogan, 2006). When people respond to employment interview questions, assessment center exercises, or items on a personality inventory, they do what they always do during social interaction: they try to create a favorable impression of themselves. In this view, faking involves distorting the way one normally talks about oneself during social interaction.

Personality is most important in social interaction, and it is useful to compare social interaction with handwriting. Both are skilled performances that reflect skills we learned as children. In both cases, we perform best when we think about what we are trying to do, and we perform poorly when we think about how we are doing it; that is, both types of performance are best when they are unreflective, automatic, even unconscious. Now imagine trying to change your handwriting (i.e., to fake it). It is extremely difficult to do, and even when we try, others can almost always recognize the handwriting as ours. Now imagine trying to change the way you interact with others (i.e., to fake). It is extremely difficult to do, and even when we try, others still know it is us.

2. How to Define Faking

We do not believe there are “true selves” inside people. We make a distinction between the person you think you are (your identity) and the person we think you are (your reputation). Identity is very hard to study; after 100 years of research, we still don’t know much about it. However, this may not matter because it seems clear that we make up our identities in idiosyncratic ways. On the other hand, reputation is easy to study, we know a lot about it, and it is immensely consequential for our lives and careers. Successful people know how to control their reputations; this is impression management. Moreover, many successful people (e.g., former President Ronald Reagan) never engage in introspection and are unable and unwilling to talk about their identities; however, they are very shrewd at managing their reputations.

But most importantly, the HPI is designed to predict reputation, not to measure identity. Reputation reflects a person’s typical interpersonal style, which is like a person’s handwriting. So with regard to faking, the question of whether a person’s answers (to the HPI) are consistent with a person’s identity (his/her true self) is irrelevant. The right question concerns whether a person’s answers are consistent with a person’s reputation. This is the sense in which faking should be understood.

Consider the goals of child rearing. Small children usually act in ways that are consistent with their “true” desires and urges: they pee their pants when they feel like it. Child rearing consists almost entirely of training children to hide, or at least delay, their “true” desires and to behave in ways that are consistent with the norms of civilized adult conduct. Some people believe that child rearing involves teaching children to fake, to be inauthentic, and is a process that is inherently alienating. In contrast, we believe child rearing involves training children to behave in a socially appropriate manner, and that their natural selves are something they need to overcome.

The items on the HPI sample ordinary socialized adult behavior. Most adults know the rules of conduct and respond to HPI items in terms of social norms rather than in terms of their true (childlike) desires and urges. On the other hand, criminals and other deviants answer HPI items in ways that are more consistent with their “true” selves—and with their typical behavior. The larger point is that it is almost impossible to distinguish faking from socialized behavior. And this means that it is very hard to assign a clear meaning to the claim that some people fake when they respond to personality measures.

3. A Practical Example

Consider a practical question. There are three HICs on the HPI Prudence scale (Moralistic, Mastery, and Virtuous) that are designed to measure implausible responding (i.e., “faking”). A sample item would be “I have never told a lie.” Scores on these three HICs are normally distributed. How should we interpret very high or perfect scores on these three HICs? Regarding this interpretation, we would make four points. First, it is important to remember that the HPI is designed to predict reputation, to predict how people will be described by others; it is NOT designed to predict how people think about themselves. Thus, people with very high scores on these three HICs will seem socially appropriate in the extreme. They will be mannerly, polite, buttoned down, and careful not to give offense.

Second, a general principle is that scores should be interpreted in the context of other scores. A relatively frequent pattern of scores includes a very high score on these three HICs and a high score on the Mischievous scale of the HDS. This pattern suggests a person with good social skills—clever, pleasant, and quite charming—who is also manipulative, agenda-driven, risk-taking, limit testing, and very cautious. His/her potential deviousness will be very hard to detect—especially during an interview. In time, other people usually catch on to the person’s manipulative tendencies, and that may cause problems for the person’s career. But in the short term, such people prosper in organizations.

Third, a more common pattern includes very high scores on the three “faking” HICs, and high scores on HPI Prudence, HDS Diligent and/or Dutiful, and high MVPI Tradition and/or Security. These people tend to be socially correct, but self-righteous, rule bound, perfectionistic, and risk-averse; they are scrupulous in their treatment of others and exacting in their expectations. Bosses and senior managers usually find these tendencies attractive, but as managers, such people alienate and disempower their subordinates.

Finally, we sometimes see a pattern that combines high scores on the “faking” HICs with overall attractive results on the HPI, HDS, and MVPI. Such people have superb social skills and few if any developmental needs. Their only issue is one of generating resentment in others because they seem too good to be true.

The major point of this section concerns how to interpret high scores on the HPI HICs designed to detect “faking”. We think the interpretation is substantive—such scores predict a particular interpersonal style. It is a style marked by extreme social appropriateness, good manners, and ultra-civility. Are people with these scores faking or are they engaged in a particular form of impression management?

References

Bartlett, F.C. (1937). On remembering. Cambridge, England: Cambridge University Press.

Hogan, R. (2006). Personality and the fate of organizations. Mahwah, NJ: Erlbaum.

Hogan, J., Barrett, P., & Hogan, R. (2007). Personality measurement, faking, and employment selection. Journal of Applied Psychology, 92, 1270-1285.