The amount of text data we send out into the world is staggering. On an average day, people send 500 million tweets, 23 billion text messages, and 306.4 billion emails. Everything we say, every email we send, and every word on our resumes can be used not only to understand the world around us, but also as a clue about the individual speaking, typing, or writing.
Unfortunately, text data does not fit into the traditional structured format of rows and columns. Text data is messy, unstructured, and not easily analyzed using classical statistical methods. Enter natural language processing, or NLP. NLP is a type of artificial intelligence that uses machine learning to break down, process, and quantify human language. NLP helps us understand the hidden stories within text-based data.
There is no single method associated with NLP. It comprises multiple techniques, ranging from using keywords to interpret text or speech to understanding the underlying meaning and context of communication. Because of this variety, NLP has been used in the industrial-organizational (I-O) psychology literature to support several business initiatives, such as job analysis and selection, to name just two.
Up to 95% of usable organizational data is unstructured, which has increased the drive to use this data to remain competitive. This competition, together with steady advances in computational power, data access, and open-source research initiatives, has led the field of NLP to evolve and grow constantly. At Hogan, we are leveraging this continual growth by using NLP to improve our products and talent analytics solutions.
Hogan and Natural Language Processing
One way we use NLP is to streamline the coding of focus-group notes for personality scale relevance. Bringing NLP into our job-analysis strategy increases the efficiency of the approach and improves the quality of our results. Manually reading and coding focus-group notes is a time-intensive and cognitively draining process. Using NLP, we can complete this coding roughly 60 times faster on average (a speedup of approximately 6,000%) while maintaining predictions that are both consistent and accurate.
Many text-based, data-analytic tasks require similar knowledge about language, such as semantics, structural similarities, and syntax. This knowledge can be shared from one model to another through transfer learning. Transfer learning allows us to quickly take advantage of cutting-edge NLP research without having to spend months or years gathering additional data and training similar models from scratch. It involves taking a model trained on one dataset for one task and fine-tuning it on a second dataset for a different task. In other words, we take what the model has already learned and adapt it for our purposes. The base model for focus-group note prediction was trained on over 3 billion words. It was then fine-tuned on a large collection of focus-group notes gathered across hundreds of organizations, for which researchers had identified the relevant personality scales based on their expert judgment.
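To make the idea of transfer learning concrete, here is a minimal sketch in Python. It is not the model described above. The tiny PRETRAINED table is a made-up stand-in for a large base model, the four notes and their labels are invented, and the "stress tolerance" scale is a hypothetical label chosen for illustration. The base stays frozen, and only a small new classifier is trained on the labeled notes, which is one simple form of fine-tuning.

```python
import math

# ASSUMPTION: hypothetical "pretrained" word vectors standing in for a base
# model trained on a very large corpus. The values are invented.
PRETRAINED = {
    "deadline": [0.9, 0.1], "pressure": [0.8, 0.2], "calm": [0.7, 0.3],
    "team": [0.1, 0.9], "customers": [0.2, 0.8], "listen": [0.1, 0.8],
}

def embed(note):
    """Frozen base: average the pretrained vectors of the known words."""
    vecs = [PRETRAINED[w] for w in note.lower().split() if w in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def score(x, w, b):
    """Logistic-regression output for one embedded note."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(notes, labels, epochs=500, lr=0.5):
    """Train only a new classifier head; the base vectors never change."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for note, y in zip(notes, labels):
            x = embed(note)
            err = score(x, w, b) - y          # gradient of the log loss
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Invented focus-group notes. Label 1 means the hypothetical
# "stress tolerance" scale was judged relevant, 0 means it was not.
notes = [
    "stay calm under deadline pressure",
    "deadline pressure is constant",
    "listen to customers",
    "team must listen",
]
labels = [1, 1, 0, 0]

w, b = fine_tune(notes, labels)
p_relevant = score(embed("calm under pressure"), w, b)
p_not_relevant = score(embed("team customers"), w, b)
```

Because the language knowledge lives in the frozen base, the new classifier needs only a handful of labeled notes to separate the two groups. A production system would typically start from a large pretrained language model rather than a lookup table, but the division of labor is the same.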
This approach has already shown promising results for correctly identifying the relevance of personality characteristics from focus-group notes. When compared against human raters (subject matter experts; SMEs), our model was consistent and had an average accuracy score approximately 10% higher than the average accuracy of the SMEs. This indicates that NLP is an accurate and efficient method for identifying the critical personality characteristics of job roles from focus groups.
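The comparison above can be illustrated with a short sketch of how accuracy might be scored for a model and a panel of SMEs against the same reference key. The article does not describe the actual scoring procedure, so everything here is an assumption: the ratings are made up, the reference key is hypothetical, and accuracy is taken to be the share of judgments that match the key.

```python
def accuracy(ratings, reference):
    """Share of scale-relevance judgments that match the reference key."""
    matches = sum(r == k for r, k in zip(ratings, reference))
    return matches / len(reference)

# Made-up data: 1 = scale judged relevant to the job, 0 = not relevant.
reference = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical reference key
model = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]       # hypothetical model output
sme_panel = [                                # two hypothetical SMEs
    [1, 0, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 1, 0, 0, 1],
]

model_acc = accuracy(model, reference)
sme_acc = sum(accuracy(s, reference) for s in sme_panel) / len(sme_panel)
```

In this toy data the model matches the key on 9 of 10 judgments and each SME on 8 of 10. That gap is 10 percentage points, or 12.5% in relative terms. The article does not say which of these two readings its "approximately 10%" refers to.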