113 Industries, a market research firm, was looking to enhance its text analytics capabilities on Voice of Customer (VoC) data so that it could perform consumer behavior modeling research.
As part of this effort, one of the problems they had been facing was uncovering and extracting similar and related concepts across different industries. For example, if a grocery store like `safeway` was a topic they were analyzing, they needed a list of similar grocery stores or concepts related to grocery stores for their analysis.
While it’s easy for a human to enumerate related concepts for a handful of topics, scaling this up to hundreds of topics across multiple industries requires Natural Language Processing and Text Analytics automation. Otherwise, the process would be highly time- and labor-intensive, which is something they wanted to avoid.
1. Understanding the problem
Our first step was to get an idea of the types of concepts 113 Industries was trying to uncover and the industries they were looking to serve. We manually went over a series of topics and their corresponding related concepts across several industries to better grasp their requirements.
We also investigated the availability of data for each industry, as that would determine which approach we could use to solve the problem with accuracy, efficiency and scalability in mind.
2. Solution Development
Once we had a clear definition of the problem, we designed and developed an unsupervised pipeline that leverages distributional semantics. The idea here is that concepts that appear in similar contexts tend to have similar meanings.
We used technologies such as Python, Scikit-Learn and Gensim in developing this tool. The approach leverages the large amounts of data available in each industry, and it is highly flexible in that it does not require labeled training data.
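To make the distributional idea concrete, here is a minimal sketch using only the Python standard library. It is not the tool we delivered, which was built with Gensim and Scikit-Learn; the toy corpus and the function names `cooccurrence_vectors`, `cosine` and `related_concepts` are hypothetical and chosen purely for illustration. Each word is represented by counts of the words that appear near it, and words with similar count vectors are ranked as related.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (made-up sentences for illustration, not client data).
corpus = [
    "i bought milk at safeway yesterday",
    "i bought milk at kroger yesterday",
    "i bought bread at safeway today",
    "i bought bread at kroger today",
    "my doctor prescribed a new medication",
    "my nurse prescribed a new medication",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_concepts(term, vectors, top_n=3):
    """Rank every other word by how similar its context vector is to `term`'s."""
    target = vectors.get(term, Counter())
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


vectors = cooccurrence_vectors(corpus)
for word, score in related_concepts("safeway", vectors):
    print(word, round(score, 2))
```

In this toy corpus `safeway` and `kroger` never appear in the same sentence, yet they come out as closely related because they share the same neighboring words. No labels are involved at any point, which is what makes the approach unsupervised. A production pipeline would typically replace the raw counts with learned embeddings trained on a much larger corpus.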
3. Evaluation & Delivery
To ensure that the results made sense, we evaluated them internally across industries. Our client also performed their own evaluation to vet the quality of the related concepts.
Finally, we documented our process to help with integration into their workflow and provided hands-on training on:
- How to retrain the tool on new data
- How to use the tool effectively
- How to deal with data sparsity
Related Concepts - Samples
As a result of our partnership, 113 Industries is now able to extract relevant concepts in different industries, such as food and beverage as well as healthcare, avoiding the hours of manual labor they would otherwise have had to put in.
Also, because the tool we developed is fully unsupervised, they were able to save the cost and manual labor of obtaining labeled training data.
Dr. Ganesan’s work and output is of very high quality. Her communication skills are excellent and she is able to explain and support her work and recommendations very well. I would highly recommend Dr. Ganesan.