The Chief Technology Officer of LegalForce Trademarkia was looking to improve the automatic classification of patent applications into 45 different primary categories and 40,000 sub-categories.
These categories are typically manually entered by attorneys, a very slow process, making it an ideal place for automation with Artificial Intelligence and NLP techniques.
Unfortunately, their existing system, which used a linear classifier, performed poorly: it grossly misclassified patents at the primary category level and could not produce a sensible classification at the sub-category level.
1. Understanding the problem
Our first step was to understand the exact solution our client was using, the size of the data involved, and problems in their dataset such as sparsity.
Upon diagnosis, we realized that a traditional classification approach was not the best way to tackle the problem due to issues in the data and the massive number of sub-categories.
2. Solution Development
Once we determined the problem, we were able to design and develop an alternative solution leveraging a highly efficient information retrieval approach using Python, Gensim and ElasticSearch.
During development, we pre-processed the data and built a full pipeline (client IP) in which any valid user input results in a logical categorization at both the primary category and sub-category levels.
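To illustrate the general idea of treating classification as a retrieval problem, here is a minimal sketch in plain Python. It is not the client pipeline (which is client IP) and it does not use Gensim or ElasticSearch; the class name `RetrievalClassifier`, the TF-IDF weighting, the similarity-weighted vote, and the toy category labels are all assumptions made for illustration. The input is indexed against already-labeled examples, the most similar examples are retrieved, and their categories are ranked by summed similarity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class RetrievalClassifier:
    """Hypothetical sketch: rank categories by similarity to labeled examples."""

    def __init__(self, labeled_docs):
        # labeled_docs is a list of (text, category) pairs.
        self.labels = [label for _, label in labeled_docs]
        token_lists = [tokenize(text) for text, _ in labeled_docs]
        # Document frequency: in how many examples each token occurs.
        doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
        n_docs = len(token_lists)
        # Smoothed inverse document frequency, so rare tokens weigh more.
        self.idf = {
            t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()
        }
        self.vectors = [self._vectorize(tokens) for tokens in token_lists]

    def _vectorize(self, tokens):
        # Unit-length TF-IDF vector stored as a sparse dict; unknown tokens are dropped.
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def classify(self, query, k=3):
        """Return (category, score) pairs, best first, from the k nearest examples."""
        q = self._vectorize(tokenize(query))
        scored = []
        for vec, label in zip(self.vectors, self.labels):
            # Dot product of unit vectors equals cosine similarity.
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, label))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        votes = defaultdict(float)
        for score, label in scored[:k]:
            votes[label] += score
        return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


# Toy examples with made-up category names, for illustration only.
examples = [
    ("downloadable computer software for accounting", "software"),
    ("mobile application software for games", "software"),
    ("t-shirts hats and jackets", "clothing"),
    ("footwear and athletic clothing", "clothing"),
    ("coffee tea and bakery goods", "food"),
]
clf = RetrievalClassifier(examples)
ranked = clf.classify("software for mobile games", k=2)
```

One reason a retrieval formulation can suit a very large label space is that a new sub-category only needs labeled examples in the index rather than a retrained model; whether that was the deciding factor in this project is not stated here.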
3. Evaluation & Delivery
To ensure that the results made sense and met the needs of our client, we manually evaluated ~50 test points.
In addition, we created a larger test set to evaluate the results quantitatively on a broader scale and to find the best settings for the classification.
Finally, the full solution was delivered to our client for integration.
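The quantitative evaluation described above can be sketched as a simple accuracy sweep over candidate settings. This is an assumed, simplified setup: the helper names `accuracy` and `best_setting`, the toy keyword predictor, and its `min_hits` setting are hypothetical and stand in for whatever settings the real pipeline exposed.

```python
def accuracy(predict, test_set):
    """Fraction of test items whose predicted label equals the expected label."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = sum(1 for text, expected in test_set if predict(text) == expected)
    return correct / len(test_set)


def best_setting(make_predictor, settings, test_set):
    """Score each candidate setting and return (best_setting, scores_by_setting)."""
    scores = {s: accuracy(make_predictor(s), test_set) for s in settings}
    best = max(scores, key=scores.get)
    return best, scores


def make_keyword_predictor(min_hits):
    """Hypothetical toy predictor: 'software' if enough keywords appear, else 'other'."""
    keywords = {"software", "app", "download"}

    def predict(text):
        hits = sum(1 for token in text.lower().split() if token in keywords)
        return "software" if hits >= min_hits else "other"

    return predict


# Made-up labeled test items, for illustration only.
test_set = [
    ("downloadable software app", "software"),
    ("mobile app software", "software"),
    ("coffee shop app", "other"),
    ("leather shoes", "other"),
]
best, scores = best_setting(make_keyword_predictor, [1, 2, 3], test_set)
```

Keeping the test set fixed while only the setting varies makes the scores directly comparable across settings.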
Kavita is very knowledgeable in data mining, proposing different options and helping us find out the appropriate approach to our unique problem. I enjoy working with her, and highly recommend her.
As a result of our partnership, LegalForce Trademarkia saw a ~30% improvement in classification accuracy at the primary category level and was able to classify automatically at the sub-category level, which had not previously been possible.
Also, because our algorithm was clear and efficient, our client was able to easily integrate the pipeline into their workflow.