Our Latest Articles

What are N-Grams?

N-grams are a set of co-occurring words within a given window. When computing n-grams, you typically move one word forward (although you can move X words forward in more advanced scenarios)…
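As a rough illustration of the sliding window described above, here is a minimal Python sketch. The function name `ngrams`, the whitespace tokenization, and the `step` parameter (standing in for "move X words forward") are assumptions made for this example, not part of the original article.

```python
def ngrams(text, n, step=1):
    """Return the n-grams of `text` as tuples of words.

    Tokenization is a plain whitespace split (an assumption for brevity).
    `step` is how many words the window moves forward each time.
    """
    tokens = text.split()
    # Slide a window of n tokens across the text, `step` tokens at a time.
    return [
        tuple(tokens[i:i + n])
        for i in range(0, len(tokens) - n + 1, step)
    ]


bigrams = ngrams("the cow jumps over the moon", 2)
skipping_bigrams = ngrams("the cow jumps over the moon", 2, step=2)
```

With `step=1` every adjacent pair of words is produced; with `step=2` the window skips ahead, so consecutive n-grams no longer overlap.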

What are Stop Words?

Stop words are a set of commonly used words in a language. Examples of stop words in English are “a”, “the”, “is”, “are”, and so on. Stop word lists are commonly used in Text Mining and Natural Language Processing (NLP) to eliminate words that are so common that they carry very little useful information.
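A minimal sketch of stop word removal in Python follows. The tiny `STOP_WORDS` set only contains the four examples mentioned above, and the lowercasing and whitespace tokenization are assumptions for illustration; a real application would use a much fuller list.

```python
# Illustrative stop word list: just the examples from the text above.
STOP_WORDS = {"a", "the", "is", "are"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase `text`, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in stop_words]


filtered = remove_stop_words("The cats are on a mat")
```

Here `filtered` keeps only the words that are not in the stop word set.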

What is Inverse Document Frequency?

Inverse Document Frequency (IDF) is a weight indicating how commonly a word is used. The more frequent its usage across documents, the lower its score. The lower the score, the less important the word becomes.
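To make the idea concrete, here is a small Python sketch. The text above does not give a formula, so this example assumes one common variant, log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name and the choice to return 0.0 for unseen terms are also assumptions.

```python
import math


def inverse_document_frequency(term, documents):
    """Compute log(N / df) for `term` over tokenized `documents`.

    `documents` is a list of token lists. Returning 0.0 for a term that
    appears in no document is an arbitrary choice made for this sketch.
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain the term at least once.
    doc_freq = sum(1 for doc in documents if term in doc)
    if doc_freq == 0:
        return 0.0
    return math.log(n_docs / doc_freq)


docs = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
idf_the = inverse_document_frequency("the", docs)  # appears in every document
idf_cat = inverse_document_frequency("cat", docs)  # appears in one document
```

Because "the" appears in every document, it gets the lowest possible score under this variant, while the rarer "cat" scores higher.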

What is Term Frequency?

Term frequency (TF), often used in Text Mining, NLP and Information Retrieval, tells you how frequently a term occurs in a document. In the context of natural language, terms correspond to words or phrases. Since every document is different in length, it is possible that a term would appear more often in longer documents than in shorter ones. Thus, term frequency is often divided by the total number of terms in the document as a way of normalization.
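The length normalization described above can be sketched in a few lines of Python. The function name `term_frequency` and the whitespace tokenization in the usage line are assumptions for illustration only.

```python
from collections import Counter


def term_frequency(tokens):
    """Return each term's count divided by the total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    # Dividing by the document length normalizes for longer documents.
    return {term: count / total for term, count in counts.items()}


tf = term_frequency("the cat saw the dog".split())
```

In this five-word document, "the" occurs twice, so its normalized term frequency is 2/5, while every other word scores 1/5.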

How we use Natural Language Processing for market research?

In order for companies to innovate, build new product lines and understand the effects of certain product groups and chemical interactions, manual analysis of scientific articles is typically needed. Manual research can be very time-consuming, and researchers have started turning to automated methods with Natural Language Processing (NLP) and Artificial Intelligence (A.I.) to help speed up their work.

How we automatically organize large amounts of text data with topics?

Making sense of volumes of text data in surveys, legal documents, websites, customer support tickets and discussion threads can be daunting. This is why organizations are turning to tags, labels and topics to help organize all of their data. Unfortunately, not all organizations can afford the time to manually create labels for each and every document that they deal with…

How we make sense of emails, social media content and documents with automatic categorization?

Enterprises are overwhelmed with the volume of text they have to deal with every day. You have emails, chats, web pages, social media, support tickets, survey responses, clinical notes, incident reports and a whole lot more that are purely unstructured in nature. While text data can be an extremely rich source of information, manually extracting insights from large volumes of text data is labor-intensive…

How we surface customer complaints, wants and needs through text analysis?

Feedback from customers trickles in from different sources including social sources (e.g. Twitter), customer surveys, user reviews and customer support conversations. All this data put together is a gold mine for understanding what customers REALLY want. Unfortunately, due to the complexity of such data, it is hard for organizations to gather insights about customers using that data…

3 Tips for Building NLP Systems that Scale

A vast majority of NLP solutions developed in the workplace just don’t scale! And by scale, we mean handling real-world use cases, the ability to handle large amounts of data, and ease of deployment in a production environment. Some of these approaches either work on extremely narrow use cases or have a tough time…