What is Inverse Document Frequency?

Inverse Document Frequency (IDF)

Inverse Document Frequency (IDF) is a weight indicating how commonly a word is used across a collection of documents. The more documents a word appears in, the lower its score, and the lower the score, the less important the word is considered to be.

For example, the word "the" appears in almost all English texts and would thus have a very low IDF score, as it carries very little "topic" information. In contrast, the word "coffee", while common, is not used as widely as "the". Thus, "coffee" would have a higher IDF score than "the". Traditionally, IDF is computed as:

$$\mathrm{IDF}_t = \log\left(\frac{N}{DF_t}\right)$$

where N is the total number of documents in your text collection, DF_t is the number of documents containing the term t, and t is any word in your vocabulary.
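As a rough illustration, the sketch below computes IDF as log(N / DF_t) over a tiny, made-up corpus. The function name `compute_idf`, the whitespace tokenization, and the choice of the natural logarithm are assumptions made for the example; real libraries often use a different log base or add smoothing terms.

```python
import math

def compute_idf(documents):
    """Return {term: log(N / DF_t)} for a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = {}
    for tokens in documents:
        # Use a set so each term is counted at most once per document.
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical toy corpus: three short, whitespace-tokenized documents.
docs = [
    "the coffee is hot".split(),
    "the tea is cold".split(),
    "the cat sat on the mat".split(),
]

idf = compute_idf(docs)
print(idf["the"])     # "the" is in all 3 documents: log(3/3) = 0.0
print(idf["coffee"])  # "coffee" is in 1 document: log(3/1), about 1.0986
```

Because "the" occurs in every document, its IDF drops to zero, while "coffee", which occurs in only one, receives the highest weight available in this corpus.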

IDF is typically used to boost the scores of words that are distinctive to a document, in the hope of surfacing high-information words that characterize the document while suppressing words that carry little weight in it.

For example, in any given document, if the word "the" appeared 10 times and its IDF weight is 0.1, its resulting score would be 1 (since 10 * 0.1 = 1). Now if the word "coffee" also appeared 10 times and its IDF weight is 0.5, its resulting score would be 5. When you rank the words by their resulting scores (in descending order, of course!), "coffee" would appear before "the", indicating that "coffee" is more important than "the".
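The arithmetic in this example can be sketched in a few lines of Python. The counts and IDF weights are the illustrative values from the text, not weights computed from a real corpus, and the variable names are hypothetical.

```python
# Term counts within a single document (illustrative values from the text).
term_counts = {"the": 10, "coffee": 10}

# Illustrative IDF weights from the text, not computed from real data.
idf_weights = {"the": 0.1, "coffee": 0.5}

# Score each term as (count in document) * (IDF weight).
scores = {term: count * idf_weights[term] for term, count in term_counts.items()}

# Rank terms by score, highest first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('coffee', 5.0), ('the', 1.0)]
```

Both words occur equally often, so the ranking is decided entirely by the IDF weight.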
