Precision and recall are commonly used metrics for measuring the performance of machine learning models, and of AI solutions in general. They help you understand how well a model is making predictions.
Let’s use an email SPAM prediction example. Say you have a model that looks at an email and decides whether it’s SPAM or NOT SPAM. To see how well it’s doing, you want to compare it with human-generated labels, which we will call the actual labels.
To demonstrate, the table below shows the actual labels for four emails alongside the labels predicted by the machine (the model). We’ll treat a spam label as positive and a not-spam label as negative.
| Email ID | Actual Label | Machine Predicted Label |
|---|---|---|
| Email 1 | Spam (positive) | Spam (positive & correct) |
| Email 2 | Spam (positive) | Not Spam (negative & incorrect) |
| Email 3 | Not Spam (negative) | Spam (positive & incorrect) |
| Email 4 | Spam (positive) | Not Spam (negative & incorrect) |
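To make the counting explicit, here is a minimal Python sketch that transcribes the table into two lists and tallies the four possible outcomes. These are conventionally called true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The names `actual`, `predicted`, and `confusion_counts` are illustrative only, not part of any library.

```python
# Labels transcribed from the table; "spam" is treated as the positive class.
actual = ["spam", "spam", "not spam", "spam"]
predicted = ["spam", "not spam", "spam", "not spam"]

POSITIVE = "spam"


def confusion_counts(actual, predicted, positive=POSITIVE):
    """Return (tp, fp, fn, tn) for a binary labeling."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)  # 1 1 2 0
```

Email 1 is the only true positive, Email 3 is a false positive, and Emails 2 and 4 are false negatives.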
What is Precision in ML?
Intuitively, precision measures the proportion of positive predictions that are correct.
As you can see from the table above, out of the 2 spam (positive) machine predictions (Emails 1 and 3), only 1 is correct. So the precision is 0.5, or 50%.
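The same calculation can be written as a formula. Here TP counts correct positive predictions (Email 1) and FP counts incorrect positive predictions (Email 3):

```latex
\text{Precision} = \frac{TP}{TP + FP} = \frac{1}{1 + 1} = 0.5
```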
What is Recall in ML?
Recall measures the proportion of actual positive labels correctly identified by the model.
From the table above, there are 3 actual positive labels (Emails 1, 2, and 4), and of those, the model correctly captures only one (Email 1). So the recall is about 0.33, or 33%.
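As a formula, recall divides the correct positive predictions (TP, Email 1) by all actual positives, which are the true positives plus the positives the model missed (FN, Emails 2 and 4):

```latex
\text{Recall} = \frac{TP}{TP + FN} = \frac{1}{1 + 2} = \frac{1}{3} \approx 0.33
```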
All in all, in the SPAM prediction example, precision is 50% and recall is 33%.
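Both metrics reduce to one division each. Below is a minimal Python sketch, with hypothetical helper names `precision` and `recall`, applied to the counts from the table. Returning 0.0 when the denominator is zero is an assumed convention: strictly speaking, the metric is undefined when there are no positive predictions (for precision) or no actual positives (for recall).

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct."""
    # Assumed convention: report 0.0 when there are no positive predictions.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model identified."""
    # Assumed convention: report 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Counts from the spam table: Email 1 is a TP, Email 3 is an FP,
# and Emails 2 and 4 are FNs.
tp, fp, fn = 1, 1, 2
print(f"precision = {precision(tp, fp):.2f}")  # precision = 0.50
print(f"recall    = {recall(tp, fn):.2f}")     # recall    = 0.33
```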
What Messages Do Precision and Recall Convey?
At a high level, precision measures correctness and recall measures coverage. For example, a precision of 98% means that when the model predicts positive, it is right 98% of the time. However, a model can be overly conservative, making positive predictions only in the few cases it is surest about, and achieve high precision that way. In other words, it fails to make enough positive predictions. This is why you also need to consider recall: it tells you whether you are capturing enough of the actual positives.
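To illustrate the conservative case, here is a minimal Python sketch using the same four actual labels and a hypothetical cautious model that flags only Email 1 as spam. The predictions and the helper name `precision_recall` are invented for illustration and do not come from a real model.

```python
actual = ["spam", "spam", "not spam", "spam"]
# Hypothetical cautious model: flags only the one email it is surest about.
cautious = ["spam", "not spam", "not spam", "not spam"]


def precision_recall(actual, predicted, positive="spam"):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    predicted_pos = sum(p == positive for p in predicted)
    actual_pos = sum(a == positive for a in actual)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


p, r = precision_recall(actual, cautious)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 1.00, recall = 0.33
```

Every positive prediction is correct, so precision is perfect, yet two of the three spam emails slip through.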
As for recall, a high recall means that the model captures most of the actual positives. But if a model labels everything as positive regardless of the input, its recall will be artificially high. In fact, it will be a perfect 100%, because no actual positive is ever missed. That’s why you need to balance precision and recall. You want accurate predictions, but not at the cost of missing too many actual positives (false negatives). Ideally, you want both precision and recall to be sufficiently high.
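The opposite extreme can be sketched the same way. Here, a hypothetical model labels every email as spam. The helper is repeated so the snippet runs on its own; as before, the names are illustrative assumptions.

```python
actual = ["spam", "spam", "not spam", "spam"]
# Hypothetical degenerate model: calls every email spam.
everything_spam = ["spam"] * len(actual)


def precision_recall(actual, predicted, positive="spam"):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    predicted_pos = sum(p == positive for p in predicted)
    actual_pos = sum(a == positive for a in actual)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


p, r = precision_recall(actual, everything_spam)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.75, recall = 1.00
```

Recall is perfect, while precision drops to the share of spam in the data (3 of 4 emails here). On real mailboxes, where spam is usually a smaller fraction, precision would fall much further.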
In summary, precision measures the proportion of positive predictions that are correct, and recall measures the coverage of actual positive labels. For a model to be considered “good,” both precision and recall must be at acceptable levels. What counts as acceptable, in the end, depends on the application.