Customized AI Evaluation Strategy in E-Commerce

1. Problem

A large US-based e-commerce company relies on its search engine for most of its product sales. As a result, ensuring that the search algorithms perform optimally is a top priority.

To this end, the company’s software team developed a series of machine learning (ML) models to improve the search quality for specific product categories.

Unfortunately, after deploying the models, the company was stuck. It had no reliable way to evaluate the ML models quantitatively or to determine whether they were fit for deployment. Those decisions were made on gut feel.

Further, the company was not sure which metrics to track to assess the lift in sales after deploying these models, or how to go about measuring that lift.

2. Solution

Given the complexity of the problem, our first step in helping this company was to go back to the foundations. In collaboration with the company, we first set out to understand:

  1. The metrics that are currently being tracked
  2. The process around the company’s ML model development, evaluation, and deployment
  3. How the ML models fit into the search workflow
  4. How the company tracks the impact on sales

Once we dug deep and had a complete understanding of the issues and overall workflow, we came up with a plan to turn things around.

First, we shortlisted a set of key metrics to track for search quality and sales performance.

We asked the company to start tracking these metrics and provided guidance on how to measure the lift in each key metric with and without the ML models in operation.
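To make "lift" concrete, here is a minimal sketch of one way to compute it, assuming the metric of interest is search-to-purchase conversion rate and that traffic can be split into a control arm (ML models off) and a treatment arm (ML models on). The metric choice, the `ArmStats` type, and the function name are illustrative assumptions, not the company's actual setup.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Aggregated counts for one arm of the comparison (hypothetical schema)."""
    searches: int
    purchases: int

    @property
    def conversion_rate(self) -> float:
        # Fraction of searches that led to a purchase; 0.0 if there was no traffic.
        return self.purchases / self.searches if self.searches else 0.0


def relative_lift(control: ArmStats, treatment: ArmStats) -> float:
    """Relative change in conversion rate of treatment over control."""
    base = control.conversion_rate
    if base == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return (treatment.conversion_rate - base) / base


# Example with made-up numbers: ML models off vs. on.
without_ml = ArmStats(searches=10_000, purchases=300)
with_ml = ArmStats(searches=10_000, purchases=330)
lift = relative_lift(without_ml, with_ml)
```

In practice the two arms would be compared over the same time window and checked for statistical significance before the lift is attributed to the models.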

Next, we developed a robust workflow to ensure that the ML models are properly evaluated and are deployed only if they are deemed fit for deployment.

Specifically:

  1. We developed a method to use production activity signals as evaluation data.
  2. We provided a suite of metrics that the company could use to evaluate their ML models in conjunction with their new evaluation data.
  3. We provided an automated decision-making workflow to determine if a model is fit for deployment.
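The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the workflow that was delivered: it assumes clicks logged in production serve as relevance labels, uses mean reciprocal rank (MRR) as one example metric from the suite, and gates deployment on a hypothetical minimum gain over the current baseline. All function names and the `min_gain` threshold are made up for the example.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked_ids: List[str], clicked_ids: Set[str]) -> float:
    """1 / position of the first clicked product in the ranking, or 0.0 if none."""
    for position, product_id in enumerate(ranked_ids, start=1):
        if product_id in clicked_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]], clicks: Dict[str, Set[str]]
) -> float:
    """Average reciprocal rank over queries that have both a ranking and clicks."""
    scores = [
        reciprocal_rank(rankings[query], clicked)
        for query, clicked in clicks.items()
        if query in rankings
    ]
    return sum(scores) / len(scores) if scores else 0.0


def fit_for_deployment(
    candidate_score: float, baseline_score: float, min_gain: float = 0.01
) -> bool:
    """Assumed gate: deploy only if the candidate beats the baseline by min_gain."""
    return candidate_score - baseline_score >= min_gain


# Evaluation data built from (made-up) production clicks: query -> clicked products.
clicks = {"red shoes": {"p2"}, "usb cable": {"p9"}}
baseline = {"red shoes": ["p1", "p2", "p3"], "usb cable": ["p7", "p8", "p9"]}
candidate = {"red shoes": ["p2", "p1", "p3"], "usb cable": ["p9", "p7", "p8"]}

baseline_mrr = mean_reciprocal_rank(baseline, clicks)
candidate_mrr = mean_reciprocal_rank(candidate, clicks)
deploy = fit_for_deployment(candidate_mrr, baseline_mrr)
```

Click data is a noisy, position-biased proxy for relevance, so a real pipeline would typically filter or reweight it before using it as ground truth.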

3. Results

As a result of our partnership, the company now has a better understanding of the metrics it should track to evaluate the performance of its e-commerce search and the impact of its ML models on sales-related metrics.

In addition, with our model evaluation workflow, the company now has a way to automatically determine if a model is fit for deployment by leveraging its very own production activity data.

Finally, the metrics, evaluation dataset, and evaluation workflow in combination allow the company to track potential model degradation.
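As a rough illustration of what tracking degradation could look like, the sketch below flags a model when the average of its most recent evaluation scores drops more than a tolerance below a reference score. The function name, the 5% tolerance, and the three-run window are assumptions chosen for the example, not values from the engagement.

```python
from typing import Sequence


def has_degraded(
    metric_history: Sequence[float],
    reference: float,
    tolerance: float = 0.05,
    window: int = 3,
) -> bool:
    """True if the mean of the last `window` scores is more than
    `tolerance` (relative) below `reference`."""
    if len(metric_history) < window:
        # Not enough evaluation runs yet to make a call.
        return False
    recent = metric_history[-window:]
    recent_mean = sum(recent) / window
    return recent_mean < reference * (1 - tolerance)


# Made-up evaluation scores from successive runs on fresh production data.
scores = [0.42, 0.41, 0.36, 0.35, 0.34]
degraded = has_degraded(scores, reference=0.42)
```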
