
The machine learning solution matches more than 30% of the offers that would otherwise have to be matched manually.
The precision of the matched offers is higher than 98%.
The developed solution runs in real time, processing tens of millions of offers each day.
"Thanks to the DataSentics AI strike team we were able to automate the matching of millions of offers coming from tens of thousands of e-shops and therefore save significant time spent previously on manual matching"
The DataSentics strike team operation
The DataSentics strike team worked closely with the internal team from the beginning, starting with understanding the business side of the problem and the underlying data and processes. An architecture connecting the existing matching process with the new ML matching was designed and validated with the internal team and the product owner. The technologies were selected for this specific use case, taking into account both the existing tech stack and the requirements. Some components changed over time, as the solution was rolled out to more categories and further improvements in performance and speed became possible.
During development and productionization, the strike team stayed in day-to-day contact with the internal team to understand the data more deeply, solve technical challenges, and validate the individual parts of the solution. The internal product owner took part in the process to discuss the business side of the problem and any questions that arose, to ensure the business impact, and to stay connected with other internal teams and departments. The product owner, the strike team, and the internal team also worked together to formulate goals for the following quarters.

Product matching relied heavily on manual work
The task of matching offers coming from e-shops to the products in the internal catalogue was performed by an army of manual workers. The existing automation could only match offers to products based on a few rules, such as an identical name or ISBN code.
The task of the AI strike team, which combined the client's ML engineers with our own, was to create a robust machine-learning-powered solution and to deploy it to production.
Complex architecture running in near real time
The solution uses a multi-staged architecture to find the corresponding product for each incoming offer.
- Elasticsearch is used to select several candidate products based on name only.
- An XGBoost model is used to compare the offer with each of the candidates. The model uses a set of features including several name similarity measures, a price comparison, and various attributes such as weight, size, or color.
- The resulting decisions for each offer-candidate pair are gathered, additional business rules may be applied, and a final match is made only if the corresponding product was identified unambiguously.
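
The three stages above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions, not the production code: a name-similarity ranking stands in for Elasticsearch, a hand-weighted score stands in for the trained XGBoost model, and the `Item` type, the feature names, the weights, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Item:
    """An offer or a catalogue product (hypothetical, simplified schema)."""
    id: str
    name: str
    price: float
    attrs: dict = field(default_factory=dict)


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_candidates(offer: Item, catalogue: list, k: int = 3) -> list:
    """Stage 1: pick the k products with the most similar name.
    Stand-in for the name-only Elasticsearch query."""
    ranked = sorted(catalogue,
                    key=lambda p: name_similarity(offer.name, p.name),
                    reverse=True)
    return ranked[:k]


def pair_features(offer: Item, product: Item) -> dict:
    """Stage 2a: features describing one offer-candidate pair."""
    shared = set(offer.attrs) & set(product.attrs)
    agree = sum(offer.attrs[key] == product.attrs[key] for key in shared)
    high = max(offer.price, product.price)
    return {
        "name_sim": name_similarity(offer.name, product.name),
        "price_ratio": min(offer.price, product.price) / high if high > 0 else 0.0,
        # 0.5 is a neutral value when no attribute can be compared
        "attr_agreement": agree / len(shared) if shared else 0.5,
    }


def score_pair(features: dict) -> float:
    """Stage 2b: placeholder for the trained classifier.
    The weights are made up for illustration."""
    return (0.6 * features["name_sim"]
            + 0.2 * features["price_ratio"]
            + 0.2 * features["attr_agreement"])


def match_offer(offer: Item, catalogue: list, threshold: float = 0.8) -> Optional[str]:
    """Stage 3: return a product id only if exactly one candidate is accepted."""
    accepted = [p for p in select_candidates(offer, catalogue)
                if score_pair(pair_features(offer, p)) >= threshold]
    return accepted[0].id if len(accepted) == 1 else None


# Made-up example data
catalogue = [
    Item("p1", "Acme Kettle 1.7 l white", 29.9, {"color": "white"}),
    Item("p2", "Acme Kettle 1.7 l black", 29.9, {"color": "black"}),
    Item("p3", "Acme Toaster 2 slice", 24.0),
]
clear_offer = Item("o1", "ACME kettle 1.7 l, white", 31.5, {"color": "white"})
vague_offer = Item("o2", "Acme Kettle 1.7 l", 29.9)

print(match_offer(clear_offer, catalogue))
print(match_offer(vague_offer, catalogue))
```

The first offer names its color, so only one candidate clears the threshold and its id is returned. The second offer fits both kettle variants equally well, so the sketch returns `None` and the offer would fall back to manual matching. Preferring no match over an ambiguous one is what keeps precision high, at the cost of coverage.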

- The model is served using FastAPI, together with the MLflow model registry.
- All parts of the solution are running on on-premises infrastructure, orchestrated by Kubernetes.
- The training and deployment are automated using GitLab CI/CD pipelines.
- The results are monitored using Prometheus and Grafana.
The impact of the automated matching solution
- The developed solution runs in real time, processing tens of millions of offers each day.
- Offers in all categories (except for Fashion) are being matched.
- The precision of the matched offers is higher than 98%.
- The machine learning solution matches more than 30% of the offers that would otherwise have to be matched manually.
- As the workload of content editors decreases, they have more time for more creative tasks with higher added value for heureka! group.