Optimize personalization algorithms for a leading retail company

Key challenges

Our client is a French group, a European leader in the retail industry and the pioneer of the hypermarket concept.

As part of its marketing strategy, the client sends direct mailing campaigns every month to more than 4 million customers, who are identified through their loyalty cards. One step of this process is generating an optimal pool of 500 offers that covers as much of the target group as possible and matches the customers’ purchasing habits.
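Choosing a fixed-size pool that reaches as many customers as possible is a maximum coverage problem. The sketch below illustrates one common heuristic for it, greedy selection, and is not the client's actual algorithm. The function name, the offer and customer identifiers, and the "eligible" mapping (offer to the customers whose purchase history matches it) are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_offer_pool(eligible: Dict[str, Set[str]], pool_size: int) -> List[str]:
    """Pick up to pool_size offers, each time taking the offer that
    reaches the most customers not yet covered (greedy maximum coverage)."""
    covered: Set[str] = set()
    pool: List[str] = []
    remaining = dict(eligible)
    while remaining and len(pool) < pool_size:
        # Offer with the largest number of still-uncovered customers;
        # iterating in sorted order keeps tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda o: len(remaining[o] - covered))
        if not remaining[best] - covered:
            break  # no remaining offer reaches a new customer
        pool.append(best)
        covered |= remaining.pop(best)
    return pool


# Hypothetical toy data: offer id -> customers the offer is relevant for.
eligible = {
    "coffee": {"c1", "c2", "c3"},
    "tea": {"c3", "c4"},
    "pasta": {"c1", "c2"},
    "diapers": {"c5"},
}
pool = greedy_offer_pool(eligible, pool_size=3)
```

In the real setting, pool_size would be 500 and the mapping would come from loyalty-card purchase history. The greedy rule is a well-known approximation for maximum coverage and does not guarantee the optimal pool.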

To optimize costs and relevance, our client wanted to replace its existing personalization algorithms, which were becoming unstable and resource-intensive, with a more efficient process.

Our approach

To meet our client’s needs, we first investigated whether a graph database could be relevant and help optimize categorization. We started with a Proof of Concept and, after very encouraging results in the test phase, designed and implemented a solution on Neo4j (the leading graph database product) to replace the Hadoop/Spark-based systems.

As a second step, to ingest the data (more than 100 million items to be processed), we developed an ETL pipeline in Scala. We redesigned the entire personalization process around Neo4j and generated a new pool of offers for each mailing.
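To give a rough idea of what such an ingestion step does, here is a minimal sketch that aggregates raw transaction rows into customer-to-product relationships ready for a batched graph load. It is written in Python for illustration only (the project's ETL was in Scala). The node labels, the relationship type, the property names and the function name are assumptions, not the client's actual data model.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple, Union

Row = Dict[str, Union[str, int]]


def build_purchase_edges(transactions: Iterable[Tuple[str, str]]) -> List[Row]:
    """Collapse (customer_id, product_id) rows into one edge per pair,
    carrying the number of purchases as a property."""
    counts = Counter(transactions)
    return [
        {"customer": customer, "product": product, "times": n}
        for (customer, product), n in sorted(counts.items())
    ]


# Cypher statement that could load such rows in one batch when passed
# as the $rows parameter. Labels and properties are hypothetical.
LOAD_EDGES = """
UNWIND $rows AS row
MERGE (c:Customer {id: row.customer})
MERGE (p:Product {id: row.product})
MERGE (c)-[r:PURCHASED]->(p)
SET r.times = row.times
"""

# Hypothetical toy transactions: (customer id, product id).
transactions = [("c1", "coffee"), ("c1", "coffee"), ("c2", "tea"), ("c1", "pasta")]
edges = build_purchase_edges(transactions)
```

Aggregating before loading keeps the number of writes proportional to distinct customer and product pairs rather than to raw transaction rows.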

To guarantee performance, we carried out monthly tests in PySpark to improve the personalization process (survival analysis, market basket analysis, clustering…). This work was done in close collaboration with the Marketing team to ensure functionally relevant results.
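As an illustration of one of these techniques, market basket analysis scores product pairs by support, confidence and lift. The sketch below computes these metrics in plain Python on toy baskets. It is not the PySpark code used in the project, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence


def pair_rules(baskets: Sequence[Sequence[str]], min_support: float = 0.0) -> List[Dict]:
    """Return association rules lhs -> rhs for every product pair whose
    support (share of baskets containing both items) is at least min_support."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: share of baskets with lhs that also contain rhs
            confidence = count / item_counts[lhs]
            # lift: confidence relative to the overall frequency of rhs
            lift = confidence / (item_counts[rhs] / n)
            rules.append({"lhs": lhs, "rhs": rhs, "support": support,
                          "confidence": confidence, "lift": lift})
    return rules


# Hypothetical toy baskets.
baskets = [["coffee", "sugar"], ["coffee", "sugar", "milk"], ["coffee"], ["milk"]]
rules = pair_rules(baskets, min_support=0.5)
```

A lift above 1 suggests that the two products are bought together more often than their individual frequencies would predict, which is the kind of signal that can feed offer selection.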


Results

More relevant mailing and offer personalization, aligned with customers’ expectations and habits.

Efficiency increased 2-3 times compared to the old system. Tests were performed on the recommendation process for 100,000 customers with 2 years of historical data (>100 million transactions).

Technologies & Partners

Technologies used for this project: Hadoop, PySpark, Neo4j, Scala and Python