Creation of a custom-made data analysis solution for SNCF Gares & Connexions

SNCF Gares & Connexions is a subsidiary of SNCF Réseau in charge of managing the passenger stations of the French national railway network. It provides essential services to the 10 million passengers and visitors who use stations in France every day (safety, information, accessibility, cleanliness and comfort).

In these stations, passengers receive a large volume of information, including audible announcements, which must be as clear as possible.

The Data & Customer Platform team at Gares & Connexions is developing an innovative tool for analyzing passenger information. It uses speech-to-text (voice recognition) and neural networks to check that the audio announcements broadcast in stations are understandable and contain all the necessary information. Station managers responsible for these announcements can use the tool to visualize announcement quality and identify aspects to improve.

This project is called Echo (“Ecoute à CHaud Opérationnelle” in French).

With this in mind, we were asked to build a complete solution that receives the sound announcements broadcast in the station, analyzes them, and exports data to be visualized in a dashboard.

Challenges

Technologies

Audio announcements and their metadata are received in a data lake hosted in Azure.

Azure Data Factory launches Python code every hour to process the announcements as they arrive. The code runs on Databricks clusters and uses Spark to parallelize operations.
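
As a rough illustration of this hourly batch step, the sketch below selects the announcements received during the last hour and processes them in parallel. It is not the project's code: the `Announcement` record, the `process` placeholder and the one-hour window logic are assumptions made for the example, and a standard-library thread pool stands in for the Spark parallelism used on Databricks.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Announcement:
    """Hypothetical record for one audio file and its metadata."""
    audio_path: str
    station: str
    received_at: datetime


def select_batch(announcements, run_time, window=timedelta(hours=1)):
    """Keep the announcements received in the window ending at run_time."""
    start = run_time - window
    return [a for a in announcements if start <= a.received_at < run_time]


def process(announcement):
    """Placeholder for the transcription and analysis of one announcement."""
    return {"audio_path": announcement.audio_path, "station": announcement.station}


def run_hourly_batch(announcements, run_time):
    """Process every announcement of the current hourly window."""
    batch = select_batch(announcements, run_time)
    # Assumption: a thread pool is used here as a stand-in for Spark's distributed map.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(process, batch))
```

Using a half-open window (start included, end excluded) means that an announcement arriving exactly on the hour is handled by the next run rather than twice.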

Speech recognition is performed by Azure’s Custom Speech service, which lets us train a model specifically on SNCF data. This custom model performs better than a standard model.

We then use a dozen machine learning models, ranging from decision trees to deep neural networks such as BERT, a language model based on the Transformer architecture developed by Google. These models extract a lot of information from the text of each announcement: whether it concerns a normal or a disrupted situation and, in the latter case, whether it states the cause of the problem, a workaround, and an expected time of return to normal.
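
The way these questions depend on each other can be sketched as follows. This is only an illustration under stated assumptions: the `AnnouncementAnalysis` structure, the `analyze` function and the keyword-based `toy_models` are hypothetical, and the keyword matchers merely stand in for the trained models (decision trees, BERT) described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AnnouncementAnalysis:
    """Hypothetical result record for one transcribed announcement."""
    disrupted: bool
    gives_cause: Optional[bool] = None        # only set when disrupted
    gives_workaround: Optional[bool] = None   # only set when disrupted
    gives_return_time: Optional[bool] = None  # only set when disrupted


def analyze(text: str, models: Dict[str, Callable[[str], bool]]) -> AnnouncementAnalysis:
    """Run the situation model first, then the detail models if disrupted."""
    if not models["disrupted"](text):
        return AnnouncementAnalysis(disrupted=False)
    return AnnouncementAnalysis(
        disrupted=True,
        gives_cause=models["cause"](text),
        gives_workaround=models["workaround"](text),
        gives_return_time=models["return_time"](text),
    )


def keyword_model(*keywords: str) -> Callable[[str], bool]:
    """Toy stand-in for a trained classifier: true if any keyword appears."""
    return lambda text: any(k in text.lower() for k in keywords)


# Assumption: illustrative keywords only, not the project's real models.
toy_models = {
    "disrupted": keyword_model("delayed", "cancelled"),
    "cause": keyword_model("due to", "because"),
    "workaround": keyword_model("replacement bus", "alternative"),
    "return_time": keyword_model("resume", "back to normal"),
}
```

Passing the models in as plain callables keeps the control flow independent of how each model is implemented, so a keyword matcher can later be swapped for a real classifier without changing `analyze`.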

The data produced is then exported to an Azure database, to be visualized in a dashboard made with Power BI.
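
A minimal sketch of such an export step is shown below. The table name, its columns and both functions are assumptions made for the example, and an in-process SQLite database stands in for the Azure database so that the sketch stays self-contained.

```python
import sqlite3


def export_results(conn, rows):
    """Write analysis rows into a (hypothetical) results table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS announcement_analysis (
               audio_path TEXT PRIMARY KEY,
               station TEXT NOT NULL,
               disrupted INTEGER NOT NULL,
               gives_cause INTEGER
           )"""
    )
    # INSERT OR REPLACE keeps the export idempotent if a batch is rerun.
    conn.executemany(
        "INSERT OR REPLACE INTO announcement_analysis VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def disrupted_share_by_station(conn):
    """Example of an aggregate that a dashboard could display."""
    return conn.execute(
        "SELECT station, AVG(disrupted) FROM announcement_analysis "
        "GROUP BY station ORDER BY station"
    ).fetchall()
```

Keying the table on the audio file path is one simple way to make a rerun of the same hourly batch overwrite its earlier rows instead of duplicating them.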

Benefits

All this work has enabled us to analyze the quality and content of the sound announcements made in stations. This has resulted in decisive advantages for SNCF Gares & Connexions, which has been able to:

Testimonial

“ECHO is an attractive, efficient, fast and easy tool that will allow us to be much more reactive in the performance analysis.

At a glance, it gives us direct access to information that previously required listening to individual announcements.

It’s a real time-saver.”