Audio announcements impact analysis for a European public railway company

Our client is a subsidiary of a European public railway company, in charge of managing the passenger stations of the national railway network. It provides essential services (safety, information, accessibility, cleanliness and comfort) to the 10 million passengers and visitors who use the country's stations every day.

In these stations, passengers receive a large volume of information, including audible announcements, which must be as clear as possible.

The company's Data & Customer Platform team is developing an innovative tool for analyzing passenger information. It uses Speech-To-Text (voice recognition) and neural networks to check that audio announcements broadcast in stations are understandable and contain all the information needed. Station managers responsible for these announcements can use the tool to visualize their quality and identify aspects to improve.

The project is called Echo (“Ecoute à CHaud Opérationnelle” in French).

With this in mind, we were asked to build a complete solution that receives the sound announcements broadcast in the station, analyzes them, and exports data to be visualized in a dashboard.



Audio announcements and their metadata are received in a data lake hosted in Azure.

Python code is launched hourly by Azure Data Factory to process the announcements as they arrive. This code runs on Databricks clusters and uses Spark to parallelize operations.
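As a rough illustration of this hourly batch pattern, the sketch below selects the announcements received during the last hour and maps a processing function over them in parallel. It is not the project's code: the `Announcement` fields, `select_batch`, `process` and `run_hourly` are hypothetical names, and a standard-library thread pool stands in for a Spark job so that the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Announcement:
    # Hypothetical metadata fields; the real schema is not described here.
    audio_path: str
    station: str
    received_at: datetime


def select_batch(announcements, run_time, window=timedelta(hours=1)):
    """Keep the announcements received in the window that ends at run_time."""
    start = run_time - window
    return [a for a in announcements if start <= a.received_at < run_time]


def process(announcement):
    """Placeholder for the per-announcement work (transcription, models)."""
    return {"station": announcement.station, "audio_path": announcement.audio_path}


def run_hourly(announcements, run_time, max_workers=4):
    """Process one hourly batch; results keep the order of the input batch."""
    batch = select_batch(announcements, run_time)
    # Announcements are independent of each other, so the batch can be
    # mapped in parallel. A thread pool stands in for Spark in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, batch))
```

Using a half-open window (start included, end excluded) means an announcement that arrives exactly on the hour is picked up by the next run rather than processed twice.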

Speech recognition is performed by Azure’s Custom Speech service, which lets us use a model trained specifically on our client’s data. This gives better results than a standard model.

We then apply a dozen machine learning models, ranging from decision trees to deep neural networks such as BERT, a language model architecture developed by Google. These models extract a range of information from the text of each announcement: whether it concerns a normal or a disrupted situation and, if disrupted, whether it states the cause of the problem, a workaround, and an expected time of return to normal.
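To make the shape of this output concrete, here is a minimal sketch of the kind of record that can be derived from one transcribed announcement. The field names and the `analyze` function are hypothetical, and simple keyword patterns stand in for the trained models, which are not described in detail here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnouncementAnalysis:
    disrupted: bool
    # The three flags below only make sense for a disrupted situation,
    # so they are None when the situation is normal.
    has_cause: Optional[bool]
    has_workaround: Optional[bool]
    has_return_time: Optional[bool]


# Hypothetical keyword patterns standing in for the trained classifiers.
DISRUPTION = re.compile(r"\b(delay(ed)?|cancel(l)?ed|disruption|interrupted)\b", re.I)
CAUSE = re.compile(r"\b(due to|because of|following)\b", re.I)
WORKAROUND = re.compile(r"\b(replacement bus|alternative route|please use)\b", re.I)
RETURN_TIME = re.compile(r"\b(resume|back to normal)\b.*\b\d{1,2}[:h]\d{2}\b", re.I)


def analyze(text: str) -> AnnouncementAnalysis:
    """Derive the structured flags from the text of one announcement."""
    if not DISRUPTION.search(text):
        return AnnouncementAnalysis(False, None, None, None)
    return AnnouncementAnalysis(
        disrupted=True,
        has_cause=bool(CAUSE.search(text)),
        has_workaround=bool(WORKAROUND.search(text)),
        has_return_time=bool(RETURN_TIME.search(text)),
    )
```

In a real pipeline each flag would come from its own model, but the two-stage structure is the same: first decide whether the situation is disrupted, then check what the announcement says about the disruption.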

The data produced is then exported to an Azure database, to be visualized in a dashboard made with Power BI.
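The export step can be sketched as follows. This is an illustration under stated assumptions, not the project's schema: the table name, the column names and both functions are hypothetical, and an in-memory SQLite database stands in for the Azure database so that the example is self-contained.

```python
import sqlite3

# Hypothetical table layout: one row per analyzed announcement.
SCHEMA = """
CREATE TABLE IF NOT EXISTS announcement_analysis (
    announcement_id TEXT PRIMARY KEY,
    station         TEXT NOT NULL,
    disrupted       INTEGER NOT NULL,
    has_cause       INTEGER,
    has_workaround  INTEGER,
    has_return_time INTEGER
)
"""


def export_rows(conn, rows):
    """Write analysis rows; re-running the same batch does not duplicate them."""
    conn.execute(SCHEMA)
    conn.executemany(
        """
        INSERT OR REPLACE INTO announcement_analysis VALUES (
            :announcement_id, :station, :disrupted,
            :has_cause, :has_workaround, :has_return_time
        )
        """,
        rows,
    )
    conn.commit()


def cause_rate_by_station(conn):
    """Share of disrupted announcements that state a cause, per station."""
    cursor = conn.execute(
        "SELECT station, AVG(has_cause) FROM announcement_analysis "
        "WHERE disrupted = 1 GROUP BY station ORDER BY station"
    )
    return dict(cursor.fetchall())
```

Keying the table on the announcement identifier keeps the export idempotent, which matters for a job that runs every hour and may be retried. The second function shows the kind of aggregate a dashboard could display.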


All this work has enabled us to analyze the quality and content of the sound announcements made in stations. This has resulted in decisive advantages for our client, which has been able to:


“ECHO is an attractive, efficient, fast and easy tool that will allow us to be much more reactive in the performance analysis.

At a glance, it gives us direct access to information that previously required listening to individual announcements.

It’s a real time-saver.”