- Speech to Text
- Data Visualization
- Data Analytics
- Machine Learning
Our client is a subsidiary of a European public railway company, in charge of managing the passenger stations of the national railway network. It provides essential services (safety, information, accessibility, cleanliness and comfort) to the 10 million passengers and visitors who use the country's stations every day.
In these stations, passengers receive a large volume of information, including audible announcements, which must be as clear as possible.
The company's Data & Customer Platform team develops an innovative tool for analyzing passenger information, using Speech-To-Text (voice recognition) and neural networks to check that the audio announcements broadcast in stations are understandable and contain all the information needed. Station managers responsible for these announcements can use the tool to visualize their quality and identify aspects to improve.
The project is called Echo (“Ecoute à CHaud Opérationnelle”, roughly “operational real-time listening” in French).
With this in mind, we were asked to build a complete solution that receives the sound announcements broadcast in the station, analyzes them, and exports data to be visualized in a dashboard.
Challenges
- We had to work with different data providers, who did not all send data of the same type or quality. We had to build a processing chain that adapts to heterogeneous data and makes the most of medium- or low-quality audio files.
- The processing chain is made up of several successive Machine Learning stages, so we had to use the most efficient models possible. A first speech-recognition stage produces text from the audio files, a neural network then analyzes these texts to categorize them, and other models analyze the texts of a particular category to deduce new information (see the sketch after this list). If speech recognition performs poorly, the whole chain produces unusable information; similarly, if the neural network assigns imprecise categories, the final information is drowned in noise.
- Machine Learning is a powerful tool, but complex to maintain. We have about ten models, ranging from a simple decision tree to a deep neural network, by way of a complete natural language processing pipeline. We had to be very rigorous in maintaining, training and deploying these models.
- The whole project is based on a cloud architecture, which is powerful and very flexible, but the slightest network error on a cloud machine can prevent us from reading or sending data, so we had to be persistent and write the most robust code possible.
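To give an idea of how these stages depend on one another, here is a minimal sketch of such a chain in Python. The stage functions are hypothetical placeholders, not the project's actual models; the point is simply that a weak early stage makes everything downstream unusable.

```python
# Minimal sketch of a chained processing pipeline. The stage functions
# below are hypothetical placeholders standing in for the real models.

def transcribe(audio_path: str) -> str:
    """Stage 1: speech recognition (placeholder transcription)."""
    return "the 10:42 train to the airport is delayed due to a signal failure"

def categorize(text: str) -> str:
    """Stage 2: categorize the announcement (placeholder rule)."""
    return "disruption" if "delayed" in text else "normal"

def extract_disruption_details(text: str) -> dict:
    """Stage 3: category-specific extraction (placeholder values)."""
    return {"cause": "signal failure", "workaround": None, "return_to_normal": None}

def process_announcement(audio_path: str) -> dict:
    """Chain the stages, stopping early if the transcription is unusable."""
    text = transcribe(audio_path)
    if not text:
        return {"audio": audio_path, "status": "unusable_transcription"}
    result = {"audio": audio_path, "text": text, "category": categorize(text)}
    if result["category"] == "disruption":
        result.update(extract_disruption_details(text))
    return result

print(process_announcement("announcement_0001.wav"))
```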
Technologies
Audio announcements and their metadata are received in a data lake hosted in Azure.
Python code is launched hourly by Azure Data Factory to process the announcements as they arrive. This code runs on Databricks clusters and uses Spark to parallelize operations.
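Below is a minimal sketch of what such an hourly job can look like, assuming a Databricks environment where a Spark session is already available. The data-lake path, column names and the placeholder UDF are illustrative assumptions, not the project's actual code.

```python
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.getOrCreate()

# Illustrative data-lake location; the real container and layout differ.
LAKE_PATH = "abfss://echo@<storage-account>.dfs.core.windows.net"

# Select the announcements received since the previous hourly run.
announcements = (
    spark.read.parquet(f"{LAKE_PATH}/announcements/metadata")
         .where(F.col("received_at") >= F.current_timestamp() - F.expr("INTERVAL 1 HOUR"))
)

# Wrapping the per-announcement processing in a UDF lets Spark distribute
# it across the cluster; the body is a placeholder for the real chain.
@F.udf(returnType=T.StringType())
def categorize_audio(audio_path: str) -> str:
    return "disruption" if audio_path else "unknown"

results = announcements.withColumn("category", categorize_audio(F.col("audio_path")))
results.write.mode("append").parquet(f"{LAKE_PATH}/announcements/processed")
```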
Speech recognition is performed by Azure's Custom Speech service, which lets us use a model trained specifically on our client's data and therefore more accurate than a standard model.
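As a sketch of how such a custom model can be called with the Azure Speech SDK for Python: the key, region, endpoint id and file name below are placeholders, and the project may invoke the service differently (for example through batch transcription).

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: substitute the real subscription key, region and the
# endpoint id of the Custom Speech model trained on station announcements.
speech_config = speechsdk.SpeechConfig(subscription="<subscription-key>", region="<region>")
speech_config.endpoint_id = "<custom-speech-endpoint-id>"
speech_config.speech_recognition_language = "fr-FR"

audio_config = speechsdk.audio.AudioConfig(filename="announcement_0001.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
else:
    print("Recognition failed:", result.reason)
```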
We then use a dozen Machine Learning models, ranging from decision trees to deep neural networks such as BERT, an architecture developed by Google. These models let us deduce a lot of information from the text of an announcement: whether it concerns a normal or a disrupted situation, and if so, whether it indicates the cause of the problem, a workaround, and an estimated time of return to normal.
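For instance, a fine-tuned BERT classifier can be applied to announcement text with the Hugging Face transformers library. The checkpoint name and the example sentence below are purely illustrative; the project's actual models are trained on the client's own announcements.

```python
from transformers import pipeline

# "our-org/announcement-bert" is a hypothetical fine-tuned checkpoint,
# shown only to illustrate how a BERT-based classifier is applied.
classifier = pipeline("text-classification", model="our-org/announcement-bert")

text = ("Le train à destination de Lyon est retardé d'environ 20 minutes "
        "en raison d'un incident de signalisation.")
prediction = classifier(text)[0]
print(prediction["label"], round(prediction["score"], 3))
```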
The data produced is then exported to an Azure database, to be visualized in a dashboard made with Power BI.
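As a sketch of this export step, assuming the analysis results sit in a Spark DataFrame (such as the `results` DataFrame from the job sketched above) and that the target is an Azure SQL database reached over JDBC; the server, database, table and credentials are placeholders.

```python
# Placeholders throughout: replace with the real server, database, table
# and credentials (ideally read from a secret scope, not hard-coded).
# Requires the SQL Server JDBC driver to be available on the cluster.
jdbc_url = (
    "jdbc:sqlserver://<server>.database.windows.net:1433;"
    "database=<database>;encrypt=true;loginTimeout=30"
)

(results.write
        .format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.announcement_analysis")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save())

# Power BI then connects to this database to build the dashboard.
```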
Benefits
All this work has enabled us to analyze the quality and content of the sound announcements made in stations. This has resulted in decisive advantages for our client, which has been able to:
- Enable its station managers to better analyze the announcements made in their stations
- Save time and improve reactivity for station managers, thanks to more detailed information and data enabling them to propose announcements adapted to passengers' expectations
- Rely on accurate data that facilitates the debriefing of crisis situations
- Gain a better understanding of its announcements, which means less stress for passengers, who benefit from optimal comfort and customer experience in the station.
Testimonial
“ECHO is an attractive, efficient, fast and easy-to-use tool that will allow us to be much more reactive in analyzing performance.
At a glance, it gives us direct access to information for which we previously had to listen to each announcement individually.
It’s a real time-saver.”