- Data Lake
- Cloud Infrastructure
- Car Manufacturing
The client is one of the largest automobile manufacturers in the world. We worked specifically with the technical center in Zaventem, home to their European Research & Development department, which aims to strengthen the role of vehicle research and development. We were first consulted to set up the R&D engineers’ application in the cloud using Infrastructure as Code. After successfully completing this migration, we realized that the performance of the existing component querying the data could be improved. We offered our experience to advise the client on a better approach to handling its data by optimizing its data lake.
The R&D department wanted to collect and store the data from the R&D engineers’ fleet of moving cars for research purposes. 4G modems are installed in every car and, every two minutes, send data from a multitude of sensors (such as speed, steering angle and pressure) to the head office. This resulted in a massive amount of unknown and unprocessed data from a large set of cars driven in real-world conditions.
The main challenge was to clean this data and make it easy to query so the engineers could actually use it for their R&D use cases (e.g. decreasing CO2 emissions, reducing fuel consumption). The engineers would then use this data to run pre-analysis and compare it to the data produced by prototype cars with test engines.
In this context, the client was facing several issues:
- Lack of data knowledge
- Every request was either over-engineered or under-engineered and, overall, not adapted to the available data
- The current infrastructure was unable to keep up with the new R&D use cases coming in every month
Our client expected us to find a better way to improve the quality of the data: cleaning it, processing it and storing it properly in a central data platform dedicated to R&D analysis.
We were responsible for the full solution: cloud architecture, design and setup on AWS.
We created an initial architecture based on DataOps best practices, which we then adapted to meet our client’s requirements.
Our data architects performed the analysis, designed the architecture and implemented the project (including the data ingestion, the data lake, and the web administration).
The project was composed of several phases:
- 1. Understanding the client’s requirements and implementing the architecture based on these criteria. This resulted in an expensive, non-scalable infrastructure that was unsuited to the quantity of data expected and to the needs of the R&D department.
- 2. Rebuilding a new and more optimized data lake architecture, building a PoC and getting it approved.
The data lake is an architecture in four zones (temporary, raw, secure and clean) where data flows from right to left. The data lake is hosted on S3 and queried using Athena for a good balance between performance and cost optimization.
- Automating the infrastructure using CloudFormation.
- Managing the integration with Bamboo (specific customer request).
- Improving the data ingestion process: adapting the system to retrieve the data from the cars, transforming it into a usable format for the analysts and creating all the intermediate enrichment tasks needed to give context to the data.
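To make the zone and ingestion ideas above more concrete, here is a minimal Python sketch. It is an illustration only, not the client's actual code: the raw field names (`car`, `ts`, `speed`, `steering`, `pressure`), the `SensorReading` type and the key layout are all assumptions. It shows how one raw sensor message could be turned into a typed record and given a date-partitioned object key in one of the four zones, a layout that query engines such as Athena can typically use to prune partitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Zone names come from the text; everything about the key layout is assumed.
ZONES = ("temporary", "raw", "secure", "clean")


@dataclass
class SensorReading:
    """Hypothetical typed form of one message sent by a car."""
    car_id: str
    timestamp: datetime
    speed_kmh: Optional[float]
    steering_angle_deg: Optional[float]
    pressure_bar: Optional[float]


def _to_float(value) -> Optional[float]:
    """Parse a sensor value, returning None when it is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean(raw: dict) -> Optional[SensorReading]:
    """Convert a raw message into a reading; reject it if car or time is missing."""
    car_id = raw.get("car")
    ts = _to_float(raw.get("ts"))  # assumed to be a Unix timestamp in seconds
    if not car_id or ts is None:
        return None
    return SensorReading(
        car_id=str(car_id),
        timestamp=datetime.fromtimestamp(ts, tz=timezone.utc),
        speed_kmh=_to_float(raw.get("speed")),
        steering_angle_deg=_to_float(raw.get("steering")),
        pressure_bar=_to_float(raw.get("pressure")),
    )


def zone_key(zone: str, reading: SensorReading, ext: str = "json") -> str:
    """Build a Hive-style partitioned object key for the given zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    ts = reading.timestamp.astimezone(timezone.utc)
    return (
        f"{zone}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"car_id={reading.car_id}/{ts:%H%M%S}.{ext}"
    )


if __name__ == "__main__":
    message = {"car": "C-001", "ts": 1700000000, "speed": "87.5",
               "steering": "-3.2", "pressure": ""}
    reading = clean(message)
    if reading is not None:
        print(zone_key("clean", reading))
```

In this sketch, an empty or malformed sensor value becomes `None` rather than failing the whole message, while a message without a car identifier or timestamp is rejected outright. Where that line should be drawn in practice depends on the R&D use case.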
By combining our expertise in Cloud and Data, we met the client’s expectations and delivered an optimized data lake architecture. On top of this, we provided a solution that is:
- Cost-effective: the system adapts to peaks and quiet times of activity, so our client pays only for what it uses. In addition, costs were cut by a factor of 6 compared with the client’s previous architecture.
- Scalable: the system can scale up and down depending on the needs (e.g. adding 1000 more cars to the fleet).
- Flexible: the infrastructure can easily adapt to new needs as they arise.
- Faster: the engineers can query the data more quickly.
- Resilient: changes such as new car models or the integration of data from older car data sets can be handled easily.
Our infrastructure approach also applies DataOps practices such as CI/CD and Infrastructure as Code (IaC), which were not standard for the client at the time.