Key challenges
For many organizations, data leakage is one of the main risks they face, if not the most serious.
For financial institutions, however, concerns around data leakage take on a whole new dimension. Beyond the risk of being barred from the market, breaches of data privacy or consumer protection laws can lead to lawsuits and legal fees, and can damage a company’s reputation and long-term financial health.
Financial regulators therefore rely on numerous regulations to protect and secure end-user banking data.
In the Swiss context, the regulator FINMA has highlighted the risk posed by the amount of sensitive data present in non-production environments, sometimes called “first-level” environments: sandbox, development, integration, etc.
These environments are often less secure than those closer to production because they are meant to foster innovation, which, by definition, one does not want to constrain.
While a single piece of data does not pose a major risk, a mass of exposed data increases a bank’s operational risk in the event of a leak. To reduce banks’ exposure to this risk, the regulator therefore wants to limit the amount of direct or indirect Client Identifying Data (CID) that is exposed.
At our client, direct CID (personal data, addresses, etc.) was already secured in these environments. Our mission was to anonymize “indirect CID”, such as references to sensitive data, or toxic combinations (pieces of data that are harmless individually but can identify a client once combined).
To meet these requirements, the project was structured in two phases: an analysis phase to assess the situation, answer the issues raised and design a solution, followed by an implementation phase for the chosen solution.
Our approach
Project requirements
We identified a number of business, operational and technical constraints:
1. Reliability and durability
Data users must be able to rely on the data remaining stable; otherwise every change affects their work, and if changes are frequent the operational impact grows quickly. Yet any anonymization mechanism, by definition, modifies data. The protection applied to the environments therefore had to remain reliable over the long term.
2. Scalability
Because these are “first-level” environments, they are bound to evolve over time. Data that is anonymized at a given moment because it falls within a given perimeter may be in clear text the next moment, following a change in the infrastructure. The solution therefore had to be scalable and accommodate architecture changes and the evolution of the application landscape.
3. Performance
Finally, since the client operates 24 hours a day, downtime of the environments, including non-production environments, had to be close to zero. The project was therefore not only a performance challenge, given the large amount of banking data to secure, but also required integrating with the client’s existing processes in order to minimize its operational impact.
Data identification phase
We proposed several solutions to our client.
First, the analysis phase had to establish which data needed to be anonymized, where it was located and how sensitive it was.
We proposed a solution to answer these questions. It identifies data in a way similar to Data Loss Prevention (DLP) tools and supports two types of searches:
- Search by “exact match”: a list of character strings is drawn up and looked for in the client data.
- Search by “regular expression”: values are matched against patterns, which allows searching by format rather than by exact value.
Our solution proposed using these mechanisms to pinpoint the data to be secured and to extract its technical metadata. This metadata would then be used to analyze the data and to perform the technical security operations.
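To make the two search types concrete, here is a minimal sketch in Python. It is an illustration only, not the tool used in the project: the term list, the Swiss IBAN pattern, the Finding record and the scan function are all hypothetical names chosen for the example. It scans rows of a table and records, for each hit, the technical metadata (table, column, row, rule) that later steps could rely on.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """Technical metadata about one hit (hypothetical structure)."""
    table: str
    column: str
    row: int
    rule: str
    kind: str  # "exact" or "regex"

# Assumption: illustrative rules only, not the client's real lists.
EXACT_TERMS = ("Jane Doe", "ACME Holding SA")
REGEX_RULES = {
    # Swiss IBAN: "CH", 2 check digits, then 17 alphanumeric characters,
    # optionally printed in groups of four separated by spaces.
    "iban_ch": re.compile(r"\bCH\d{2}(?: ?[0-9A-Z]{4}){4} ?[0-9A-Z]\b"),
}

def scan(table, rows):
    """Return one Finding per rule that matches a cell of the table."""
    findings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            text = str(value)
            # Exact match: look for each listed string as-is.
            for term in EXACT_TERMS:
                if term in text:
                    findings.append(Finding(table, column, index, "exact_match", "exact"))
            # Regular expression: look for values with a given format.
            for name, pattern in REGEX_RULES.items():
                if pattern.search(text):
                    findings.append(Finding(table, column, index, name, "regex"))
    return findings

rows = [
    {"id": 1, "memo": "Transfer to Jane Doe"},
    {"id": 2, "memo": "Refund to CH93 0076 2011 6238 5295 7"},
    {"id": 3, "memo": "Monthly fee"},
]
findings = scan("payments", rows)
```

Note that the findings store only where the sensitive value was found and which rule matched, not the value itself, so the metadata can be kept without creating a new copy of the sensitive data.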
Based on the constraints stated above, which we assessed technically, we established a list of requirements:
- The estimated validity period of each protection mechanism,
- The preferred mechanism for each data type and its sensitivity,
- The necessary functional architecture,
- The existing infrastructure required for the implementation of the project,
- The organization required for the implementation and the life cycle of the solution.
Thanks to this, and to a market analysis conducted in parallel, we were able to design and propose a set of solutions to the client. Some used internal mechanisms, others came from software vendors on the market. Together, the proposed solutions formed an end-to-end security program for non-production environments.
Data anonymization phase
At this stage, a challenge was identified regarding the project’s production launch. Several constraints applied at once: the integrity of each environment was a strong criterion; the scope was complete and could not be reduced; the volume of banking data was considerable and anonymizing it takes time; and the client did not want any interruption of service, even outside production. It was therefore necessary to integrate with existing processes and to parallelize the anonymization actions while keeping the data consistent. This was as much a technical challenge as an organizational one.
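One way to reconcile parallel processing with consistency, shown here purely as an illustration and not as the mechanism actually deployed, is to make the transformation deterministic: if the same input always yields the same output, tables can be processed independently and references between them still line up. The sketch below assumes a keyed hash (HMAC-SHA256) and a thread pool; the key, the column name and the function names are hypothetical.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

# Assumption: a demo key; a real key would come from a secrets manager.
SECRET_KEY = b"demo-key-not-for-real-use"

def pseudonym(value):
    """Deterministic keyed transformation: same input, same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P" + digest[:16]

def protect_table(rows, column):
    """Return a copy of the rows with one column replaced by pseudonyms."""
    return [{**row, column: pseudonym(row[column])} for row in rows]

def protect_all(tables, column):
    """Process every table in its own task and collect the results."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(protect_table, rows, column)
            for name, rows in tables.items()
        }
        return {name: future.result() for name, future in futures.items()}

tables = {
    "accounts": [{"client_ref": "REF-001", "currency": "CHF"}],
    "payments": [{"client_ref": "REF-001", "amount": 120}],
}
protected = protect_all(tables, "client_ref")
```

Because no task needs to know what another task produced, the work can be split freely, and the two tables above still share the same value in client_ref after protection.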
The proposed solutions all share more or less the same architecture. Using the previously stored metadata and the results of an analysis of the nature of each piece of data, they can apply the following mechanisms to protect the data:
- Anonymization: transformation of the data to a random string of characters,
- Pseudonymization: transformation of the data into a string of characters known only to the people using the data,
- Tokenization: replacement of the data with a random token,
- Set: replacement of the data with a value drawn from a known set (for example, dates of birth between 1950 and 1990, invoice amounts between two values not exceeding a maximum, or a set of addresses that allows statistics by country but not the identification of an individual),
- Noise: addition of data to an environment so that the real values are masked, drowned in a larger flow.
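Some of the mechanisms listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the client’s implementation: the function and class names are hypothetical, the token vault is an in-memory dictionary (assuming, for the example, that tokens must be reversible by an authorized party), and the date range simply reuses the 1950 to 1990 example from the list.

```python
import random
import secrets
import string
from datetime import date, timedelta

def anonymize(value):
    """Anonymization: replace the value with a random string of equal length."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(len(value)))

class TokenVault:
    """Tokenization: swap a value for a random token, keeping the mapping."""

    def __init__(self):
        self._by_value = {}
        self._by_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._by_value:
            token = "tok_" + secrets.token_hex(8)
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token):
        return self._by_token[token]

def birth_date_from_set(rng):
    """Set: draw a replacement date of birth between 1950 and 1990."""
    start, end = date(1950, 1, 1), date(1990, 12, 31)
    return start + timedelta(days=rng.randrange((end - start).days + 1))

def add_noise(rows, decoys, rng):
    """Noise: mix decoy rows with the real rows and shuffle the result."""
    mixed = list(rows) + list(decoys)
    rng.shuffle(mixed)
    return mixed

vault = TokenVault()
token = vault.tokenize("REF-001")
masked_name = anonymize("Jane Doe")
```

The trade-offs differ: the anonymized string cannot be traced back to its source, whereas a token can be resolved by whoever holds the vault, which is why the choice of mechanism depends on the type and sensitivity of each piece of data.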
Some solutions on the market offer hundreds of other protection options and evolve natively, while solutions built specifically for the client allow an integration tailored to particular contexts. In this case, the need for a customizable solution stemmed from the fact that some identifiers had a functional specificity that required a dedicated security mechanism.
Benefits
Thanks to the implemented solution, our client is now able to meet its legal obligations towards its local regulator. Other, more operational benefits have also emerged from this project.
Indeed, the assurance of a fully anonymized and secured environment opens the door to new development ideas: the client can now offer partners all over the world access to its non-production environments while assuring the local regulator that no data leaves the territory, an essential project requirement.
In addition, the client can now deploy its non-production environments on cloud platforms without compromising its operational requirements. It can therefore benefit from the modularity, flexibility, cost and scalability advantages of these platforms and, as a result, be more responsive and competitive in the services it offers.
Beyond meeting its compliance requirements, this project allowed our client to improve its operational processes and, above all, to gain a degree of flexibility and responsiveness that it did not have before.