Creating Synthetic Time Series Data for Global Financial Institutions - a POC Deep Dive
This study describes how Gretel's methods were used to create high-quality synthetic time-series datasets for one of the largest financial institutions in the world. The temporal structure of time-series data makes it useful for tracking and forecasting trends, but privacy concerns make such data hard to share between individuals and organizations. Synthetic time-series data that generalizes well and can be shared across diverse teams lets financial institutions explore new opportunities and gain a competitive edge. The bank's data science team provided a time-series dataset containing customer account balances over time. A pipeline first de-identified the dataset and then trained a synthetic model to generate an artificial dataset of the same size and shape. Accuracy was assessed by comparing time-series distributions for one district in the dataset, and quality was evaluated by fitting an ARIMA model to both the synthetic and original datasets. Privacy was then assessed by comparing the transformed and synthesized dataset against the original training dataset. Gretel's Similarity Privacy Filter removed all synthetic records that duplicated training records, supporting the strong privacy guarantees required to share data inside a financial institution. The study concludes that Gretel's synthetic data can be as accurate as the real-world data used for machine learning classification tasks, and in some cases more accurate, while maintaining high standards of privacy.
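The duplicate check in the privacy step can be illustrated with a minimal sketch. This is not Gretel's Similarity Privacy Filter, whose implementation is not described here; it is a plain exact-match filter written for illustration. The record layout (date, district ID, balance), the function name `drop_training_duplicates`, and the sample values are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout: (date, district_id, balance). Not the bank's schema.
Record = Tuple[str, str, float]


def drop_training_duplicates(
    synthetic: Iterable[Record], training: Iterable[Record]
) -> Tuple[List[Record], int]:
    """Keep synthetic rows that do not appear verbatim in the training data.

    Returns the kept rows and the number of rows removed.
    """
    seen = set(training)  # set lookup makes each membership test O(1) on average
    kept: List[Record] = []
    removed = 0
    for row in synthetic:
        if row in seen:
            removed += 1  # exact copy of a training record: drop it
        else:
            kept.append(row)
    return kept, removed


# Made-up sample data for illustration only.
training = [
    ("2021-01-01", "d1", 1200.0),
    ("2021-01-02", "d1", 1250.5),
]
synthetic = [
    ("2021-01-01", "d1", 1198.7),
    ("2021-01-02", "d1", 1250.5),  # identical to a training row
    ("2021-01-03", "d1", 1301.2),
]

kept, removed = drop_training_duplicates(synthetic, training)
print(f"removed {removed} duplicate(s), kept {len(kept)} synthetic row(s)")
```

An exact-match filter only catches verbatim copies. A production filter would presumably also consider near-duplicates, which this sketch does not attempt.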
Company
Gretel.ai
Date published
Jan. 30, 2022
Author(s)
Alex Watson
Word count
1351
Language
English
Hacker News points
3