Company
Date Published
March 31, 2022
Author
Alex Watson
Word count
451
Language
English
Hacker News points
None

Summary

Researchers from Gretel.ai and Illumina's Emerging Solutions have created synthetic versions of real-world genomic datasets using state-of-the-art generative neural networks. The synthetic datasets offer enhanced privacy guarantees, letting life science researchers collaborate and test ideas through open access to data without compromising patient privacy. Although the initial case study results are based on a small sample set, continued experiments in scale, accuracy, and privacy show that synthetic data has the potential to enable sharing of, and collaboration on, genomic datasets at a much larger scale than is currently possible. The code for synthesizing genomic data is available on GitHub, and further research will explore the scale and privacy guarantees achievable with synthetic data on genomic datasets.