Company
Date Published
Author
Ben Elsworth, Marina Vabistsevits, Oliver Lloyd, Yi Liu & Tom Gaunt
Word count
1140
Language
English
Hacker News points
None

Summary

We designed a Neo4j data integration pipeline to streamline our projects and make the whole build process accessible and transparent. Snakemake rules control and automate each step of the build and run checks on each dataset. The pipeline can create a working graph from raw data, and it handles datasets from various sources that need cleaning and QC before they are incorporated. Features include predefined database schema creation, testing of new data, node merging, Neo4j import, remote server options, and setup instructions. Our goal is to provide a simple method for adding new data to a graph build, one that could potentially be used collaboratively.
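
As a rough illustration of the "check each dataset against a predefined schema, then merge nodes" idea, here is a minimal Python sketch. It is not the pipeline's actual code: the schema layout, the node labels, and the function names `check_rows` and `merge_nodes` are all hypothetical and chosen only for this example.

```python
# Hypothetical schema: node label -> required property names and their types.
# The real pipeline's schema format is not shown in this summary.
SCHEMA = {
    "Gene": {"id": str, "name": str},
    "Disease": {"id": str, "label": str},
}


def check_rows(label, rows, schema=SCHEMA):
    """Return a list of problems found in the rows for one node label."""
    if label not in schema:
        return [f"unknown node label: {label}"]
    problems = []
    for i, row in enumerate(rows):
        for prop, expected_type in schema[label].items():
            if prop not in row or row[prop] in ("", None):
                problems.append(f"row {i}: missing {prop}")
            elif not isinstance(row[prop], expected_type):
                problems.append(f"row {i}: {prop} is not {expected_type.__name__}")
    return problems


def merge_nodes(rows, key="id"):
    """Merge rows that share the same key.

    The first non-empty value seen for a property wins; later rows
    only fill in properties that are still missing.
    """
    merged = {}
    for row in rows:
        node = merged.setdefault(row[key], {})
        for prop, value in row.items():
            if value not in ("", None) and prop not in node:
                node[prop] = value
    return list(merged.values())


if __name__ == "__main__":
    rows = [
        {"id": "g1", "name": "BRCA1"},
        {"id": "g1", "name": "", "source": "x"},
    ]
    print(check_rows("Gene", rows))
    print(merge_nodes(rows))
```

In a Snakemake-driven build, a check like this could run as its own rule so that a dataset failing QC stops the build before the Neo4j import step; that wiring is likewise an assumption here, not a description of the published pipeline.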