Company:
Date Published:
Author: Tobias Johansson and Petr Janouch
Word count: 1637
Language: English
Hacker News points: None

Summary

Neo4j 4.0 introduces Neo4j Fabric, a feature that lets a single Cypher query target multiple Neo4j graph databases at once. This capability can be used for data federation and analysis across separate databases, for horizontal scaling of data storage and processing, or for hybrid deployments. Neo4j Fabric stores shards as separate, disjoint graphs; relationships that would span shards are modeled with proxy nodes and correlating id values. A carefully sharded data model can improve the performance of complex queries by keeping the number of "jumps" across shards low. The LDBC Social Network Benchmark dataset was used to compare sharded and non-sharded configurations and to explore considerations for graph sharding. The results show that, with a carefully designed manual sharding scheme, Neo4j Fabric improves both read query latency and total read query throughput for complex queries.
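
The proxy-node idea can be illustrated with a small, self-contained Python sketch. It is not Neo4j or Fabric code: the `Shard` class, the two shard layouts, and the helper functions are all hypothetical, chosen only to show how a relationship between two disjoint graphs can be followed by matching a shared id value, and how a query then needs one "jump" between shards.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One disjoint graph (hypothetical model, not a Neo4j API)."""
    name: str
    nodes: dict = field(default_factory=dict)  # local key -> properties
    edges: list = field(default_factory=list)  # (source key, type, target key)


# Shard "persons" holds the full Person nodes and KNOWS relationships.
persons = Shard("persons")
persons.nodes["p1"] = {"label": "Person", "id": 1, "name": "Alice"}
persons.nodes["p2"] = {"label": "Person", "id": 2, "name": "Bob"}
persons.edges.append(("p1", "KNOWS", "p2"))

# Shard "messages" holds Message nodes plus a proxy Person node that
# carries only the correlating id, so every edge stays local to the shard.
messages = Shard("messages")
messages.nodes["m1"] = {"label": "Message", "id": 100, "content": "hello"}
messages.nodes["proxy2"] = {"label": "Person", "id": 2}  # proxy for Bob
messages.edges.append(("m1", "HAS_CREATOR", "proxy2"))


def friends_of(shard, person_id):
    """Return ids of persons that person_id KNOWS, using only local edges."""
    key_by_id = {
        props["id"]: key
        for key, props in shard.nodes.items()
        if props["label"] == "Person"
    }
    start = key_by_id[person_id]
    return [
        shard.nodes[dst]["id"]
        for src, rel, dst in shard.edges
        if src == start and rel == "KNOWS"
    ]


def messages_by(shard, person_ids):
    """Return message contents whose creator proxy matches one of the ids."""
    wanted = set(person_ids)
    return [
        shard.nodes[src]["content"]
        for src, rel, dst in shard.edges
        if rel == "HAS_CREATOR" and shard.nodes[dst]["id"] in wanted
    ]


def messages_of_friends(person_id):
    """Two-step query: one lookup per shard, joined on the id value."""
    friend_ids = friends_of(persons, person_id)  # step 1: persons shard
    return messages_by(messages, friend_ids)     # step 2: one jump to messages shard
```

In this toy layout, `messages_of_friends(1)` touches each shard once. A layout that forced the query to alternate between shards for every hop would need many more jumps, which is the cost a sharding scheme tries to keep down.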