Company
Date Published
Author
Michael Hunger & Mark Needham
Word count
2556
Language
English
Hacker News points
None

Summary

Stack Overflow's Neo4j data is imported into a Neo4j graph database. The process fetches data from the Stack Exchange API, converts it from JSON to CSV, and loads it into Neo4j with LOAD CSV. The data model was decided before the import, and the data was filtered to leave out information the model did not need, such as question bodies and comments. LOAD CSV creates the nodes and relationships in the graph, and several tips improve its performance: merge on a key, use constraints and indexes, and apply periodic commits. The import commands can also be scripted to automate the process. In addition, a bulk data import tool was used to ingest the CSV files into Neo4j, which gave faster import times and better disk I/O performance. The final database contains 30 million nodes, 78 million relationships, and 280 million properties.
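
The JSON-to-CSV step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the column layout in `FIELDS`, the helper names `question_to_row` and `questions_to_csv`, and the sample record are all assumptions made for the example. The item keys (`question_id`, `title`, `owner`, `tags`) are assumed to match what the Stack Exchange API returns for a question.

```python
import csv
import io

# Hypothetical column layout chosen for this sketch.
FIELDS = ["question_id", "title", "owner_id", "owner_name", "tags"]


def question_to_row(item):
    """Flatten one question item, leaving out the body and comments."""
    owner = item.get("owner", {})
    return {
        "question_id": item["question_id"],
        "title": item["title"],
        "owner_id": owner.get("user_id", ""),
        "owner_name": owner.get("display_name", ""),
        # Join tags into one cell so they can be split again at import time.
        "tags": ";".join(item.get("tags", [])),
    }


def questions_to_csv(items):
    """Return CSV text with a header row and one line per question."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(question_to_row(item))
    return buffer.getvalue()


# Made-up sample record, shaped like an API question item.
sample = [
    {
        "question_id": 1,
        "title": "How do I import CSV?",
        "body": "<p>long text</p>",
        "comments": [{"body": "a comment"}],
        "owner": {"user_id": 42, "display_name": "alice"},
        "tags": ["neo4j", "cypher"],
    }
]

csv_text = questions_to_csv(sample)
print(csv_text)
```

Flattening the nested owner object and joining the tags into a single delimited cell keeps each question on one CSV row, which is the shape a row-by-row import expects.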