Powers of Ten – Part II
This article discusses strategies for bulk loading data into Titan at varying scales, focusing on graphs with hundreds of millions to billions of edges and using Faunus as the loading tool. It gives a step-by-step guide to loading the DocGraph dataset, with approximately 1 million vertices and 154 million edges, on a single Hadoop node running in pseudo-distributed mode. It then demonstrates loading the Friendster social network dataset, a graph of 117 million vertices and 2.5 billion edges, on a four-node Hadoop cluster. Throughout, it emphasizes that while common strategies exist for loading data at each scale, the actual approach must be adapted to the specific data and domain.
Company
DataStax
Date published
June 2, 2014
Author(s)
Stephen Mallette
Word count
2202
Language
English