Using the Cassandra Bulk Loader
Cassandra, a distributed database management system, provides a tool called sstableloader for bulk loading data. The tool streams sstable data files to a live cluster and sends each node only the parts of the data it is responsible for, in line with the cluster's replication strategy. It has two primary use cases: moving existing sstables from one cluster (for example, a test cluster) to another multi-node cluster, and bulk-loading external data that is not yet in sstable form. To use sstableloader, users must supply a correctly configured cassandra.yaml file and define the schema for the target column families beforehand. External data, such as CSV files, can be turned into sstables with the SSTableSimpleUnsortedWriter class introduced in Cassandra 0.8.2.
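As a rough illustration of what writing sstables from unsorted CSV input involves, the sketch below buffers rows in sorted order and emits a batch once a row limit is reached. It does not use the Cassandra API: the class `CsvSortedBatcher`, the method `batch`, the `key,column,value` line layout, and the row-count limit are all hypothetical, and rows are ordered by raw key here, whereas Cassandra orders rows according to the cluster's partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not part of Cassandra: illustrates buffering unsorted
// CSV rows and emitting them as sorted batches.
class CsvSortedBatcher {

    // Each batch maps row key -> (column name -> value), sorted at both levels.
    // Assumes every input line has the form "key,column,value".
    static List<TreeMap<String, TreeMap<String, String>>> batch(
            List<String> csvLines, int maxRowsPerBatch) {
        List<TreeMap<String, TreeMap<String, String>>> batches = new ArrayList<>();
        TreeMap<String, TreeMap<String, String>> buffer = new TreeMap<>();

        for (String line : csvLines) {
            String[] fields = line.split(",", 3);
            if (fields.length != 3) {
                continue; // this sketch simply skips malformed lines
            }
            String rowKey = fields[0].trim();

            // Flush before a new row key would push the buffer past its limit.
            if (!buffer.containsKey(rowKey) && buffer.size() == maxRowsPerBatch) {
                batches.add(buffer);
                buffer = new TreeMap<>();
            }
            buffer.computeIfAbsent(rowKey, k -> new TreeMap<>())
                  .put(fields[1].trim(), fields[2].trim());
        }

        // Emit whatever is left once the input is exhausted.
        if (!buffer.isEmpty()) {
            batches.add(buffer);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "carol,age,41", "alice,age,30", "bob,age,25", "alice,city,Paris");
        for (TreeMap<String, TreeMap<String, String>> b : batch(lines, 2)) {
            System.out.println(b);
        }
    }
}
```

Note that a row key can end up in more than one batch when its columns arrive after a flush, as happens with `alice` in the sample input. In this sketch each batch stands in for one output file; the real writer's buffering and file format are not reproduced here.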
Company
DataStax
Date published
Sept. 1, 2011
Author(s)
Sylvain Lebresne
Word count
1277
Hacker News points
None found.
Language
English