
Using the Cassandra Bulk Loader

What's this blog post about?

This post covers sstableloader, a tool that Cassandra, a distributed database management system, introduced to improve bulk loading of data. The tool streams sstable data files to a live cluster and sends each node only the parts of the data that node is responsible for, in accordance with the cluster's replication strategy. The two primary use cases are loading sstables taken from a test cluster into another multi-node cluster, and bulk loading external data that is not yet in sstable form. To use sstableloader, users need a cassandra.yaml file with the correct settings, and the schema for the column families being loaded must be defined beforehand. For external data such as CSV files, users can first create the relevant sstables with the SSTableSimpleUnsortedWriter class introduced in Cassandra 0.8.2.
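The idea that each node receives only the data relevant to it can be illustrated with a small, self-contained sketch. This is not Cassandra's code or sstableloader's actual algorithm: it is a simplified model that assumes an MD5-hashed token ring and a "next N nodes clockwise" replica placement, and the names `token_for`, `replicas_for`, `plan_streams`, and the four-node ring are all hypothetical.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring (MD5 is an assumption of this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, ring, replication_factor):
    """Return the nodes that should hold `key`.

    `ring` is a list of (token, node_name) pairs sorted by token.
    The first replica is the first node whose token is >= the key's token,
    wrapping around to the start of the ring; the remaining replicas are
    the nodes that follow it clockwise.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def plan_streams(keys, ring, replication_factor):
    """Group row keys by destination node, so each node is sent only its own rows."""
    plan = {node: [] for _, node in ring}
    for key in keys:
        for node in replicas_for(key, ring, replication_factor):
            plan[node].append(key)
    return plan


if __name__ == "__main__":
    # Hypothetical four-node cluster with evenly spaced tokens.
    ring = [(i * 2**128 // 4, f"node{i + 1}") for i in range(4)]
    keys = [f"user-{i}" for i in range(10)]
    for node, node_keys in plan_streams(keys, ring, replication_factor=2).items():
        print(node, node_keys)
```

With a replication factor of 2, every key lands in exactly two of the four per-node lists, which mirrors the summary's point that a node is never sent data it would not store.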

Company
DataStax

Date published
Sept. 1, 2011

Author(s)
Sylvain Lebresne

Word count
1277

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.