Cassandra Data Loading: 8 Tips for Loading Data into Astra DB

Blog post from DataStax

Post Details
Company: DataStax
Date Published:
Author: Sebastian Estevez
Word Count: 1,142
Language: English
Hacker News Points: -
Summary

The DataStax Bulk Loader (dsbulk) is a command-line tool for loading data into, and unloading data from, Apache Cassandra® and Astra DB. It can load, unload, and count data in DataStax Astra cloud databases, DataStax Enterprise 4.7 and later, and open source Apache Cassandra® 2.1 and later. Installing dsbulk on a virtual machine (VM) in the same region as your database reduces latency and increases throughput. To connect to Astra DB, pass dsbulk a Secure Connect Bundle, a client ID, and a client secret. Performance tuning is crucial for bulk loading, and throughput can be controlled with flags such as --maxConcurrentQueries, --dsbulk.executor.maxPerSecond, and --dsbulk.executor.maxInFlight. Other tips cover handling errors, dealing with rate limits, and reaching out to onboarding engineers for additional help.
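As a rough illustration of how the connection details and throttling flags fit together, the sketch below assembles a dsbulk load command as an argument list. It is a minimal sketch under stated assumptions, not the post's own script: the helper name `build_dsbulk_load`, the file names, the keyspace and table, the credentials, and the numeric limits are all hypothetical placeholders. The `-url`, `-k`, `-t`, `-b`, `-u`, and `-p` shortcuts are assumed to match your installed dsbulk version, so check them against `dsbulk help` before relying on them.

```python
import shlex


def build_dsbulk_load(csv_path, keyspace, table, bundle_path,
                      client_id, client_secret, tuning=None):
    """Return the argv list for a dsbulk load against Astra DB.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    argv = [
        "dsbulk", "load",
        "-url", csv_path,        # source file to load
        "-k", keyspace,          # target keyspace
        "-t", table,             # target table
        "-b", bundle_path,       # Secure Connect Bundle (zip)
        "-u", client_id,         # Astra client ID
        "-p", client_secret,     # Astra client secret
    ]
    # Append throttling flags in the order given, e.g. the executor
    # limits named in the summary above.
    for flag, value in (tuning or {}).items():
        argv += [flag, str(value)]
    return argv


# Placeholder values throughout; the limits are illustrative, not recommendations.
cmd = build_dsbulk_load(
    csv_path="data.csv",
    keyspace="demo_ks",
    table="demo_table",
    bundle_path="secure-connect-demo.zip",
    client_id="CLIENT_ID",
    client_secret="CLIENT_SECRET",
    tuning={
        "--dsbulk.executor.maxPerSecond": 5000,
        "--dsbulk.executor.maxInFlight": 1024,
    },
)

# Print a shell-quoted command line that can be reviewed before running.
print(shlex.join(cmd))
```

Building the command as a list keeps credentials and paths correctly quoted if you later hand it to `subprocess.run`, and lowering the values in `tuning` is one way to back off when the database starts rate limiting.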