
Cassandra Data Loading: 8 Tips for Loading Data into Astra DB

What's this blog post about?

The DataStax Bulk Loader (dsbulk) is a command-line tool for loading, unloading, and counting data in Apache Cassandra® and Astra DB. It supports DataStax Astra cloud databases, DataStax Enterprise 4.7 and later, and open source Apache Cassandra® 2.1 and later. Installing dsbulk on a virtual machine (VM) in the same region as your database decreases latency and increases throughput. To connect to Astra DB, pass dsbulk a Secure Connect Bundle, a client ID, and a client secret. Performance tuning is crucial for bulk loading and can be controlled with flags such as --maxConcurrentQueries, --dsbulk.executor.maxPerSecond, and --dsbulk.executor.maxInFlight. Other tips cover handling errors, dealing with rate limits, and contacting onboarding engineers for additional help.
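As a rough illustration of how the connection settings and tuning flags fit together, here is a minimal Python sketch that only assembles a dsbulk load command line; it does not run dsbulk. The helper name build_dsbulk_load_command, the file names, and the numeric values are hypothetical. The short options (-url, -k, -t, -b, -u, -p) and the long-form --dsbulk.engine.maxConcurrentQueries spelling are assumptions based on common dsbulk usage, so check them against the documentation for your dsbulk version.

```python
import shlex
from typing import List, Optional


def build_dsbulk_load_command(
    csv_path: str,
    keyspace: str,
    table: str,
    bundle_path: str,
    client_id: str,
    client_secret: str,
    max_concurrent_queries: Optional[int] = None,
    max_per_second: Optional[int] = None,
    max_in_flight: Optional[int] = None,
) -> List[str]:
    """Assemble (but do not run) a dsbulk load command for Astra DB.

    Hypothetical helper: option spellings are assumptions to verify
    against the dsbulk documentation for the installed version.
    """
    cmd = [
        "dsbulk", "load",
        "-url", csv_path,        # source CSV file
        "-k", keyspace,          # target keyspace
        "-t", table,             # target table
        "-b", bundle_path,       # Secure Connect Bundle (.zip)
        "-u", client_id,         # Astra client ID
        "-p", client_secret,     # Astra client secret
    ]
    # Tuning flags are appended only when a value is supplied,
    # so dsbulk defaults apply otherwise.
    if max_concurrent_queries is not None:
        cmd += ["--dsbulk.engine.maxConcurrentQueries", str(max_concurrent_queries)]
    if max_per_second is not None:
        cmd += ["--dsbulk.executor.maxPerSecond", str(max_per_second)]
    if max_in_flight is not None:
        cmd += ["--dsbulk.executor.maxInFlight", str(max_in_flight)]
    return cmd


if __name__ == "__main__":
    # Placeholder values for illustration only.
    command = build_dsbulk_load_command(
        "data.csv", "my_keyspace", "my_table",
        "secure-connect-bundle.zip", "CLIENT_ID", "CLIENT_SECRET",
        max_per_second=5000,
    )
    # Print a shell-safe command line for review before running it.
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, which avoids quoting problems if the list is later handed to subprocess.run. Capping the rate with a throttle such as maxPerSecond is one way to stay under the rate limits the post mentions.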


Date published
May 24, 2022

Sebastian Estevez


By Matt Makai. 2021-2024.