
Cassandra Data Loading: 8 Tips for Loading Data into Astra DB

What's this blog post about?

The DataStax Bulk Loader (dsbulk) is a command-line tool for loading, unloading, and counting data in Apache Cassandra® and Astra DB. It supports DataStax Astra cloud databases, DataStax Enterprise 4.7 and later, and open source Apache Cassandra® 2.1 and later. Installing dsbulk on a virtual machine (VM) in the same region as your database reduces latency and increases throughput. To connect to Astra DB, you pass dsbulk the database's Secure Connect Bundle along with a client ID and client secret. Performance tuning is crucial for an efficient bulk load, and the load can be controlled with flags such as --maxConcurrentQueries, --dsbulk.executor.maxPerSecond, and --dsbulk.executor.maxInFlight. Other tips cover handling errors, dealing with rate limits, and getting additional help from onboarding engineers.
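As a rough illustration of how these pieces fit together, here is a minimal Python sketch that assembles (but does not run) a `dsbulk load` command for Astra DB. It assumes the standard dsbulk shortcuts `-url`, `-k`, `-t`, `-b` (Secure Connect Bundle), `-u` and `-p` (client ID and client secret as username and password). The helper name `build_dsbulk_load`, the file names, and the numeric values are hypothetical placeholders, not recommendations. The tuning flag spellings are copied from the post and should be checked against the dsbulk documentation for your version.

```python
import shlex
from typing import List, Optional


def build_dsbulk_load(
    csv_path: str,
    keyspace: str,
    table: str,
    bundle_path: str,
    client_id: str,
    client_secret: str,
    max_concurrent_queries: Optional[int] = None,
    max_per_second: Optional[int] = None,
    max_in_flight: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: return an argv list for `dsbulk load` against Astra DB."""
    args = [
        "dsbulk", "load",
        "-url", csv_path,       # input file or directory to load
        "-k", keyspace,         # target keyspace
        "-t", table,            # target table
        "-b", bundle_path,      # Astra Secure Connect Bundle (zip file)
        "-u", client_id,        # Astra client ID, passed as the username
        "-p", client_secret,    # Astra client secret, passed as the password
    ]
    # Tuning flags as named in the post (verify spellings for your dsbulk version).
    # Each flag is appended only when a value is supplied.
    tuning = {
        "--maxConcurrentQueries": max_concurrent_queries,
        "--dsbulk.executor.maxPerSecond": max_per_second,
        "--dsbulk.executor.maxInFlight": max_in_flight,
    }
    for flag, value in tuning.items():
        if value is not None:
            args += [flag, str(value)]
    return args


# Usage with placeholder values: print the command instead of executing it.
cmd = build_dsbulk_load(
    "data.csv", "my_keyspace", "my_table",
    "secure-connect-mydb.zip", "CLIENT_ID", "CLIENT_SECRET",
    max_per_second=1000,
)
print(shlex.join(cmd))
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually run dsbulk
```

Building the command as an argument list rather than a single string avoids shell-quoting problems when a bundle path or secret contains special characters. Leaving the tuning values as `None` keeps dsbulk's own defaults, which makes it easy to change one knob at a time while measuring throughput.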

Company
DataStax

Date published
May 24, 2022

Author(s)
Sebastian Estevez

Word count
1142

Hacker News points
None found.

Language
English

