Improving JBOD

What's this blog post about?

Cassandra 3.2 improves the handling of JBOD (Just a Bunch of Disks) configurations, meaning nodes that use multiple data_file_directories. In previous versions, such nodes could run out of disk space during compactions, and deleted data could resurface. To address these problems, the new version guarantees that a given token is never written to more than one data directory, which required changes to compaction, flushing, and streaming. The node's local token ranges are split across the data directories, which also balances the amount of data in each directory. After a compaction, the resulting sstables stay in the same compaction strategy instance if all of their tokens are already in the correct place; otherwise, the tokens are moved to new sstables in the correct data directories. Flushing is now multi-threaded, with one thread per data directory, streamed data is written to the correct data directory, and individual disks can be backed up and restored. The nodetool relocatesstables command can speed up moving tokens to the correct data directories when needed.
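
The idea of splitting local ranges over data directories can be illustrated with a minimal Python sketch. It is not Cassandra's implementation: it assumes the node owns a single contiguous, non-wrapping token range `(lo, hi]`, and the function names and directory paths are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left


def split_boundaries(lo, hi, n_dirs):
    """Return n_dirs upper bounds that cut (lo, hi] into near-equal sub-ranges.

    Hypothetical helper: assumes one contiguous, non-wrapping local range.
    """
    if n_dirs < 1 or lo >= hi:
        raise ValueError("need at least one directory and lo < hi")
    width = hi - lo
    return [lo + (width * (i + 1)) // n_dirs for i in range(n_dirs)]


def directory_for(token, lo, boundaries, data_dirs):
    """Return the single directory whose sub-range contains the token."""
    if not lo < token <= boundaries[-1]:
        raise ValueError("token is outside the local range")
    # bisect_left finds the first boundary >= token, i.e. the sub-range
    # (previous boundary, boundary] that the token falls into.
    return data_dirs[bisect_left(boundaries, token)]


# Illustrative values only; real tokens span a much larger range.
data_dirs = ["/data1", "/data2", "/data3"]  # hypothetical paths
boundaries = split_boundaries(0, 300, len(data_dirs))  # [100, 200, 300]
target = directory_for(150, 0, boundaries, data_dirs)  # "/data2"
```

Because the mapping from token to directory is deterministic, flushing, compaction, and streaming can all consult the same boundaries, so every copy of a token ends up in the same directory.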

Company
DataStax

Date published
Jan. 11, 2016

Author(s)
Marcus Eriksson

Word count
1266

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.