DateTieredCompactionStrategy: Compaction for Time Series Data
DateTieredCompactionStrategy (DTCS) is a compaction strategy introduced in Cassandra 2.0.11 for time series-like workloads, where data is mostly appended to existing partitions. It was contributed by Björn Hegerfors at Spotify. Compaction exists mainly to reclaim disk space and to let reads be served from as few SSTables as possible. DTCS keeps data written at the same time in the same SSTables: it groups SSTables into windows based on the age of the data they contain, so that new and old data are not mixed during compaction. The window sizes are configurable: base_time_seconds sets the size of the initial window, and max_sstable_age_days sets the age after which data is no longer compacted. For queries that request recent data, this can greatly reduce the number of SSTables touched during a read.
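The windowing idea can be illustrated with a minimal Python sketch. This is not Cassandra's implementation: the function name group_by_window, the (name, timestamp) tuples, and the default values are hypothetical. It also simplifies by using fixed-size windows measured back from the current time, whereas the real strategy uses windows that grow as data ages. Only the two option names, base_time_seconds and max_sstable_age_days, come from the summary above.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def group_by_window(sstables, now, base_time_seconds=3600, max_sstable_age_days=365):
    """Group SSTables into age-based windows (simplified illustration).

    sstables: iterable of (name, newest_timestamp_seconds) pairs (hypothetical shape).
    now: current time in seconds, on the same clock as the timestamps.
    Returns {window_index: [names]}, where window 0 holds the newest data.
    """
    max_age_seconds = max_sstable_age_days * SECONDS_PER_DAY
    windows = defaultdict(list)
    for name, timestamp in sstables:
        age = max(0, now - timestamp)
        if age > max_age_seconds:
            # Older than max_sstable_age_days: no longer a compaction candidate.
            continue
        # SSTables of similar age share a window, so new and old data
        # are never placed in the same compaction group.
        windows[age // base_time_seconds].append(name)
    return dict(windows)
```

Under these assumptions, SSTables that land in the same window would be candidates for compaction together, and anything past the age limit is left alone.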
Company
DataStax
Date published
Nov. 19, 2014
Author(s)
Marcus Eriksson
Word count
1292
Hacker News points
None found.
Language
English