Improving Secondary Index Write Performance in 1.2
Secondary indexes, introduced in Cassandra 0.7, allow data to be accessed by attributes other than the row key. They use an auxiliary column family to model an inverted index of values from a primary column family, and the SecondaryIndex interface is an extension point for alternative implementations. The cost is added complexity, because the index must be kept in sync with the primary data: when a RowMutation is received and any of the columns being mutated are configured with secondary indexes, additional work is required. Before 1.2, that work included a read before each write. Cassandra 1.2 removes the read-before-write requirement by writing new index entries at the same time as the primary data is updated and deleting old entries lazily at query time. The solution involves pushing updates to secondary indexes down the stack and implementing read-repair for indexes, and it resulted in an ~11% improvement in write throughput.
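The idea of writing index entries without a prior read and removing stale entries lazily at query time can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the class `ToyIndexedTable` and its methods are hypothetical names, there is a single indexed column per row, and timestamps, tombstones, and Cassandra's actual storage classes are ignored.

```python
from collections import defaultdict


class ToyIndexedTable:
    """Toy in-memory model; hypothetical, not Cassandra's implementation."""

    def __init__(self):
        self.primary = {}              # row key -> indexed column value
        self.index = defaultdict(set)  # column value -> row keys (inverted index)

    def write(self, row_key, value):
        # No read-before-write: the old value is never looked up here,
        # so any previous index entry for this row is left behind as stale.
        self.primary[row_key] = value
        self.index[value].add(row_key)

    def query(self, value):
        matches = []
        # sorted() returns a new list, so removing entries in the loop is safe.
        for row_key in sorted(self.index.get(value, set())):
            if self.primary.get(row_key) == value:
                matches.append(row_key)
            else:
                # Lazy cleanup: this index entry no longer matches primary data.
                self.index[value].discard(row_key)
        return matches


table = ToyIndexedTable()
table.write("user1", "NY")
table.write("user1", "CA")    # the index entry under "NY" is now stale
stale = table.query("NY")     # finds no match and drops the stale entry
current = table.query("CA")   # returns the row key for the live entry
```

In this model the write path never reads existing data; the cost of cleaning up is shifted to the first query that encounters a stale entry.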
Company
DataStax
Date published
March 28, 2013
Author(s)
Sam Tunnicliffe
Word count
805
Hacker News points
None found.
Language
English