Tuning the row cache in Cassandra 2.1
Cassandra performs best when the data it needs is already in memory, because disk reads are comparatively slow. An effective data model therefore follows two practices: write rows to disk in the order they will be read, and use the clustering columns of the PRIMARY KEY to set that order. The post's example is a table of time-series status updates whose primary key stores each user's rows on disk in reverse chronological order by status_id, so the last 10 status updates for a user can be retrieved efficiently. Row caching in Cassandra 2.1 is enabled per table by specifying the number of rows to cache per partition. You must also tell Cassandra how much memory to dedicate to the cache with the row_cache_size_in_mb setting in the cassandra.yaml config file. To check that data is really being served from the cache rather than from disk, enable tracing in cqlsh; the trace shows whether a disk read was needed. If the cached rows cannot satisfy the whole request, a disk read may still be necessary, which can be mitigated by increasing the cache size limit or by restructuring the table so that frequently accessed rows sit at the head of the partition. By studying your application's query model and tuning the table and cache settings to match it, you can achieve great response times without an external caching layer.
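As a rough illustration of the setup described above, the following CQL sketch shows what such a table and a traced query might look like. Only status_id, row_cache_size_in_mb, and the 10-row figure come from the summary; the table name, the other columns, the timeuuid type, and the 200 MB value are assumptions made for the example, and the exact caching syntax differs between Cassandra releases.

```cql
-- Hypothetical table: all names except status_id are illustrative.
-- Clustering on status_id DESC keeps each user's newest updates
-- at the head of the partition.
CREATE TABLE status_updates_by_user (
    user_id   int,
    status_id timeuuid,   -- assumed type; gives time-based ordering
    status    text,
    PRIMARY KEY (user_id, status_id)
) WITH CLUSTERING ORDER BY (status_id DESC)
  AND caching = { 'keys' : 'ALL', 'rows_per_partition' : '10' };

-- The row cache also needs memory reserved in cassandra.yaml, e.g.:
--   row_cache_size_in_mb: 200   (illustrative value, not a recommendation)

-- In cqlsh, turn tracing on and run the query twice; the second trace
-- should indicate whether the rows came from the cache or from disk.
TRACING ON;
SELECT * FROM status_updates_by_user WHERE user_id = 1 LIMIT 10;
```

With this layout, a query whose LIMIT is no larger than rows_per_partition can in principle be answered entirely from the cached head of the partition, while a larger LIMIT would be expected to require a disk read.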
Company
DataStax
Date published
May 16, 2014
Author(s)
Ryan McGuire
Word count
784
Hacker News points
None found.
Language
English