Company
Date Published
Author
-
Word count
976
Language
English
Hacker News points
None

Summary

In this post, we showcase advanced Storage-Attached Indexing (SAI) capabilities in Apache Cassandra 5.0 and demonstrate how to convert Solr schema fields to the corresponding SAI index options, particularly index analyzers, tokenizers, and filters. We use a notional application that stores information about movies and uses Solr for search. The movie data is partitioned by movie ID, and we start with a regular SAI index on the title column. That index returns only exact matches for the term "Extraction", whereas the Solr field, which is configured with StandardTokenizer and LowerCaseFilter, also returns non-exact matches. To reproduce the Solr behavior, we use Cassandra's built-in STANDARD analyzer, which splits the text into words and converts them to lowercase. This gives us hits for movies whose titles contain more than just the search term "Extraction". We can also use other generic analyzers such as simple, whitespace, stop, and lowercase, as well as language-specific analyzers for over 30 languages. Additionally, we can add a stemming filter such as PorterStemFilter to reduce terms to their base forms. SAI also supports the CONTAINS and CONTAINS KEY operators for querying collections, which lets us search for action movies by genre or for movies featuring Chris Hemsworth.
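To make the difference between exact and analyzed matching concrete, here is a minimal Python sketch. It is a toy model of what a standard tokenizer followed by a lowercase filter does to a title. It is not Cassandra or Solr code, and the sample titles and the function names (`analyze`, `exact_match`, `analyzed_match`) are assumptions made up for illustration.

```python
import re

# Hypothetical sample titles, for illustration only.
TITLES = ["Extraction", "Extraction 2", "The Extraction Point"]


def analyze(text):
    """Toy stand-in for a standard tokenizer plus lowercase filter:
    split the text into word tokens, then lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def exact_match(titles, term):
    """Model of a non-analyzed index: only the identical value matches."""
    return [title for title in titles if title == term]


def analyzed_match(titles, term):
    """Model of an analyzed index: a title matches when every analyzed
    query term appears among the title's analyzed terms."""
    query_terms = set(analyze(term))
    return [title for title in titles if query_terms <= set(analyze(title))]


if __name__ == "__main__":
    print(exact_match(TITLES, "Extraction"))
    print(analyzed_match(TITLES, "extraction"))
```

In this model the exact lookup returns only the title that is identical to the search value, while the analyzed lookup also returns titles that merely contain the term, regardless of case. A real analyzer applies more elaborate tokenization rules than the `\w+` pattern assumed here.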