Company
Date Published
Author
Erick Ramirez
Word count
1385
Language
English
Hacker News points
None

Summary

Storage-Attached Indexing (SAI) is being introduced in Apache Cassandra 5.0. It enables more flexible and performant query patterns with less application code. DataStax has been developing SAI for several years. It is already deployed in Astra DB, DataStax's Cassandra-as-a-service, where it has shown high reliability and performance.

SAI outperforms the other indexing options available for CQL. It provides more functionality at a fraction of the storage footprint and uses less disk space than the Solr-based implementation. It also delivers significantly higher throughput and lower latency than both native secondary indexes (2i) and Solr. SAI indexes can be created on any column in a table except partition key columns. Indexed queries can run on any node in any data center, with no need for separate DCs or dedicated resources.

Because SAI is built on Apache Lucene, it can use Lucene's built-in analyzers, including the standard and simple tokenizers, to extract index terms from text in the same way Solr does. SAI also enables semantic search, using natural language processing (NLP) and machine learning algorithms to return more accurate results.

Finally, SAI supports vector search, which underpins generative AI capabilities. Large language models (LLMs) capture the essence of unstructured data as vector embeddings. SAI can then quickly locate the top N rows in a table whose embeddings are most similar to a user's query. All of this can be done with just five lines of code.
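The idea of analyzers extracting index terms from text can be illustrated with a toy inverted index. The sketch below is a rough Python approximation, not SAI's or Lucene's code. `simple_analyzer` and `standard_analyzer` are hypothetical stand-ins that only loosely mimic Lucene's simple and standard analyzers: the first splits on non-letters and the second on word characters, and both then lowercase. `build_index` and `match` are illustrative helpers that show how extracted terms map back to row keys.

```python
import re
from collections import defaultdict


# Hypothetical approximation of Lucene's simple analyzer:
# keep runs of letters only, then lowercase them.
def simple_analyzer(text):
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


# Hypothetical approximation of Lucene's standard analyzer:
# keep runs of word characters (letters, digits, underscore), then lowercase.
# Lucene's real StandardTokenizer follows Unicode segmentation rules,
# which this regex does not reproduce exactly.
def standard_analyzer(text):
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_index(rows, analyzer):
    """Map each extracted term to the set of row keys whose text contains it."""
    index = defaultdict(set)
    for key, text in rows.items():
        for term in analyzer(text):
            index[term].add(key)
    return dict(index)


def match(index, analyzer, query):
    """Return row keys containing every term of the analyzed query (AND semantics)."""
    terms = analyzer(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    rows = {
        1: "Cassandra 5.0 adds Storage-Attached Indexing",
        2: "Vector search in Cassandra",
        3: "SAI uses Lucene analyzers",
    }
    idx = build_index(rows, simple_analyzer)
    print(sorted(match(idx, simple_analyzer, "Cassandra")))  # [1, 2]
```

Because the same analyzer runs on both the stored text and the query, a query such as "CASSANDRA indexing" matches row 1 regardless of case.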
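The minimal brute-force sketch below makes "top N rows most similar to a user query" concrete. It assumes cosine similarity and uses hand-written three-dimensional vectors as hypothetical stand-ins for the embeddings a real model would produce. SAI itself answers such queries with an approximate nearest-neighbor (ANN) index rather than by scoring every row as this exhaustive scan does. In CQL such a query takes the form `SELECT ... ORDER BY <vector column> ANN OF [...] LIMIT n`.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(rows, query, n):
    """Return the n (key, score) pairs whose vectors are most similar to query.

    This is an exact, exhaustive scan for illustration only; a vector index
    such as SAI's uses approximate nearest-neighbor search instead.
    """
    scored = ((key, cosine_similarity(vec, query)) for key, vec in rows.items())
    return heapq.nlargest(n, scored, key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical toy "embeddings" keyed by row id.
    rows = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.9, 0.1, 0.0],
        "c": [0.0, 1.0, 0.0],
        "d": [0.0, 0.0, 1.0],
    }
    query = [1.0, 0.0, 0.0]
    print([key for key, _ in top_n(rows, query, 2)])  # ['a', 'b']
```

An exhaustive scan costs work proportional to the number of rows times the vector dimension for every query. This is why a dedicated vector index matters once a table holds many embeddings.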