DSE 5.1: Automatic Optimization of Spark SQL Queries Using DSE Search
In DataStax Enterprise (DSE) 5.1, Apache Solr-based DSE Search and Apache Spark-based DSE Analytics can be combined so that analytics queries take advantage of search indexes. Enabling the spark.sql.dse.solr.enable_optimization configuration option lets Spark SQL convert Catalyst predicates into Solr query clauses. A query such as "SELECT COUNT(*) where Column > 5" can then be answered from the search index in near-real time. The gains are largest for count queries and for filters that return only a small portion of the total dataset. These optimizations are not always beneficial, however: whether they help depends on the data layout and the hardware configuration.
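The predicate-to-clause translation can be illustrated with a minimal sketch. It shows how simple numeric comparisons could map onto Solr's range syntax, where square brackets are inclusive, curly braces are exclusive, and * is an open bound. This is a hypothetical illustration, not DSE's implementation. The Predicate class, to_solr_clause, and push_down are invented names, only numeric comparisons are handled, and values are not escaped. Predicates the sketch cannot translate are returned separately, on the assumption that Spark would still evaluate them itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Predicate:
    """Hypothetical, simplified stand-in for a Catalyst comparison predicate."""
    column: str
    op: str  # one of "=", ">", ">=", "<", "<="
    value: object


def to_solr_clause(p: Predicate) -> Optional[str]:
    """Translate one numeric comparison into Solr query syntax, or return None."""
    # Only plain numbers are handled; bool is excluded because it subclasses int.
    if isinstance(p.value, bool) or not isinstance(p.value, (int, float)):
        return None
    v = str(p.value)
    if p.op == "=":
        return f"{p.column}:{v}"
    if p.op == ">":   # exclusive lower bound, open upper bound
        return f"{p.column}:{{{v} TO *]"
    if p.op == ">=":  # inclusive lower bound, open upper bound
        return f"{p.column}:[{v} TO *]"
    if p.op == "<":   # open lower bound, exclusive upper bound
        return f"{p.column}:[* TO {v}}}"
    if p.op == "<=":  # open lower bound, inclusive upper bound
        return f"{p.column}:[* TO {v}]"
    return None  # unsupported operator: leave it for Spark


def push_down(predicates: List[Predicate]) -> Tuple[Optional[str], List[Predicate]]:
    """AND together every translatable predicate; return the rest as a residual."""
    clauses: List[str] = []
    residual: List[Predicate] = []
    for p in predicates:
        clause = to_solr_clause(p)
        if clause is None:
            residual.append(p)
        else:
            clauses.append(clause)
    query = " AND ".join(clauses) if clauses else None
    return query, residual


if __name__ == "__main__":
    query, residual = push_down([Predicate("column", ">", 5)])
    print(query)     # column:{5 TO *]
    print(residual)  # []
```

For a count query like the one above, a Solr request for the generated query with rows=0 reports the number of matching documents in numFound without returning the documents themselves. This is one plausible way the index can answer such a query cheaply, stated here as an assumption rather than a description of DSE internals.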
Company
DataStax
Date published
May 9, 2017
Author(s)
Russell Spitzer
Word count
908
Hacker News points
None found.
Language
English