
DSE 5.1: Automatic Optimization of Spark SQL Queries Using DSE Search

What's this blog post about?

DataStax Enterprise (DSE) 5.1 lets DSE Analytics, which is built on Apache Spark, use the indexes maintained by DSE Search, which is built on Apache Solr, to speed up certain Spark SQL queries. When the spark.sql.dse.solr.enable_optimization configuration option is enabled, Catalyst predicates are translated into Solr query clauses, so an analytics query such as "SELECT COUNT(*) where Column > 5" can be answered in near-real time. The gains are largest for count queries and for filters that return only a small portion of the total dataset. The optimization is not always beneficial, however: whether it helps depends on data layout and hardware configuration.
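
To make the idea of turning a predicate into a Solr query clause concrete, here is a minimal toy sketch in Python. It is not DSE's implementation and does not use Spark or Solr; the `Predicate` class and the `to_solr_clause` and `to_solr_query` functions are hypothetical names invented for this illustration. The only outside fact it relies on is Solr's range query syntax, where square brackets mark an inclusive bound and curly braces an exclusive one.

```python
from dataclasses import dataclass
from typing import List, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a simple comparison filter such as Column > 5."""
    column: str
    op: str  # one of "=", ">", ">=", "<", "<="
    value: Number


def to_solr_clause(p: Predicate) -> str:
    """Render one comparison as a Solr term or range clause.

    Assumption: values are plain non-negative numbers, so no escaping is done.
    """
    col, v = p.column, str(p.value)
    if p.op == "=":
        return col + ":" + v
    if p.op == ">":
        return col + ":{" + v + " TO *]"   # exclusive lower bound
    if p.op == ">=":
        return col + ":[" + v + " TO *]"   # inclusive lower bound
    if p.op == "<":
        return col + ":[* TO " + v + "}"   # exclusive upper bound
    if p.op == "<=":
        return col + ":[* TO " + v + "]"   # inclusive upper bound
    raise ValueError("unsupported operator: " + p.op)


def to_solr_query(predicates: List[Predicate]) -> str:
    """Combine clauses with AND; with no predicates, match every document."""
    if not predicates:
        return "*:*"
    return " AND ".join(to_solr_clause(p) for p in predicates)


if __name__ == "__main__":
    # The filter from the example query in the summary above.
    print(to_solr_query([Predicate("Column", ">", 5)]))
```

With a clause like this, the filter or count could in principle be answered from the search index rather than by scanning every row, which is consistent with the summary's point that the benefit is largest when only a small portion of the dataset matches.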

Company
DataStax

Date published
May 9, 2017

Author(s)
Russell Spitzer

Word count
908

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.