Reconciling DSE with Source System Using Apache Spark and Apache Solr

Company

DataStax

Date Published

Dec. 4, 2018

Author

Caroline George

Word count

381

Language

English

Hacker News points

None

URL

www.datastax.com/blog/reconciling-dse-source-system-using-apache-spark-and-apache-solr

Summary

As a Solutions Engineer at DataStax, the author is frequently asked how to confirm that data has been loaded correctly into a DataStax Enterprise (DSE) cluster. The question matters most when data governance is a concern or when DSE becomes the System of Record. Traditional databases offer various methods for reconciling data between environments, but reconciliation is more challenging with Apache Cassandra because of its distributed nature. The post lists the following options:

1. AlwaysOn SQL in DSE 6.0 provides a highly available, secure SQL service that can be run directly in Studio.
2. Creating a search index makes it possible to validate date fields and check for blank fields.
3. DSE Search CQL Sum and Cassandra Count can be used to return the total number of rows, validate date fields, and check field values.
4. DSE Analytics, with its Apache Spark integration, is recommended for identifying discrepancies between data sources.
5. Monitoring application logs and the system.log file on each node, along with OpsCenter or nodetool tablestats, can help detect issues that affect reconciliation queries.
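The discrepancy check described in option 4 can be sketched in plain Python to show the underlying comparison logic. This is an illustrative sketch only, not the code from the original post: the `reconcile` function, the `id` key, and the sample rows are all hypothetical. On a real cluster the same comparison would typically be expressed as a Spark join or set-difference over DataFrames loaded from the source system and from DSE.

```python
from typing import Dict, Hashable, List

Row = Dict[str, object]


def reconcile(source_rows: List[Row], target_rows: List[Row], key: str) -> Dict[str, object]:
    """Compare two row sets by primary key and report discrepancies.

    Hypothetical helper for illustration; assumes `key` is unique per row.
    """
    # Index both sides by primary key for constant-time lookups.
    source_by_key: Dict[Hashable, Row] = {row[key]: row for row in source_rows}
    target_by_key: Dict[Hashable, Row] = {row[key]: row for row in target_rows}

    source_keys = set(source_by_key)
    target_keys = set(target_by_key)

    return {
        # Row counts on each side, the simplest reconciliation check.
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        # Keys present in the source system but never loaded into the target.
        "missing_in_target": sorted(source_keys - target_keys),
        # Keys present in the target that the source does not know about.
        "unexpected_in_target": sorted(target_keys - source_keys),
        # Keys present on both sides whose field values differ.
        "mismatched": sorted(
            k for k in source_keys & target_keys
            if source_by_key[k] != target_by_key[k]
        ),
    }


# Hypothetical sample data standing in for the source system and the DSE table.
source = [
    {"id": 1, "amount": 10.0, "loaded_on": "2018-12-01"},
    {"id": 2, "amount": 20.0, "loaded_on": "2018-12-01"},
    {"id": 3, "amount": 30.0, "loaded_on": "2018-12-02"},
]
target = [
    {"id": 1, "amount": 10.0, "loaded_on": "2018-12-01"},
    {"id": 2, "amount": 25.0, "loaded_on": "2018-12-01"},
    {"id": 4, "amount": 40.0, "loaded_on": "2018-12-03"},
]

report = reconcile(source, target, key="id")
```

With these sample rows, the report flags key 3 as missing from the target, key 4 as unexpected in the target, and key 2 as mismatched. Matching row counts alone would not reveal any of this, since both sides hold three rows.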