Reconciling DSE with Source System Using Apache Spark and Apache Solr
As a Solutions Engineer at DataStax, the author is frequently asked how to confirm that data has been loaded correctly into a DataStax Enterprise (DSE) cluster. The question matters most when data governance is a requirement or when DSE becomes the System of Record. Traditional databases offer various methods for reconciling data between environments, but reconciliation is more challenging with Apache Cassandra because of its distributed nature. Several options are nevertheless available:
1. AlwaysOn SQL, available in DSE 6.0, is a highly available and secure SQL service that can be queried directly from Studio.
2. Creating a SEARCH INDEX makes it possible to validate date fields and check for blank fields.
3. DSE Search CQL Sum and Cassandra Count can be used to return the total number of rows, validate date fields, and check field values.
4. DSE Analytics, with its Apache Spark integration, is recommended for identifying discrepancies between data sources.
5. Monitoring application logs and the system.log file on each node, together with OpsCenter or nodetool tablestats, can help detect issues that affect reconciliation queries.
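The discrepancy check described in option 4 can be illustrated with a minimal sketch. It is written in plain Python over in-memory rows rather than Spark, so it shows only the comparison logic: row counts, keys missing on either side, and rows whose values differ. The function name `reconcile`, the `id` key, and the sample rows are all hypothetical and do not come from the original article. In DSE Analytics the two inputs would presumably be Spark DataFrames read from the source system and from DSE.

```python
def reconcile(source_rows, target_rows, key):
    """Compare two row sets by primary key and report discrepancies.

    Hypothetical helper for illustration; not part of DSE or Spark.
    """
    # Index each side by its primary key for set-based comparison.
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    shared_keys = source.keys() & target.keys()
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        # Keys loaded in the source system but absent from the target.
        "missing_in_target": sorted(source.keys() - target.keys()),
        # Keys present in the target with no counterpart in the source.
        "unexpected_in_target": sorted(target.keys() - source.keys()),
        # Keys on both sides whose field values differ.
        "mismatched": sorted(k for k in shared_keys if source[k] != target[k]),
    }


# Made-up sample data: id 2 has a blank date in the target,
# id 3 was never loaded, and id 4 exists only in the target.
source_rows = [
    {"id": 1, "name": "alpha", "created": "2018-12-01"},
    {"id": 2, "name": "beta", "created": "2018-12-02"},
    {"id": 3, "name": "gamma", "created": "2018-12-03"},
]
target_rows = [
    {"id": 1, "name": "alpha", "created": "2018-12-01"},
    {"id": 2, "name": "beta", "created": ""},
    {"id": 4, "name": "delta", "created": "2018-12-04"},
]

report = reconcile(source_rows, target_rows, key="id")
print(report)
```

Note that the row counts match in this sample even though three keys are in error, which is why a count comparison alone is not a sufficient reconciliation check.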
Company
DataStax
Date published
Dec. 4, 2018
Author(s)
Caroline George
Word count
381
Hacker News points
None found.
Language
English