DSE Continuous Paging Tuning and Support Guide
Continuous Paging (CP) is a new feature in DataStax Enterprise (DSE) that optimizes streaming large numbers of records from the DSE Server to the DSE Java Driver. It is opt-in and is enabled by setting the Spark configuration option "spark.dse.continuous_paging_enabled" to true. CP increases read speed by having the server continuously prepare new result pages in response to a query, which reduces the number of communication cycles between the DSE Server and the DSE Java Driver. However, it uses more Cassandra resources and may not be suitable for all use cases. Because it is built into DSE Server and the DSECassandraConnectionFactory, it can only be used with DSE; CP is automatically disabled if the target of the Spark application is not DSE or is not CP capable. Errors related to Continuous Paging often show up as tasks that fail in the middle of a Spark job. These tasks are retried immediately and usually succeed on the second attempt, but the job takes longer because the failed tasks have to be redone. The feature provides significant speed improvements over the normal paging used by DSE, but only when reading from Cassandra is the bottleneck in the pipeline.
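As a rough illustration of the opt-in setting described above, the following sketch builds a command line that passes the option with --conf. Only the option name "spark.dse.continuous_paging_enabled" comes from the text; the helper spark_submit_command, the application name my-app.jar, and the use of "dse spark-submit" as the launcher are assumptions made for this example, not something the article specifies.

```python
import shlex

# Option name taken from the summary above.
CP_KEY = "spark.dse.continuous_paging_enabled"


def spark_submit_command(app, continuous_paging=True, extra_conf=None):
    """Build an argument list that toggles Continuous Paging via --conf.

    Hypothetical helper: `app` and `extra_conf` are placeholders, and
    `dse spark-submit` is assumed to be the launcher in use.
    """
    conf = dict(extra_conf or {})
    # Spark configuration values are passed as strings.
    conf[CP_KEY] = "true" if continuous_paging else "false"

    cmd = ["dse", "spark-submit"]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command instead of executing it.
    print(shlex.join(spark_submit_command("my-app.jar")))
```

Passing continuous_paging=False produces the same command with the option set to false, which can be a convenient way to compare a job's run time with and without CP when checking whether reads from Cassandra are really the bottleneck.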
Company
DataStax
Date published
April 25, 2017
Author(s)
Russell Spitzer
Word count
1303
Hacker News points
None found.
Language
English