User Defined Aggregations with Spark in DSE 5.0
Apache Cassandra 2.2 introduced User-Defined Functions (UDFs) and User-Defined Aggregates (UDAs), which let users write their own scalar functions and build custom aggregations on top of them. UDFs are executed on the coordinator node and are best suited to aggregations within a single partition. They should be written in Java and can take one or more columns as arguments. An aggregate works across multiple rows and produces a single-row result; it is defined by a state function, an initial state, and an optional final function. The sandbox introduced in DSE 5.0 is designed to keep UDFs from performing malicious actions, relying on Java bytecode inspection and restricted class loading for protection. Analytics in Apache Spark is built on resilient distributed datasets (RDDs), which support operations such as map-reduce, grouping, aggregation, and re-partitioning. UDFs and UDAs are building blocks for something bigger and can be called from analytics code such as Spark jobs.
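The state-function / final-function pattern described above can be illustrated with a minimal plain-Java sketch of an average aggregate. It does not use the Cassandra API and is not taken from the article: the class name UdaAverageSketch, the State type, and all method names are hypothetical, chosen only to show how an initial state is folded over the rows and then turned into a single result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of how an aggregate folds rows into one result.
// Plain Java only; none of these names come from Cassandra or DSE.
public class UdaAverageSketch {

    // State carried from row to row: how many values were seen and their sum.
    static final class State {
        final int count;
        final long sum;

        State(int count, long sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    // Initial state: nothing has been aggregated yet.
    static final State INITIAL_STATE = new State(0, 0L);

    // State function: called once per row, returns the updated state.
    // Null column values are skipped and leave the state unchanged.
    static State stateFunction(State state, Integer value) {
        if (value == null) {
            return state;
        }
        return new State(state.count + 1, state.sum + value);
    }

    // Optional final function: converts the last state into the result.
    // Returns null when no non-null values were aggregated.
    static Double finalFunction(State state) {
        if (state.count == 0) {
            return null;
        }
        return (double) state.sum / state.count;
    }

    // Applies the state function to every row, then the final function once.
    static Double aggregate(List<Integer> rows) {
        State state = INITIAL_STATE;
        for (Integer value : rows) {
            state = stateFunction(state, value);
        }
        return finalFunction(state);
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(1, 2, 3, 4))); // prints 2.5
    }
}
```

Keeping the count and the sum in the state, rather than a running average, lets the final function do the division exactly once after the last row.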
Company
DataStax
Date published
Aug. 1, 2016
Author(s)
Robert Stupp
Word count
2328
Language
English
Hacker News points
None found.