
User Defined Aggregations with Spark in DSE 5.0

What's this blog post about?

Apache Cassandra introduced User-Defined Functions (UDFs) and User-Defined Aggregates (UDAs) in version 2.2, letting users write their own scalar functions and build custom aggregations on top of them. UDFs are executed on the coordinator node and can be used for partition-level aggregations. They should be written in Java and can be applied to one or more columns. An aggregate works on multiple rows and produces a single-row result: a state function is applied to each row, starting from an initial state, and an optional final function converts the last state into the result. The sandbox introduced in DSE 5.0 guards against malicious UDF code by inspecting Java bytecode and restricting class loading. Analytics in Apache Spark is built on resilient distributed datasets (RDDs), which support operations such as map-reduce, grouping, aggregation, and re-partitioning. UDFs and UDAs are building blocks for something bigger and can be used from analytics code such as Spark.
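
To make the aggregate life cycle concrete, here is a minimal plain-Java sketch of how an averaging aggregate could fold rows into a single result. It is not the code from the blog post and does not use any Cassandra API. The class name `AvgAggregateSketch`, the `State` layout, and the method names are hypothetical, chosen only to mirror the three pieces named in the summary (initial state, per-row state function, optional final function).

```java
// Hypothetical illustration only: mimics the shape of a user-defined
// aggregate (initial state, state function, final function) in plain Java.
class AvgAggregateSketch {

    // Assumed state layout: number of rows seen and their running sum.
    static final class State {
        final long count;
        final double sum;

        State(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    // Initial state: nothing has been aggregated yet.
    static State initialState() {
        return new State(0L, 0.0);
    }

    // State function: called once per row, returns the updated state.
    static State stateFunction(State state, double value) {
        return new State(state.count + 1, state.sum + value);
    }

    // Final function: turns the last state into the single result.
    // Returns null when no rows were aggregated.
    static Double finalFunction(State state) {
        if (state.count == 0) {
            return null;
        }
        return Double.valueOf(state.sum / state.count);
    }

    // Drives the three pieces over a set of row values.
    static Double aggregate(double... rows) {
        State state = initialState();
        for (double value : rows) {
            state = stateFunction(state, value);
        }
        return finalFunction(state);
    }

    public static void main(String[] args) {
        System.out.println(aggregate(1.0, 2.0, 3.0, 6.0)); // prints 3.0
    }
}
```

The state is immutable in this sketch, so each call to the state function returns a fresh value. That keeps the per-row step free of side effects, which is the property that lets the same logic be reused from other code.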

Company
DataStax

Date published
Aug. 1, 2016

Author(s)
Robert Stupp

Word count
2328

Language
English

Hacker News points
None found.