
Analyzing Cassandra Data using GPUs, Part 1

What's this blog post about?

This article describes how to process Apache Cassandra's high-speed transactional data with tools from the RAPIDS ecosystem so that users can obtain analytical insights faster and more efficiently. RAPIDS is a suite of open source libraries for running end-to-end analytics and data science workloads on GPUs. Its GPU-accelerated libraries expose APIs that mirror familiar tools such as pandas and scikit-learn. The article explores five approaches to making the data in Cassandra's SSTable files available for analysis with RAPIDS. One fetches the data with the Cassandra driver, converts it into a pandas DataFrame, and then turns that into a cuDF DataFrame. Another reads the SSTables from disk using Cassandra server code, serializes the data in the Arrow IPC stream format, and sends it to the client. The article concludes with the results of performance tests on datasets with varying numbers of rows and announces sstable-to-arrow, an open source project that converts SSTable data to Apache Arrow for use in analytics libraries, including NVIDIA's GPU-powered RAPIDS ecosystem.

Company
DataStax

Date published
July 28, 2021

Author(s)
Alex Cai

Word count
1342

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.