Analyzing Cassandra Data using GPUs, Part 2

What's this blog post about?

The text discusses the development of sstable-to-arrow, a C++17 program that uses the Kaitai Struct library to parse Cassandra's SSTable files and convert them into the Arrow data format. It aims to enable GPU analytics on SSTable data by converting each column in the table into an Arrow Vector and shipping the data to clients, where it can be converted into a cuDF DataFrame for further analysis. The ultimate goal is to include a read_sstable function in the RAPIDS ecosystem, similar to cudf.DataFrame.from_csv. Performance improvements, broader support for CQL types, and handling of large datasets are areas of ongoing development. sstable-to-arrow can be run using Docker and supports saving SSTable data as a Parquet file. It is available on GitHub and on Docker Hub as an alpha release. A free online workshop with hands-on examples will be held in mid-August for those interested in trying out the project.
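
The column-by-column conversion described above can be pictured with a small standard-library sketch. It is not taken from sstable-to-arrow, which is written in C++ and uses the Arrow library. The function name `rows_to_columns` and the sample rows are hypothetical, and the sketch only shows the general idea of pivoting row-oriented records into one array per column, with `None` marking a missing cell.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def rows_to_columns(
    rows: List[Row], column_names: List[str]
) -> Dict[str, List[Optional[Any]]]:
    """Pivot row-oriented records into one list per column.

    Cells that a row does not contain are stored as None, so every
    column ends up with the same length as the list of rows.
    """
    columns: Dict[str, List[Optional[Any]]] = {name: [] for name in column_names}
    for row in rows:
        for name in column_names:
            columns[name].append(row.get(name))
    return columns


# Hypothetical rows standing in for data decoded from an SSTable.
rows = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob"},  # this row has no "score" cell
    {"id": 3, "name": "carol", "score": 7.0},
]

columns = rows_to_columns(rows, ["id", "name", "score"])
```

With pyarrow installed, a dictionary of equal-length lists like this one can be passed to `pyarrow.table(columns)` to build an Arrow table, and on a machine with RAPIDS, `cudf.DataFrame.from_arrow` accepts such a table. Neither step is shown here because both depend on libraries outside the standard library.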

Company
DataStax

Date published
July 28, 2021

Author(s)
Alex Cai

Word count
667

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.