Analyzing Cassandra Data using GPUs, Part 2
The text discusses the development of sstable-to-arrow, a C++17 program that uses the Kaitai Struct library to parse Cassandra's SSTable files and convert them into the Arrow data format. It aims to enable GPU analytics on SSTable data by converting each column in the table into an Arrow Vector and shipping the data to clients, where it can be converted into a cuDF DataFrame for further analysis. The ultimate goal is to include a read_sstable function in the RAPIDS ecosystem, similar to cudf.DataFrame.from_csv. Performance improvements, broader support for CQL types, and handling of large datasets are areas of ongoing development. sstable-to-arrow can be run using Docker and supports saving SSTable data as a Parquet file. It is available on GitHub and on Docker Hub as an alpha release. A free online workshop with hands-on examples will be held in mid-August for those interested in trying out the project.
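The column-wise conversion mentioned in the summary can be illustrated with a minimal, library-free sketch. It pivots row-oriented records into one list per column, which is the general shape of a columnar layout such as Arrow's. The sample rows and the rows_to_columns helper are hypothetical and exist only for illustration; sstable-to-arrow itself is written in C++ and builds real Arrow data, which this sketch does not attempt to reproduce.

```python
from typing import Any, Dict, List


def rows_to_columns(
    rows: List[Dict[str, Any]], column_names: List[str]
) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column.

    Hypothetical helper: cells missing from a row are stored as None,
    so every column ends up with exactly one entry per row.
    """
    columns: Dict[str, List[Any]] = {name: [] for name in column_names}
    for row in rows:
        for name in column_names:
            columns[name].append(row.get(name))
    return columns


# Hypothetical rows, standing in for records decoded from an SSTable.
rows = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob"},  # no "score" cell in this row
    {"id": 3, "name": "carol", "score": 7.0},
]

columns = rows_to_columns(rows, ["id", "name", "score"])
```

Assuming pyarrow and cuDF are installed on the client, a dictionary of columns like this could plausibly be passed to pyarrow.table(columns) and then to cudf.DataFrame.from_arrow to obtain a GPU DataFrame; that step is left out here because it requires a GPU environment.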
Company
DataStax
Date published
July 28, 2021
Author(s)
Alex Cai
Word count
667
Hacker News points
None found.
Language
English