How we optimized Cassandra cqlsh COPY FROM

What's this blog post about?

This article describes how the performance of the cqlsh COPY FROM command in Apache Cassandra was improved with the help of profiling tools such as cProfile and line_profiler. The changes introduced by CASSANDRA-11053 raised throughput from around 35,000 rows per second to as much as 117,000 rows per second. The main optimizations were introducing a dedicated feeder process for reading data and moving CSV decoding into the worker processes. Replacing the queue with a pool of pipes also improved communication between processes. General Python performance recommendations were applied as well, such as using built-in integers instead of Python types and storing function references in local variables before entering loops. The final results varied with factors such as CPU scheduling and the complexity of the data types involved.
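Two of the techniques mentioned above, profiling with cProfile and storing function references in local variables before a loop, can be illustrated with a minimal sketch. This is not the cqlsh code: `convert_column` and `profile` are hypothetical names chosen for the example, and the integer conversion only stands in for the real CSV field parsing.

```python
import cProfile
import io
import pstats


def convert_column(values):
    """Convert string fields to ints (a stand-in for real field parsing)."""
    result = []
    # Look up the bound method and the builtin once, outside the loop,
    # so each iteration uses a fast local variable instead of repeating
    # an attribute lookup and a global/builtin lookup.
    append = result.append
    to_int = int
    for value in values:
        append(to_int(value))
    return result


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    rows = [str(i) for i in range(100000)]
    _, report = profile(convert_column, rows)
    print(report)
```

The gain from caching lookups in locals is usually small per iteration and only matters in hot loops, so it is worth confirming with a profiler before and after the change.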

Company
DataStax

Date published
April 20, 2016

Author(s)
Stefania Alborghetti

Word count
1398

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.