How we optimized Cassandra cqlsh COPY FROM
This article describes how profiling tools such as cProfile and line_profiler were used to improve the performance of the cqlsh COPY FROM command in Apache Cassandra. The changes introduced by CASSANDRA-11053 raised throughput from around 35,000 rows per second to as much as 117,000 rows per second. The main optimizations were introducing a dedicated feeder process to read the data and moving CSV decoding into the worker processes. Replacing the queue with a pool of pipes also improved communication between processes. Python performance recommendations were applied as well, such as using built-in integers instead of Python types and storing function references in local variables before entering loops. The final results varied with factors such as CPU scheduling and the complexity of the data types.
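The recommendation about storing function references in local variables can be illustrated with a minimal sketch. This is not the actual cqlsh code: the function names and the integer-only CSV rows are hypothetical, chosen only to show the pattern of resolving a bound method and a built-in once before the loop instead of on every iteration.

```python
import csv


def parse_rows_repeated_lookup(lines):
    """Baseline: rows.append and int are looked up on every iteration."""
    rows = []
    for record in csv.reader(lines):
        rows.append(list(map(int, record)))
    return rows


def parse_rows_local_refs(lines):
    """Same logic, with the lookups hoisted into local variables."""
    rows = []
    append = rows.append  # bound method resolved once, outside the loop
    to_int = int          # built-in bound to a local name
    for record in csv.reader(lines):
        append(list(map(to_int, record)))
    return rows


# Hypothetical sample input: each string is one CSV line of integers.
sample = ["1,2,3", "4,5,6"]
baseline = parse_rows_repeated_lookup(sample)
optimized = parse_rows_local_refs(sample)
```

Both functions return the same rows; any speed difference depends on the Python version and the workload, so it is worth measuring with a profiler such as cProfile before and after the change rather than assuming a gain.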
Company
DataStax
Date published
April 20, 2016
Author(s)
Stefania Alborghetti
Word count
1398
Hacker News points
None found.
Language
English