Faster Bulk-Data Loading in CockroachDB
Cockroach Labs recently introduced a new algorithm for organizing files in Pebble, its storage engine, which cut ingestion time for the standard TPC-C benchmark dataset by more than 80%. The team had initially replaced the implementation of its IMPORT bulk-loading feature with a simpler and faster data ingestion pipeline, but later found that some IMPORTs ran much slower or even got stuck. They traced the problem to sending out-of-order data directly to the KV storage layer: log-structured merge trees (LSMs) such as RocksDB store data in sorted order, so unsorted input works against them. The fix built on the company's recent switch of its key-value store from RocksDB to Pebble, where the team was able to add the new file-organization algorithm that produced the improvement.
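To illustrate why input order matters to a store that keeps data sorted, here is a minimal sketch. It is not Pebble's or RocksDB's code: the `fileRange` type and the `overlaps` and `countOverlaps` helpers are hypothetical names invented for this example, and it assumes each ingested batch becomes one sorted file covering an inclusive key range. Under that assumption, batches cut from in-order data cover disjoint ranges, while batches cut from out-of-order data tend to overlap one another, and overlapping files are generally what a sorted store has to do extra work to reconcile.

```go
package main

import "fmt"

// fileRange is a hypothetical stand-in for one sorted file.
// It covers every key in the inclusive range [smallest, largest].
type fileRange struct {
	smallest, largest string
}

// overlaps reports whether two inclusive key ranges share at least one key.
func overlaps(a, b fileRange) bool {
	return a.smallest <= b.largest && b.smallest <= a.largest
}

// countOverlaps returns the number of file pairs whose key ranges intersect.
func countOverlaps(files []fileRange) int {
	n := 0
	for i := range files {
		for j := i + 1; j < len(files); j++ {
			if overlaps(files[i], files[j]) {
				n++
			}
		}
	}
	return n
}

func main() {
	// Batches produced from data that arrives in key order: disjoint ranges.
	inOrder := []fileRange{{"a", "f"}, {"g", "m"}, {"n", "z"}}
	// Batches produced from data that arrives out of order: each batch
	// spans most of the key space, so every pair intersects.
	outOfOrder := []fileRange{{"a", "p"}, {"c", "t"}, {"b", "z"}}

	fmt.Println("in-order overlapping pairs:", countOverlaps(inOrder))
	fmt.Println("out-of-order overlapping pairs:", countOverlaps(outOfOrder))
}
```

The pairwise count is quadratic and only meant for illustration; it says nothing about how Pebble actually tracks or resolves overlapping files.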
Company
Cockroach Labs
Date published
Oct. 13, 2020
Author(s)
Bilal Akhtar
Word count
5283
Language
English
Hacker News points
14