
How S3-based ClickHouse hybrid storage works under the hood

What's this blog post about?

The article explains how S3-based hybrid storage in ClickHouse works under the hood. The hybrid approach combines the speed of SSD disks with the affordability of S3 for large datasets, often tens or hundreds of terabytes, or even petabytes. The DoubleCloud team developed this feature, which was merged into ClickHouse version 22.3 on April 18, 2022. The approach rests on decoupling compute from storage and can reduce storage costs by as much as three to five times in applicable scenarios. The article details how data is managed in S3, including creating files, adding data, renaming, deleting, and handling hard links, and covers the limitations of S3 data operations, such as the inability to replace data in the middle of a file. It then describes the caching mechanisms that speed up request execution when different requests access the same data, and an operations log that lets users replay operations only up to a chosen revision rather than all of them, which can be used for backups. The article also discusses hybrid storage itself, which combines local and S3 disks to improve performance while remaining cost-efficient. Finally, it touches on replication with S3, including zero-copy replication, which lets nodes share the same S3 data without copying it in full. It notes, however, that this mechanism is not yet considered production-ready and may have bugs. The article concludes by encouraging readers to explore how hybrid storage could apply to their projects, or to seek help from DoubleCloud in setting up and using this functionality.
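
To make the operations-log idea in the summary concrete, here is a minimal Python sketch of replaying a log only up to a chosen revision. It is an illustration under stated assumptions, not ClickHouse's actual log format or implementation: the names `Op`, `replay`, and `LOG`, the operation kinds, and the file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One hypothetical log entry; not ClickHouse's real on-disk format."""
    revision: int
    kind: str           # "create", "rename", or "delete"
    path: str
    new_path: str = ""  # used only by "rename"


def replay(log, up_to_revision):
    """Return the set of paths that exist after applying every
    operation whose revision is <= up_to_revision."""
    files = set()
    for op in sorted(log, key=lambda o: o.revision):
        if op.revision > up_to_revision:
            break  # later operations are ignored, as in a point-in-time restore
        if op.kind == "create":
            files.add(op.path)
        elif op.kind == "rename":
            files.discard(op.path)
            files.add(op.new_path)
        elif op.kind == "delete":
            files.discard(op.path)
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
    return files


# Hypothetical log used only for illustration.
LOG = [
    Op(1, "create", "part_1/data.bin"),
    Op(2, "create", "part_2/data.bin"),
    Op(3, "rename", "part_1/data.bin", "part_1_renamed/data.bin"),
    Op(4, "delete", "part_2/data.bin"),
]

state_at_backup = replay(LOG, up_to_revision=2)
state_latest = replay(LOG, up_to_revision=4)
```

Replaying to revision 2 reconstructs the file set as it stood at that point, which is the property the summary says makes the log usable for backups; replaying the full log yields the current state.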

Company
DoubleCloud

Date published
Nov. 25, 2022

Author(s)
-

Word count
3092

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.