
Clickstream analytics case study. Part I: Kafka -> Data Transfer -> ClickHouse

What's this blog post about?

This case study walks through a real-time analytics use case built on DoubleCloud managed services, including Apache Kafka and ClickHouse. The goal is to set up a real-time analytics platform for ingesting clickstream data and supporting decision making. The architecture consists of an external data provider, a Kafka server, the Data Transfer service, and a ClickHouse database. The problem addressed is aggregating clickstream data to visualize basic metrics of user activity on a website. The post provides a sample event from the data source and describes the fields required to compute the aggregated statistics. To set up the infrastructure, Kafka and ClickHouse clusters are provisioned with DoubleCloud managed services, and the Data Transfer service connects the two clusters and moves data from Kafka to ClickHouse. The Python code for sending data is also discussed. The series continues with parts on aggregations and visualization, addressing performance where relevant.
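
To make the data flow summarized above more concrete, here is a minimal sketch of what producing and aggregating clickstream events could look like. It is not the code from the original post: the field names (user_id, page, ts) and the helper functions make_event and page_views_per_page are hypothetical, and the sketch uses only the Python standard library, so it builds the message payloads a Kafka producer would send without actually connecting to Kafka or ClickHouse.

```python
import json
from collections import Counter
from datetime import datetime, timezone


# Hypothetical event schema: these field names are assumptions made for
# illustration, not the fields used in the original post.
def make_event(user_id: str, page: str, ts: datetime) -> bytes:
    """Serialize one clickstream event as a UTF-8 JSON message payload."""
    event = {"user_id": user_id, "page": page, "ts": ts.isoformat()}
    return json.dumps(event).encode("utf-8")


def page_views_per_page(payloads: list[bytes]) -> Counter:
    """Count events per page, a stand-in for a basic user-activity metric."""
    return Counter(json.loads(p)["page"] for p in payloads)


# Fixed timestamp so the example is deterministic.
now = datetime(2024, 9, 6, tzinfo=timezone.utc)

# In a real pipeline each payload would be handed to a Kafka producer;
# here they are simply collected in a list.
payloads = [
    make_event("u1", "/home", now),
    make_event("u2", "/home", now),
    make_event("u1", "/pricing", now),
]

views = page_views_per_page(payloads)
```

In the architecture the post describes, the aggregation step would run in ClickHouse after Data Transfer has delivered the events; the in-memory Counter here only illustrates the kind of metric being computed.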

Company
DoubleCloud

Date published
Sept. 6, 2024

Author(s)
Igor Mosyagin

Word count
1928

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.