Company
Date Published
Author
Mark Needham
Word count
1700
Language
English
Hacker News points
None

Summary

Materialized views in ClickHouse transform and store data by automatically running a query whenever new rows are inserted into a source table. The original setup used two separate materialized views that both read from a Kafka source: one stored the raw event data and the other stored aggregation states. A suggestion was then made to chain the views, so that the aggregation state view reads from the already-extracted raw events rather than from Kafka. Using the Wiki recent changes feed as a data source, the post walks through the setup in detail: a Kafka table engine is created to ingest data, followed by a raw events table and a materialized view that stores the extracted data in it. To enable incremental aggregation, an aggregate state table is defined using unique counts and running totals for users, pages, and updates, with materialized views designed to populate these tables. The views are then chained to process data in one-minute and ten-minute intervals, and the post shows how to backfill and query the aggregated data for real-time analytics, with the aim of keeping the pipeline efficient as data volumes grow.
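The chaining idea in the summary above can be illustrated with a small plain-Python simulation. This is not ClickHouse code and not the post's own implementation. It is a minimal sketch under stated assumptions: events are hypothetical (timestamp, user, page) tuples, the aggregate state tracks unique users, unique pages, and a running total of updates, and the names AggState, Pipeline, and bucket are invented for illustration. Each "view" runs only on newly inserted rows, and the ten-minute stage reads the one-minute states rather than the raw events.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AggState:
    """Mergeable partial aggregate (hypothetical stand-in for an aggregate state)."""
    users: set = field(default_factory=set)
    pages: set = field(default_factory=set)
    updates: int = 0

    def merge(self, other: "AggState") -> None:
        # Merging two partial states gives the state of the combined data.
        self.users |= other.users
        self.pages |= other.pages
        self.updates += other.updates


def bucket(ts: int, width: int) -> int:
    """Round a Unix timestamp in seconds down to the start of its bucket."""
    return ts - ts % width


class Pipeline:
    """Simulates three chained views that fire on each inserted block of rows."""

    def __init__(self) -> None:
        self.raw_events = []                  # raw events table
        self.by_1m = defaultdict(AggState)    # one-minute aggregate states
        self.by_10m = defaultdict(AggState)   # ten-minute aggregate states

    def insert(self, rows) -> None:
        # Stage 1: store the extracted raw events.
        self.raw_events.extend(rows)

        # Stage 2: build one-minute states from the newly inserted rows only.
        block_1m = defaultdict(AggState)
        for ts, user, page in rows:
            state = block_1m[bucket(ts, 60)]
            state.users.add(user)
            state.pages.add(page)
            state.updates += 1
        for start, state in block_1m.items():
            self.by_1m[start].merge(state)

        # Stage 3 (chained): build ten-minute states from the new one-minute
        # states, never touching the raw rows again.
        for start, state in block_1m.items():
            self.by_10m[bucket(start, 600)].merge(state)


# Two separate inserts, as if two blocks arrived from the stream.
p = Pipeline()
p.insert([(0, "alice", "A"), (30, "bob", "A")])
p.insert([(70, "alice", "B"), (650, "carol", "C")])

first = p.by_10m[0]
print(len(first.users), len(first.pages), first.updates)
```

Because the states merge rather than overwrite, the ten-minute totals come out the same whether the rows arrive in one block or several, which is the property that makes incremental aggregation and later backfilling workable.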