Company
Date Published
May 13, 2020
Author
Kevin King
Word count
2317
Language
English
Hacker News points
None

Summary

The Mux Data platform is a vital tool used by broadcasters to monitor video streaming experiences of their end users. It processes billions of video views every month and allows real-time querying of views as data is being collected. Previously, the storage of video views was done using sharded Postgres databases, which necessitated aggregation for acceptable performance on queries spanning more than a few hours. However, this came with limitations such as limited filtering, operational fragility and difficulty in making changes. Mux Data transitioned to ClickHouse, an open-source analytics database designed at Yandex, known for its speed. It uses columnar compression and sparse indices to achieve efficiency. The new architecture involves a single table called views with no aggregation or stored procedures. This resulted in significant performance improvements, including faster query times and the ability to handle complex queries that would often timeout client-side when sent to Postgres. Despite being column-oriented and using sparse indices, ClickHouse is still faster than sharded Postgres at retrieving a single row of data. The use of materialized indices was instrumental in reducing average single-row query latency from a few seconds to just 75ms. Additionally, the CollapsingMergeTree table engine allowed for "mutating" data without SQL UPDATE statements, which is not supported by ClickHouse. The transition to ClickHouse has been beneficial and Mux Data continues to explore ways to take advantage of its features, such as using Materialized Views, replacing the rollup process with AggregatingMergeTrees, and architecting an efficient real-time query layer.