
Scaling out PostgreSQL for CloudFlare Analytics using CitusDB

What's this blog post about?

Albert Strasheim of Cloudflare describes how the company built a new Data Platform to handle rapidly growing log volume, driven by traffic growth of more than 400% annually. The original log processing pipeline was built from Perl scripts and C++ programs, with PostgreSQL for data storage. As the company grew it needed a more scalable solution, and it adopted CitusDB, which scales out PostgreSQL for real-time workloads. In the new pipeline, each HTTP access log event passes through several processing stages before being inserted into CitusDB, where the data is further rolled up to 1-hour and 1-day granularity. The team chose Go for its simplicity, its performance, and its extensive ecosystem of third-party libraries, and selected Kafka because it provides a persistent queue, supports real-time data processing, and can be used from other languages. CitusDB's architecture provides horizontal scaling and high availability while remaining compatible with PostgreSQL extensions such as HyperLogLog and hstore. Overall, the new Data Platform improved performance and scalability and allows large amounts of historical analytics data to be stored efficiently.

Company
Cloudflare

Date published
April 9, 2015

Author(s)
Albert Strasheim

Word count
2110

Language
English

Hacker News points
120