Scaling out PostgreSQL for CloudFlare Analytics using CitusDB
Albert Strasheim of Cloudflare describes how the company built a new data platform to handle log volume that was rising rapidly as traffic grew by more than 400% annually. The original log processing pipeline was built from Perl scripts and C++ programs, with PostgreSQL for data storage. As the company grew, it needed a more scalable solution and adopted CitusDB, which scales out PostgreSQL for real-time workloads. In the new pipeline, each HTTP access log event passes through several stages before being inserted into the CitusDB database, with further rollups to 1-hour and 1-day granularity. The team chose Go for its simplicity, performance, and extensive ecosystem of third-party libraries. Kafka was selected because it provides a persistent queue, supports real-time data processing, and is compatible with other programming languages. The CitusDB architecture enables horizontal scaling and high availability, and it remains compatible with PostgreSQL extensions such as HyperLogLog and Hstore. Overall, the new data platform has improved performance and scalability while storing large amounts of historical analytics data efficiently.
Company
Cloudflare
Date published
April 9, 2015
Author(s)
Albert Strasheim
Word count
2110
Hacker News points
None found.
Language
English