Company
Date Published
Author
Sylvain Friquet
Word count
1445
Language
English
Hacker News points
None

Summary

We recently redesigned our analytics API to provide near real-time analytics over billions of search queries per day. Our previous system, built on batches of compressed log files and an Elasticsearch cluster, had limitations, such as the difficulty of managing a large number of records across multiple nodes. We evaluated data warehousing options such as Redshift, BigQuery, and ClickHouse, but found them unsuitable for our real-time analytics workflow because of performance and pricing constraints. Instead, we chose Citus Data and its PostgreSQL extension, which lets us scale our data store efficiently and use extensions such as HLL and TopN for fast approximate distinct counts and sorting. Our new system achieves sub-second analytical queries by distributing data across shards and using a roll-up approach: we pre-compute metrics for specific time ranges and store the aggregates in roll-up tables. Once the data is rolled up, we can delete the raw data, which reduces storage requirements and improves performance and scalability.
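
The roll-up approach described above can be illustrated with a minimal Python sketch. It is not the production implementation, which runs in PostgreSQL with Citus. The 5-minute bucket size, the event layout `(timestamp, app_id, user_id, query)`, and the names `Rollup`, `roll_up`, and `query_range` are all assumptions made for illustration. An exact `set` and `Counter` stand in for the HLL and TopN sketches, which would return approximate results in far less space.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

BUCKET_SECONDS = 300  # hypothetical 5-minute roll-up window


@dataclass
class Rollup:
    """Pre-computed metrics for one (app_id, time bucket) pair."""
    query_count: int = 0
    users: set = field(default_factory=set)  # exact stand-in for an HLL sketch
    top_queries: Counter = field(default_factory=Counter)  # exact stand-in for TopN

    def merge(self, other):
        """Combine two roll-ups without going back to the raw events."""
        return Rollup(
            self.query_count + other.query_count,
            self.users | other.users,
            self.top_queries + other.top_queries,
        )


def roll_up(events):
    """Aggregate raw (timestamp, app_id, user_id, query) events into buckets."""
    buckets = defaultdict(Rollup)
    for ts, app_id, user_id, query in events:
        bucket_start = ts - ts % BUCKET_SECONDS  # floor to the bucket boundary
        rollup = buckets[(app_id, bucket_start)]
        rollup.query_count += 1
        rollup.users.add(user_id)
        rollup.top_queries[query] += 1
    return dict(buckets)


def query_range(rollups, app_id, start, end):
    """Answer an analytics query for [start, end) from roll-ups only."""
    total = Rollup()
    for (app, bucket_start), rollup in rollups.items():
        if app == app_id and start <= bucket_start < end:
            total = total.merge(rollup)
    return total
```

Because every metric in a roll-up can be merged, a query over a long time range only touches one small row per bucket, and the raw events can be dropped once they have been aggregated. HLL and TopN sketches share this mergeable property, which is what makes them a natural fit for roll-up tables.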