Building real-time Analytics APIs at scale

Company

Algolia

Date Published

March 1, 2023

Author

Sylvain Friquet

Word count

1445

Language

English

Hacker News points

None

URL

www.algolia.com/blog/engineering/building-real-time-analytics-apis

Summary

We recently redesigned our analytics API to provide near real-time analytics for billions of search queries per day. Our previous system, which relied on batches of compressed log files and an Elasticsearch cluster, had limitations, notably the difficulty of managing a very large number of records spread across multiple nodes. We evaluated data warehousing options such as Amazon Redshift, BigQuery, and ClickHouse, but found them unsuitable for our real-time analytics workflow because of performance and pricing constraints. Instead, we chose Citus, the PostgreSQL extension developed by Citus Data, which lets us scale our data store efficiently and use extensions like HLL and TopN for fast approximate distinct counts and top-N rankings. The new system serves analytical queries in under a second by distributing data across shards and by using a roll-up approach: we pre-compute metrics for specific time ranges and aggregate them into roll-up tables. Because queries can then be answered from the roll-ups alone, we can delete raw data, which reduces storage requirements and improves performance and scalability.
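The reason HLL pairs well with roll-ups is that distinct counts cannot simply be summed across time buckets, while HyperLogLog sketches can be merged by taking the register-wise maximum. The Python below is a minimal sketch of that idea, not Algolia's implementation. It assumes a hypothetical event shape `(app_id, minute, user_id)`, uses an in-memory dict as a stand-in for a roll-up table, and the `HyperLogLog` class is a textbook toy, not the storage format or API of the PostgreSQL `hll` extension. TopN and sharding are not modeled.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Toy HyperLogLog sketch (illustrative only, not the postgresql-hll format)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # 1-based position of the leftmost 1-bit within the remaining (64 - p) bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # The union of two sketches is the element-wise max of their registers.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def rollup(events, p: int = 12):
    """Pre-aggregate hypothetical (app_id, minute, user_id) events into per-minute rows."""
    rows = defaultdict(lambda: {"searches": 0, "users": HyperLogLog(p)})
    for app_id, minute, user_id in events:
        row = rows[(app_id, minute)]
        row["searches"] += 1
        row["users"].add(user_id)
    return rows


def query_range(rows, app_id, start, end):
    """Answer a [start, end) range query from roll-up rows only: sum counts, merge sketches."""
    total, users = 0, None
    for (app, minute), row in rows.items():
        if app == app_id and start <= minute < end:
            total += row["searches"]
            users = row["users"] if users is None else users.merge(row["users"])
    return total, (users.estimate() if users else 0.0)


# Usage: 60 minutes x 100 searches, drawn from 2,000 distinct hypothetical users.
events = [("app1", minute, f"user{(minute * 37 + i) % 2000}")
          for minute in range(60) for i in range(100)]
rows = rollup(events)
del events  # raw data is no longer needed once the roll-up exists
print(query_range(rows, "app1", 0, 60))
```

Search counts roll up exactly, while the distinct-user figure is an estimate whose relative error shrinks as `p` grows, at the cost of more registers per roll-up row.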
