Husky: Exactly-Once Ingestion and Multi-Tenancy at Scale

Post Details

Company

Datadog

Date Published

Feb. 22, 2023

Author

Daniel Intskirveli, Cecilia Watt

Word Count

4,354

Language

English

Hacker News Points

22

Source URL

www.datadoghq.com/blog/engineering/husky-deep-dive

Summary

Datadog's third-generation event store, Husky, is a distributed, time-series-oriented, columnar store optimized for streaming ingestion and for hybrid analytical and search queries. To ensure that every event is ingested into Husky's storage engine exactly once, the company built auto-scaling, multi-tenant data ingestion pipelines.

To introduce locality, events are deterministically mapped, by their ID and timestamp, to groups of partitions called shards. Because every copy of a given event lands on the same shard, deduplication can be done efficiently within a single shard. This locality also reduced storage costs and improved performance.

The Sharding Allocator ensures that all Shard Router nodes have a consistent view of the allocated Shard Placements, while the Autosharder periodically adjusts each tenant's configured shard count to better fit its observed traffic volume. Load balancing is achieved by moving tenants' Shard Placements with a salting technique and a balancing algorithm that keeps shifting placements until all shards are roughly balanced.

The Writers, which are responsible for exactly-once ingestion into Husky, persist event IDs in Husky tables separate from the raw event data, so that the event data and its event IDs stay consistent once they have been committed to the Metadata store.
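The summary says only that events are mapped to shards by ID and timestamp and that placements are moved with a salting technique; it does not give the functions. The sketch below assumes one plausible scheme: hash the event ID together with a coarse time bucket to pick one of the tenant's shards, then hash (tenant, shard, salt) to pick a Shard Placement. The names `route_event`, `shard_placement`, and `bucket_s`, the one-hour bucket, and the use of SHA-256 are all hypothetical choices for illustration.

```python
import hashlib


def stable_hash(*parts: str) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() is randomized per process."""
    digest = hashlib.sha256("\x1f".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route_event(tenant: str, event_id: str, timestamp_s: int,
                shard_count: int, bucket_s: int = 3600) -> int:
    """Pick one of the tenant's shards from the event's ID and timestamp.

    Hypothetical scheme: the same (event_id, time bucket) always yields the
    same shard, so a redelivered duplicate reaches the same shard as the
    original and can be deduplicated there.
    """
    time_bucket = timestamp_s // bucket_s
    return stable_hash(tenant, event_id, str(time_bucket)) % shard_count


def shard_placement(tenant: str, shard: int, salt: int, placement_count: int) -> int:
    """Map a tenant shard to a Shard Placement; changing the salt moves it."""
    return stable_hash(tenant, str(shard), str(salt)) % placement_count
```

Because a router needs only the tenant, its shard count, and its salt to compute a destination, every Shard Router that shares the same view of those values (the consistency the Sharding Allocator provides) computes the same answer. Re-salting a single tenant is one way a balancer could move that tenant's placements without disturbing any other tenant.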
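A minimal sketch of the Autosharder's per-tenant adjustment, assuming a fixed target throughput per shard and a hysteresis band so that small traffic fluctuations do not churn placements. `TARGET_EVENTS_PER_SHARD`, `MIN_SHARDS`, `MAX_SHARDS`, `HYSTERESIS`, `next_shard_count`, and `autoshard` are illustrative names and values, not Datadog's.

```python
import math

# Hypothetical tuning knobs; real values would come from capacity testing.
TARGET_EVENTS_PER_SHARD = 5_000.0  # events/second one shard should absorb
MIN_SHARDS = 1
MAX_SHARDS = 256
HYSTERESIS = 0.25  # ignore traffic within +/-25% of current capacity


def next_shard_count(current: int, observed_eps: float) -> int:
    """Return the shard count a tenant should have for its observed traffic."""
    ideal = math.ceil(observed_eps / TARGET_EVENTS_PER_SHARD)
    ideal = max(MIN_SHARDS, min(MAX_SHARDS, ideal))
    capacity = current * TARGET_EVENTS_PER_SHARD
    # Keep the current count while traffic stays inside the hysteresis band,
    # so noisy traffic does not constantly reshuffle shards.
    if capacity * (1 - HYSTERESIS) <= observed_eps <= capacity * (1 + HYSTERESIS):
        return current
    return ideal


def autoshard(config: dict[str, int], traffic: dict[str, float]) -> dict[str, int]:
    """One periodic pass: adjust each tenant's configured shard count independently."""
    return {tenant: next_shard_count(count, traffic.get(tenant, 0.0))
            for tenant, count in config.items()}
```

For example, a tenant configured with 4 shards that is observed at 40,000 events per second would be raised to 8 shards, while one observed at 21,000 events per second stays at 4 because it is still within the band. Each tenant is evaluated on its own, matching the tenant-by-tenant adjustment described above.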
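A minimal in-memory sketch of the Writers' deduplication idea, assuming one Writer per shard and a Metadata store whose commit publishes a data file and its matching event-ID file together. `MetadataStore`, `ShardWriter`, and their methods are hypothetical; the real Writers persist both as Husky tables, which this sketch only imitates with Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the Metadata store.

    commit() records a data file and its event-ID file in one call; in this
    single-threaded sketch that models an atomic transaction, so readers never
    see events whose IDs are missing (or IDs whose events are missing).
    """
    data_files: list = field(default_factory=list)
    id_files: list = field(default_factory=list)

    def commit(self, data_file: list, id_file: frozenset) -> None:
        self.data_files.append(data_file)
        self.id_files.append(id_file)

    def committed_ids(self) -> set:
        return {event_id for id_file in self.id_files for event_id in id_file}


class ShardWriter:
    """Buffers events for one shard and drops any ID it has already seen."""

    def __init__(self, store: MetadataStore) -> None:
        self.store = store
        # Rebuild dedup state from committed ID files, e.g. after a restart.
        self.seen = store.committed_ids()
        self.pending: dict = {}

    def write(self, event_id: str, payload: object) -> bool:
        """Return False if the event is a duplicate and was dropped."""
        if event_id in self.seen or event_id in self.pending:
            return False
        self.pending[event_id] = payload
        return True

    def flush(self) -> int:
        """Commit buffered events and their IDs together; return how many."""
        if not self.pending:
            return 0
        data_file = list(self.pending.items())
        id_file = frozenset(self.pending)
        self.store.commit(data_file, id_file)
        self.seen |= id_file
        count = len(self.pending)
        self.pending = {}
        return count
```

In this sketch, a new `ShardWriter` created over the same store after a restart rejects any event ID that was already committed, while uncommitted events are simply lost and must be redelivered upstream; the ID check is what makes that redelivery safe. Committing the IDs in the same step as the data is the design choice that keeps the two from drifting apart.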