Real-Time Aggregation Features for Machine Learning (Part 2)
Tecton's solution for real-time aggregation features for machine learning uses a tiled time-window approach. The aggregation is broken down into compacted tiles, each storing a partial aggregation over a smaller time interval, plus a set of projected raw events at the head and tail of the aggregation time window. Configuration consists of selecting and projecting raw events using SQL, defining the aggregation using a simple DSL, and ingesting the stream into the online store. Batch ingestion is used for backfilling and forward-filling the offline store. Compacted data is produced by periodic batch Spark jobs that read from the streaming source's offline mirror, which reduces the worst-case number of rows that have to be fetched from the store. The benefits include ultra-fresh features, compute- and memory-efficient processing, fast feature retrieval, and scalable storage requirements. Implementations of this approach have been in use at Airbnb and Uber for several years. The approach can be generalized beyond time-window aggregations and extended to completely user-defined transformation steps.
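The tiled approach summarized above can be illustrated with a minimal Python sketch. It is not Tecton's API or implementation: the names `compact` and `windowed_sum`, the 60-second tile size, integer timestamps, the half-open window `[now - window, now)`, and the choice of a sum aggregation are all assumptions made for illustration. Full tiles inside the window are answered from precomputed partial sums, and only the head and tail of the window are answered from raw events.

```python
from collections import defaultdict

TILE = 60  # hypothetical tile length in seconds


def compact(events, tile=TILE):
    """Pre-aggregate raw (timestamp, value) events into per-tile partial sums.

    Stands in for the periodic batch compaction job; keys are tile start times.
    """
    tiles = defaultdict(float)
    for ts, value in events:
        tiles[ts - ts % tile] += value
    return dict(tiles)


def windowed_sum(events, tiles, now, window, tile=TILE):
    """Sum of values with timestamps in [now - window, now).

    Full tiles inside the window come from `tiles`; the partial head and
    tail of the window are computed from the raw events.
    """
    start = now - window
    first_full = -(-start // tile) * tile  # first tile boundary at or after start
    last_full = (now // tile) * tile       # last tile boundary at or before now

    if first_full >= last_full:
        # The window contains no complete tile, so use raw events only.
        return sum(v for ts, v in events if start <= ts < now)

    head = sum(v for ts, v in events if start <= ts < first_full)
    body = sum(tiles.get(t, 0.0) for t in range(first_full, last_full, tile))
    tail = sum(v for ts, v in events if last_full <= ts < now)
    return head + body + tail


events = [(5, 1.0), (65, 2.0), (70, 3.0), (130, 4.0), (185, 5.0), (200, 6.0)]
tiles = compact(events)
print(windowed_sum(events, tiles, now=190, window=150))
```

In this sketch the raw events are scanned from an in-memory list for simplicity. In a real store only the events in the head and tail ranges would be fetched, so the number of rows read depends on the tile count and the two partial edges rather than on the total number of events in the window.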
Company
Tecton
Date published
June 2, 2021
Author(s)
Kevin Stumpf
Word count
2025
Language
English
Hacker News points
None found.