Real-Time Aggregation Features for Machine Learning (Part 2)

What's this blog post about?

Tecton's solution for real-time aggregation features for machine learning uses a tiled time window approach: the aggregation is broken down into compacted tiles, each storing a partial aggregation over a smaller tile interval, plus a set of projected raw events at the head and tail of the aggregation time window. Configuration involves selecting and projecting raw events using SQL and defining the aggregation with a simple DSL. Projected events are streamed into the online store, and batch ingestion is used for backfilling and forward-filling the offline store. Compacted data is produced by periodic batch Spark jobs that read from the streaming source's offline mirror, which reduces the worst-case number of rows that have to be fetched from the store. The solution provides benefits such as ultra-fresh features, compute- and memory-efficient processing, fast feature retrieval, and storage requirements that scale. Airbnb and Uber have used implementations of this approach for several years. The approach can be generalized beyond time-window aggregations and extended to completely user-defined transformation steps.
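
The tiled approach summarized above can be illustrated with a minimal in-memory sketch. It is not Tecton's API or implementation: the names `build_tiles` and `windowed_sum`, the 300-second tile interval, and the plain Python lists and dicts standing in for the online store and the Spark compaction job are all assumptions made for illustration. The sketch answers a windowed sum by combining raw events at the tail (oldest edge) of the window, precomputed tiles in the middle, and raw events at the head (newest edge) that have not been compacted yet.

```python
from collections import defaultdict

TILE_SECONDS = 300  # hypothetical tile interval (5 minutes)


def build_tiles(events, tile_seconds, compacted_until):
    """Stand-in for the periodic compaction job (hypothetical helper).

    Rolls every raw (timestamp, value) event older than `compacted_until`
    into a per-tile (count, sum) pair keyed by the tile's start time.
    `compacted_until` is assumed to be a multiple of `tile_seconds`.
    """
    acc = defaultdict(lambda: [0, 0.0])
    for ts, value in events:
        if ts < compacted_until:
            start = ts - ts % tile_seconds
            acc[start][0] += 1
            acc[start][1] += value
    return {start: (count, total) for start, (count, total) in acc.items()}


def windowed_sum(events, tiles, tile_seconds, compacted_until, now, window):
    """Sum of values with timestamps in [now - window, now)."""
    window_start = now - window
    # First tile boundary at or after the window start (integer ceiling).
    first_full = -(-window_start // tile_seconds) * tile_seconds
    # End of the last tile that is both compacted and fully inside the window.
    last_full_end = min(compacted_until, now - now % tile_seconds)

    if first_full >= last_full_end:
        # No complete tile fits in the window: fall back to raw events only.
        return sum(v for ts, v in events if window_start <= ts < now)

    # Tail: raw events in the partial tile at the oldest edge of the window.
    tail = sum(v for ts, v in events if window_start <= ts < first_full)
    # Body: one precomputed row per tile instead of one row per event.
    body = sum(
        tiles.get(start, (0, 0.0))[1]
        for start in range(first_full, last_full_end, tile_seconds)
    )
    # Head: fresh raw events that have not been compacted yet.
    head = sum(v for ts, v in events if last_full_end <= ts < now)
    return tail + body + head


events = [(10, 1.0), (310, 2.0), (620, 3.0), (905, 4.0), (1250, 5.0)]
tiles = build_tiles(events, TILE_SECONDS, compacted_until=900)
result = windowed_sum(events, tiles, TILE_SECONDS, 900, now=1300, window=1295)
```

In this sketch the number of rows read for the middle of the window depends on the number of tiles rather than the number of events, which is the reason compaction bounds the worst-case fetch size. Raw events are only scanned at the two edges of the window, so the result stays exact and fresh.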

Company
Tecton

Date published
June 2, 2021

Author(s)
Kevin Stumpf

Word count
2025

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.