Real-Time Aggregation Features for Machine Learning (Part 1)

What's this blog post about?

"Real-Time Aggregation Features for Machine Learning (Part 1)" discusses the technical challenges of serving rolling time window aggregations to real-time ML applications at high scale, with low latency and high feature accuracy. The main challenges are memory constraints, backfilling historical data, maintaining high feature freshness, and generating training datasets. A naive implementation that queries a transactional database is not sufficient for high-scale applications, and precomputing aggregations in real time as new raw data arrives poses technical challenges of its own. Companies often use stream processors such as Apache Spark or Flink to run streaming time window aggregations, but this approach is limited by memory constraints and by the need to backfill historical data. Maintaining high feature freshness is crucial, but it is hard to achieve because of the limitations of sliding time windows and the need for a separate compute path to train models on offline feature values.
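To make the idea of a rolling time window aggregation concrete, here is a minimal Python sketch of the naive, compute-at-request-time approach mentioned above. It is an illustration only, not the implementation described in the post: the function name `rolling_sum`, the sample events, and the in-memory sorted list (standing in for a transactional database table) are all assumptions made for this example.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def rolling_sum(events, now, window):
    """Sum the amounts of events whose timestamp lies in (now - window, now].

    `events` is a list of (timestamp, amount) tuples sorted by timestamp.
    This hypothetical in-memory list stands in for a database table.
    """
    timestamps = [ts for ts, _ in events]
    lo = bisect_right(timestamps, now - window)  # first event strictly after the window start
    hi = bisect_right(timestamps, now)           # one past the last event at or before `now`
    # Every raw event inside the window is read at request time.
    return sum(amount for _, amount in events[lo:hi])


# Hypothetical sample data: (event time, transaction amount)
events = [
    (datetime(2021, 6, 1, 12, 0), 20.0),
    (datetime(2021, 6, 1, 12, 20), 5.0),
    (datetime(2021, 6, 1, 12, 50), 7.5),
    (datetime(2021, 6, 1, 13, 5), 2.5),
]
now = datetime(2021, 6, 1, 13, 10)
total_last_hour = rolling_sum(events, now, timedelta(hours=1))
```

Each request reads every raw event that falls inside the window, so the cost of a single feature lookup grows with the window length and the event rate. That is the property that makes this approach hard to scale.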

Company
Tecton

Date published
June 2, 2021

Author(s)
Kevin Stumpf

Word count
1257

Language
English

Hacker News points
None found.