
Building Efficient Data Pipelines With Incremental Updates

What's this blog post about?

The article explains why data pipelines should be updated incrementally rather than through repeated full syncs. Full syncs can take a long time and consume excessive network bandwidth, which makes them inefficient for routine updates. Incremental updates identify what has changed since a previous state of the source, using methods such as changelogs or last-modified timestamps. A changelog records the full history of changes (new, updated, and deleted records) as a list of updates, while a last-modified timestamp indicates when a record was last updated. Incremental updates bring their own challenges: scale, the granularity of timestamps, late-arriving data, and sources that lack changelogs or timestamps. The article's suggested remedies include using more performant code and more efficient algorithms to minimize sync times, using greater-than-or-equal logic when comparing timestamps, and identifying proxies for timestamps when necessary.
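As a rough illustration of the timestamp approach, the sketch below shows one way an incremental sync with greater-than-or-equal logic could look. It is not taken from the article or from Fivetran's code: the function name `incremental_sync`, the `id` and `updated_at` fields, and the in-memory dictionary standing in for a destination table are all assumptions made for the example.

```python
from datetime import datetime


def incremental_sync(source_rows, destination, cursor):
    """Upsert rows modified at or after `cursor` and return the new cursor.

    source_rows: iterable of dicts with hypothetical "id" and "updated_at" keys
    destination: dict keyed by id, standing in for a destination table
    cursor: newest "updated_at" seen by the previous sync, or None for a first sync
    """
    new_cursor = cursor
    for row in source_rows:
        # Use >= rather than >: with coarse timestamps, several rows can share
        # the cursor value, and some may have been written after the previous
        # sync had already read that instant.
        if cursor is None or row["updated_at"] >= cursor:
            # Upserting by id keeps re-reading the boundary rows harmless.
            destination[row["id"]] = row
            if new_cursor is None or row["updated_at"] > new_cursor:
                new_cursor = row["updated_at"]
    return new_cursor


# First sync: one row exists.
t0 = datetime(2021, 2, 25, 12, 0, 0)
rows = [{"id": 1, "updated_at": t0, "value": "a"}]
dest = {}
cursor = incremental_sync(rows, dest, None)

# A second row lands with the same second-granularity timestamp.
rows.append({"id": 2, "updated_at": t0, "value": "b"})
cursor = incremental_sync(rows, dest, cursor)
```

With a strict greater-than comparison, the second row in this example would be skipped, because its timestamp equals the saved cursor. The cost of the inclusive comparison is that rows at the boundary are read again on the next sync, which is why the sketch writes to the destination as an idempotent upsert.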

Company
Fivetran

Date published
Feb. 25, 2021

Author(s)
Meel Velliste

Word count
1197

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.