/plushcap/analysis/fivetran/slowly-changing-dimensions-in-data-science

Slowly Changing Dimensions in Data Science

What's this blog post about?

In this blog post, Michael Kaminsky discusses a common pitfall in data science when using cloud data warehouses for machine learning tasks. The issue arises with untracked slowly changing dimensions, which can lead to poor real-world predictive accuracy of models. An example is provided where the goal is to predict customer churn in a subscription business. Two main pitfalls are identified: incorrect setup and the problem of slowly changing dimensions. To address these issues, Kaminsky suggests using a log format for data storage that tracks all changes made to different values in the database. Fivetran's "history mode" feature is mentioned as a tool to automatically record and capture changes to slowly changing dimensions, allowing for better model generalization from training to production.

Company
Fivetran

Date published
Feb. 26, 2021

Author(s)
Michael Kaminsky

Word count
1568

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.