In-Warehouse Machine Learning and the Modern Data Stack

What's this blog post about?

The article discusses the benefits of using in-warehouse machine learning services to build a modern data science stack. It argues that these services remove silos and duplicated work between data analytics and data science teams by bringing models closer to the data they are trained on and use for predictions. Storing models in the data warehouse lets their predictions be obtained via SQL queries, which supports a shift from model-centric to data-centric AI development. The article also notes that these services can help avoid training-serving skew and make it straightforward to compose the different steps of the machine learning process into a data pipeline. It provides an overview of BigQuery ML and Redshift ML, the in-warehouse machine learning services offered by Google Cloud Platform and AWS, respectively. It adds that Snowflake can integrate with machine learning tools such as SageMaker and Databricks to create a modern data science stack. The article concludes by emphasizing the importance of a centralized location for all data-related tasks in organizations that want to perform both data analytics and data science.
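As a rough illustration of what "predictions via SQL" and composing steps into a pipeline can look like, the sketch below builds two BigQuery ML statements as plain strings: one that trains a model inside the warehouse and one that scores new rows with `ML.PREDICT`. It is a minimal sketch, not code from the article: the dataset, table, model, and column names (`demo.churn_model`, `demo.customers_train`, `demo.customers_new`, `churned`) are hypothetical, the choice of logistic regression is an assumption, and the statements are only generated here, not executed against a warehouse.

```python
def create_model_sql(model: str, source_table: str, label: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement.

    All identifiers passed in are hypothetical placeholders; the
    logistic regression model type is an assumption for illustration.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', "
        f"input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def predict_sql(model: str, input_table: str) -> str:
    """Build a query that scores rows with a model stored in the warehouse."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`, "
        f"(SELECT * FROM `{input_table}`))"
    )


# Training and prediction are both SQL statements, so an orchestrator
# can run them in order as steps of one pipeline.
pipeline = [
    create_model_sql("demo.churn_model", "demo.customers_train", "churned"),
    predict_sql("demo.churn_model", "demo.customers_new"),
]

for step_number, statement in enumerate(pipeline, start=1):
    print(f"-- step {step_number}")
    print(statement)
```

Because both steps read from warehouse tables, the same transformed data can feed training and prediction, which is one way the training-serving skew mentioned above can be reduced.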

Company
Fivetran

Date published
July 15, 2021

Author(s)
Nick Acosta

Word count
888

Language
English

Hacker News points
None found.