
ML Infrastructure Tools for Data Preparation

What's this blog post about?

The text discusses the importance of Machine Learning (ML) infrastructure platforms for businesses across various industries. It breaks the ML workflow into three stages: data preparation, model building, and production. Data preparation is a crucial stage in which raw data is transformed into the inputs used to train models. It involves sourcing data from different stores, ensuring the data is complete, adding labels, and transforming the data to generate features. Various tools and platforms assist with these tasks, including Elasticsearch, Hive, Qubole, Scale AI, Figure Eight, Labelbox, Amazon SageMaker, Trifacta, Paxata, Alteryx, Spark, Databricks, Domino, Cloudera Data Science Workbench, and others. The text also highlights the challenges of data preparation, such as sourcing data from multiple locations, ensuring completeness, and maintaining clean data. It emphasizes tracking versioned data transformations and using feature stores to reduce duplicative work and compute costs.
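The summary stays high level. As a rough illustration of two of the ideas it names, completeness checks and reusing versioned transformations through a feature store, here is a minimal in-memory sketch. The names `missing_fields`, `FeatureStore`, `get_or_compute`, and `add_order_value` are hypothetical and chosen for this example only; they are not the API of any product mentioned in the post, and real feature stores typically add persistence and serving on top of this caching idea.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def missing_fields(rows: List[Row], required: List[str]) -> List[Tuple[int, str]]:
    """Completeness check: return (row_index, field) pairs where a required field is absent or None."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field in required
        if row.get(field) is None
    ]


class FeatureStore:
    """Hypothetical toy feature store: caches computed features keyed by
    (transformation name, transformation version, fingerprint of the input data)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str, str], List[Row]] = {}
        self.compute_count = 0  # number of times a transformation actually ran

    @staticmethod
    def _fingerprint(rows: List[Row]) -> str:
        # Stable hash of the input rows, so identical inputs map to the same cache key.
        payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_or_compute(
        self, name: str, version: str, fn: Callable[[List[Row]], List[Row]], rows: List[Row]
    ) -> List[Row]:
        key = (name, version, self._fingerprint(rows))
        if key not in self._cache:
            # Only run the transformation on a cache miss; bumping the version forces a recompute.
            self.compute_count += 1
            self._cache[key] = fn(rows)
        return self._cache[key]  # returns the cached object itself (no copy)


def add_order_value(rows: List[Row]) -> List[Row]:
    """Hypothetical feature transformation: derive order_value = price * quantity."""
    return [{**r, "order_value": r["price"] * r["quantity"]} for r in rows]


raw = [{"price": 2.5, "quantity": 4}, {"price": 10.0, "quantity": 1}]
assert missing_fields(raw, ["price", "quantity"]) == []

store = FeatureStore()
features = store.get_or_compute("order_value", "v1", add_order_value, raw)
again = store.get_or_compute("order_value", "v1", add_order_value, raw)  # cache hit, no recompute
print(features[0]["order_value"], store.compute_count)  # → 10.0 1
```

Keying the cache on the transformation version as well as the data is what lets teams share computed features safely: a second consumer asking for `("order_value", "v1")` on the same data reuses the stored result, while a changed transformation published as `"v2"` is recomputed instead of silently returning stale features.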

Company
Arize

Date published
May 14, 2020

Author(s)
Aparna Dhinakaran

Word count
1278

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.