When to Adopt a Data Lake - and When Not to
A data lake is a permanent repository of an organization's data, stored in open formats such as Parquet on blob stores such as S3. There are good reasons to adopt one: reducing vendor lock-in, supporting multiple SQL and non-SQL destinations, and sending the same data to multiple warehouses. Several commonly cited benefits, however, are misconceptions. Separating compute from storage is not an advantage of data lakes, since modern data warehouses do this too, and more efficiently. Storing raw data in a separate location from curated data is also unnecessary, because both can live in the same warehouse under different schemas. Nor is a data lake required for semi-structured data, which modern data warehouses already support. Fivetran offers a fully managed data lake that replicates all data sources to both data lakes and warehouses.
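The point about keeping raw and curated data in one warehouse under different schemas can be illustrated with a minimal sketch. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for a real warehouse; because SQLite has no `CREATE SCHEMA`, attached in-memory databases play the role of schemas. The schema names `raw` and `curated`, the `orders` table, and its rows are hypothetical and do not come from the article.

```python
import sqlite3

# SQLite stands in for the warehouse here (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# SQLite has no CREATE SCHEMA, so attached databases act as the two schemas.
conn.execute("ATTACH DATABASE ':memory:' AS raw")
conn.execute("ATTACH DATABASE ':memory:' AS curated")

# Raw data lands untouched in the "raw" schema (hypothetical table and rows).
conn.execute(
    "CREATE TABLE raw.orders (id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw.orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 800, "refunded"), (3, 4300, "paid")],
)

# The curated table is derived from the raw one inside the same warehouse.
conn.execute(
    """
    CREATE TABLE curated.paid_orders AS
    SELECT id, amount_cents / 100.0 AS amount_usd
    FROM raw.orders
    WHERE status = 'paid'
    """
)

rows = conn.execute(
    "SELECT id, amount_usd FROM curated.paid_orders ORDER BY id"
).fetchall()
print(rows)
```

In a real warehouse the same separation would usually be expressed with actual schemas and access controls rather than attached databases; the sketch only shows that raw and curated tables can coexist and be queried through a single connection.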
Company
Fivetran
Date published
Feb. 27, 2019
Author(s)
George Fraser
Word count
829
Hacker News points
5
Language
English