Company
Date Published
Feb. 27, 2019
Author
George Fraser
Word count
829
Language
English
Hacker News points
5

Summary

A data lake is a permanent repository of an organization's data, stored in open-source file formats such as Parquet on blob stores such as S3. There are good reasons to adopt one: reducing vendor lock-in, supporting multiple SQL and non-SQL destinations, and sending the same data to multiple warehouses. Several commonly cited benefits, however, are misconceptions. Separating compute from storage is not an advantage specific to data lakes; modern data warehouses also do this, and more efficiently. Storing raw data in a separate location from curated data is unnecessary, because both can live in the same warehouse under different schemas. Nor is a data lake required for semi-structured data, since modern data warehouses support such formats. Fivetran offers a fully managed data lake that replicates all data sources to both data lakes and warehouses.
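
The point about keeping raw and curated data in one warehouse under different schemas can be illustrated with a minimal sketch. It is not taken from the original post: an in-memory SQLite database stands in for the warehouse, attached databases stand in for schemas, and the names `raw`, `curated`, `orders`, `paid_orders`, and `build_warehouse` are all hypothetical.

```python
import sqlite3


def build_warehouse():
    """Build a toy 'warehouse' with a raw schema and a curated schema."""
    # Assumption: in-memory SQLite stands in for a real warehouse,
    # and attached databases play the role of schemas.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS raw")
    conn.execute("ATTACH DATABASE ':memory:' AS curated")

    # Raw schema: data loaded as it arrives from the source.
    conn.execute(
        "CREATE TABLE raw.orders (id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw.orders VALUES (?, ?, ?)",
        [(1, 1250, "paid"), (2, 400, "cancelled"), (3, 9900, "paid")],
    )

    # Curated schema: a cleaned table derived from the raw one,
    # living in the same warehouse.
    conn.execute(
        """
        CREATE TABLE curated.paid_orders AS
        SELECT id, amount_cents / 100.0 AS amount
        FROM raw.orders
        WHERE status = 'paid'
        """
    )
    conn.commit()
    return conn
```

Querying `curated.paid_orders` returns only the cleaned rows, while `raw.orders` stays untouched and available in the same connection, so no second storage system is needed to keep the two apart.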