
Databases Demystified Chapter 6 – Distributed Databases Part 1

What's this blog post about?

Distributed and single-node databases differ in architecture, functionality, and use cases. A distributed database stores data across multiple computers, while a single-node database runs on one computer. Examples of distributed databases include Google Spanner, Azure Cosmos DB, Amazon Redshift, Snowflake, and BigQuery; single-node databases include PostgreSQL, MySQL, and SQLite. Distributed databases were developed to meet three needs: storing more data than one machine can hold, speeding up queries by using the computational power of many computers at once, and staying available when hardware or networks fail. Buying a bigger, better single computer works up to a point, but it runs into limits on cost and maximum size, and it cannot provide fault tolerance because one machine remains a single point of failure.

A distributed database is organized as a cluster made up of nodes (individual computers). There are two main paradigms: big compute and high availability. Big compute splits, or shards, the data across nodes so that each node processes its own portion of a query in parallel, which makes queries faster. High-availability databases instead copy the same data onto each node, so the database keeps working if a node fails. In short, distributed databases make it possible to store and process large amounts of data efficiently while remaining resilient to hardware and network failures.
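To make the two paradigms concrete, here is a minimal Python sketch that is not taken from the original post. Nodes are simulated as in-memory lists, and every name (`shard`, `replicate`, `sharded_total`, `replicated_total`) is hypothetical. Hash partitioning by key is assumed here as one common sharding strategy; real systems may partition differently. Threads only stand in for separate machines: in a real cluster each partial result would be computed on its own node.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

# Hypothetical model: each simulated node is a list of (key, value) rows.
Row = Tuple[str, int]


def shard(rows: List[Row], n_nodes: int) -> List[List[Row]]:
    """Big compute (assumed hash sharding): each row lives on exactly one node."""
    nodes: List[List[Row]] = [[] for _ in range(n_nodes)]
    for key, value in rows:
        # crc32 is stable across runs; Python's built-in hash() of str is salted.
        idx = zlib.crc32(key.encode("utf-8")) % n_nodes
        nodes[idx].append((key, value))
    return nodes


def replicate(rows: List[Row], n_nodes: int) -> List[List[Row]]:
    """High availability: every node holds a full copy of the data."""
    return [list(rows) for _ in range(n_nodes)]


def partial_sum(node: List[Row]) -> int:
    # The work a single node does on its local data.
    return sum(value for _, value in node)


def sharded_total(nodes: List[List[Row]]) -> int:
    """Send the query to every shard in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return sum(pool.map(partial_sum, nodes))


def replicated_total(nodes: List[Optional[List[Row]]]) -> int:
    """Any single surviving replica can answer the whole query by itself."""
    for node in nodes:
        if node is not None:  # None marks a failed node in this simulation
            return partial_sum(node)
    raise RuntimeError("all replicas are down")


if __name__ == "__main__":
    rows = [(f"user{i}", i) for i in range(1, 101)]  # values 1..100
    print(sharded_total(shard(rows, 4)))  # 5050

    replicas: List[Optional[List[Row]]] = list(replicate(rows, 3))
    replicas[0] = None  # simulate one node failing
    print(replicated_total(replicas))  # 5050
```

The trade-off shows up directly: with sharding, each node scans only its share of the rows, but losing a shard loses that share of the data; with replication, a failed node costs nothing for correctness, but every node must store the full dataset.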

Company
Fivetran

Date published
Sept. 3, 2020

Author(s)
Michael Kaminsky

Word count
1385

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.