Key metrics for monitoring etcd

What's this blog post about?

Etcd is a distributed key-value data store that provides highly available, durable storage for distributed applications. In Kubernetes, etcd is part of the control plane and stores data about the actual and desired state of the resources in a cluster. The post groups the key metrics to monitor into six categories: resource metrics (process_open_fds, process_max_fds, and process_resident_memory_bytes); disk metrics (etcd_disk_backend_commit_duration_seconds, etcd_disk_wal_fsync_duration_seconds, and etcd_mvcc_db_total_size_in_bytes); network performance metrics (etcd_network_peer_round_trip_time_seconds and grpc_server_handled_total); watch metrics (etcd_debugging_store_watchers and etcd_debugging_mvcc_slow_watcher_total); Raft metrics (etcd_server_leader_changes_seen_total, etcd_server_proposals_failed_total, etcd_server_proposals_committed_total, and etcd_server_proposals_applied_total); and Kubernetes metrics (etcd_request_duration_seconds). Monitoring these metrics helps ensure the health and performance of your etcd cluster and, by extension, your Kubernetes cluster.
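
To make two of the metrics above concrete, here is a minimal Python sketch that derives a file descriptor usage ratio (process_open_fds relative to process_max_fds) and the gap between committed and applied Raft proposals. It is an illustration rather than anything taken from the post: the sample values are made up, the helper names (parse_metrics, fd_usage_ratio, apply_lag) are hypothetical, and the parser assumes unlabeled samples without timestamps in the Prometheus text exposition format.

```python
# Sample metrics in Prometheus text exposition format.
# All values are invented for illustration; they are not from a real cluster.
SAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
process_open_fds 120
process_max_fds 1024
etcd_server_proposals_committed_total 5000
etcd_server_proposals_applied_total 4990
etcd_server_proposals_failed_total 0
"""


def parse_metrics(text):
    """Parse unlabeled samples into a {name: float} dict.

    Assumption: each sample line is "<name> <value>" with no labels
    and no trailing timestamp. Comment lines and labeled samples are skipped.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        if "{" in name:
            continue  # labeled samples are out of scope for this sketch
        metrics[name] = float(value)
    return metrics


def fd_usage_ratio(m):
    """Fraction of the file descriptor limit currently in use."""
    return m["process_open_fds"] / m["process_max_fds"]


def apply_lag(m):
    """Proposals committed by Raft but not yet applied to the store."""
    return (
        m["etcd_server_proposals_committed_total"]
        - m["etcd_server_proposals_applied_total"]
    )


metrics = parse_metrics(SAMPLE)
print(f"fd usage: {fd_usage_ratio(metrics):.1%}")
print(f"apply lag: {apply_lag(metrics):.0f} proposals")
```

A gap between committed and applied proposals that stays large can indicate an overloaded etcd member, and a file descriptor ratio approaching 1.0 means the process is close to its limit. Any alerting thresholds would need to be chosen for the specific cluster.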

Company
Datadog

Date published
Feb. 23, 2024

Author(s)
David Lentz

Word count
3232

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.