
Making Machines Move

What's this blog post about?

Fly.io is a global public cloud with developer-friendly ergonomics that transmogrifies users' Docker images into VMs (Fly Machines) running on Fly.io's own hardware worldwide. The post centers on a design tradeoff in how the platform provides durable storage: volumes live on NVMe drives attached to the worker that hosts the Machine, which ties a Machine with a volume to one physical server and makes that server hard to drain. To migrate these workloads, the team added a new volume operation called "clone", which lets a new Fly Machine boot with the cloned volume attached while the data is copied lazily in the background. The system is built on the Linux device mapper (dm-clone) and iSCSI, with the flyd orchestrator managing operations on Fly Machines. Along the way the team had to deal with encryption key management, trimming unused block space, and private-network IPv6 addresses, and it now uses the result to minimize downtime and data loss when draining workers. The ultimate goal is to achieve fully-automated luxury space migration for users' applications.
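
The lazy "clone" behavior summarized above can be illustrated with a toy model. This is a minimal sketch, not Fly.io's or dm-clone's implementation: the `LazyClone` class, its method names, and the list-of-blocks representation are all hypothetical, and it assumes every write covers a whole block. It only shows the idea that a clone is usable immediately while blocks are copied ("hydrated") in the background.

```python
class LazyClone:
    """Toy model of a lazily hydrated volume clone (illustrative only)."""

    def __init__(self, source_blocks):
        # Stands in for the source volume on the worker being drained.
        self.source = source_blocks
        # Destination volume on the new worker; starts empty.
        self.local = [None] * len(source_blocks)
        # Per-block flag: has this block been copied or overwritten locally?
        self.hydrated = [False] * len(source_blocks)

    def read(self, i):
        # Blocks not yet hydrated are served from the source.
        if self.hydrated[i]:
            return self.local[i]
        return self.source[i]

    def write(self, i, data):
        # Assumption: whole-block writes. The write lands locally and marks
        # the block hydrated, so background copying never overwrites it.
        self.local[i] = data
        self.hydrated[i] = True

    def hydrate_step(self):
        # Copy one not-yet-hydrated block; return False once none remain.
        for i, done in enumerate(self.hydrated):
            if not done:
                self.local[i] = self.source[i]
                self.hydrated[i] = True
                return True
        return False


clone = LazyClone([b"a", b"b", b"c"])
first_read = clone.read(1)   # served from the source, nothing copied yet
clone.write(1, b"X")         # new data lands on the clone
while clone.hydrate_step():  # background copy runs to completion
    pass
```

Once every block is hydrated, the clone no longer depends on the source, which is the point at which the original worker could be taken out of service.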

Company
Fly.io

Date published
July 30, 2024

Author(s)
Thomas Ptacek

Word count
3003

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.