Milvus is an open-source vector database built for similarity search. It stores, processes, and retrieves billions of vectors with minimal latency. As of September 2023, it has earned almost 23,000 stars on GitHub and has tens of thousands of users across various industries. The latest release adds features such as GPU support and MMap storage for higher performance and capacity.
To ease migration from Milvus 1.x, FAISS, and Elasticsearch 7.0 and later to Milvus 2.x, a data migration tool called Milvus Migration has been developed. The tool is written in Go and supports three interaction modes: a command-line interface (CLI) built on the Cobra framework, a RESTful API with a built-in Swagger UI, and use as a Go module inside other tools.
Milvus Migration supports three migration paths: Milvus 1.x to Milvus 2.x, Elasticsearch 7.0 and later to Milvus 2.x, and FAISS to Milvus 2.x. It works with files held locally or in Amazon S3, Object Storage Service (OSS), and Google Cloud Platform (GCP). Its Elasticsearch integration is flexible: it migrates dense_vector fields as well as other field types such as long, integer, short, boolean, keyword, text, and double.
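To make the Elasticsearch field support concrete, here is a minimal Go sketch of how the listed field types might be matched to Milvus 2.x data types. The mapping table and the helper name milvusTypeFor are assumptions made for illustration; they are not taken from the tool's source, and the real tool may map some types differently.

```go
package main

import "fmt"

// esToMilvus is an illustrative mapping from the Elasticsearch field types
// named above to Milvus 2.x data type names. It is an assumption made for
// this sketch, not the table used by Milvus Migration itself.
var esToMilvus = map[string]string{
	"dense_vector": "FloatVector",
	"long":         "Int64",
	"integer":      "Int32",
	"short":        "Int16",
	"boolean":      "Bool",
	"keyword":      "VarChar",
	"text":         "VarChar",
	"double":       "Double",
}

// milvusTypeFor (hypothetical helper) returns the Milvus type name for an
// Elasticsearch field type, or an error if the type is not in the table.
func milvusTypeFor(esType string) (string, error) {
	t, ok := esToMilvus[esType]
	if !ok {
		return "", fmt.Errorf("unsupported Elasticsearch field type %q", esType)
	}
	return t, nil
}
```

Rejecting unknown types up front, rather than silently skipping them, is one way to surface schema problems before any data is moved.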
The migration process starts with a migration.yaml file that describes the data source, the target, and other relevant settings. Users then run the migration job through either the command line or the RESTful API. Once the job completes, users can view the total number of rows migrated successfully and perform other collection-related operations in Attu, an all-in-one vector database administration tool.
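The shape of such a job can be sketched in Go as a dump step that reads rows from the source followed by a load step that writes them to the target and reports how many rows succeeded. Every name below (Row, Source, Target, Migrate, sliceSource, memoryTarget) is hypothetical and exists only for this sketch; none of it is the Milvus Migration Go module's API, and the in-memory types stand in for a real source and a real Milvus 2.x collection.

```go
package main

import "fmt"

// Row is a hypothetical record: field name to value.
type Row map[string]interface{}

// Source and Target are hypothetical interfaces for the two phases of a job.
type Source interface {
	Dump() ([]Row, error) // read all rows from the data source
}

type Target interface {
	Load(rows []Row) (int, error) // write rows, return how many succeeded
}

// Migrate runs the dump phase, then the load phase, and returns the number
// of rows the target reports as loaded.
func Migrate(src Source, dst Target) (int, error) {
	rows, err := src.Dump()
	if err != nil {
		return 0, fmt.Errorf("dump: %w", err)
	}
	n, err := dst.Load(rows)
	if err != nil {
		return n, fmt.Errorf("load: %w", err)
	}
	return n, nil
}

// sliceSource is an in-memory stand-in for a real data source.
type sliceSource []Row

func (s sliceSource) Dump() ([]Row, error) { return s, nil }

// memoryTarget is an in-memory stand-in for a target collection.
type memoryTarget struct{ rows []Row }

func (m *memoryTarget) Load(rows []Row) (int, error) {
	m.rows = append(m.rows, rows...)
	return len(rows), nil
}
```

Calling Migrate with a sliceSource of two rows and an empty memoryTarget returns a count of 2, which corresponds to the "total rows migrated" figure a user would check after a real job.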
Future plans for Milvus Migration include supporting more mainstream data sources such as Redis and MongoDB, adding resumable migration, and simplifying the migration commands by merging the dump and load processes into one.