The Details Behind the Sphere Dataset in Weaviate
This article describes how a large dataset, Sphere, was imported into Weaviate using Apache Spark. The author details the hardware and software setup, including the Google Kubernetes Engine (GKE) nodes and Weaviate's text2vec-huggingface vectorizer module. The article also reports performance metrics from the import, such as batch duration and LSM store size. It closes with planned work to reduce memory usage at scale, including Vamana and HNSW combined with product quantization (HNSW+PQ).
Company
Weaviate
Date published
Dec. 27, 2022
Author(s)
Zain Hasan
Word count
1455
Language
English
Hacker News points
None found.