The Details Behind the Sphere Dataset in Weaviate

What's this blog post about?

This article describes how the large Sphere dataset was imported into Weaviate using Apache Spark. The author details the hardware and software setup for the import, including the Google Kubernetes Engine (GKE) nodes and Weaviate's text2vec-huggingface vectorizer module. The article also reports performance metrics from the import, such as batch duration and LSM store size, and mentions future work to reduce memory usage at scale, including Vamana and HNSW+PQ.

Company
Weaviate

Date published
Dec. 27, 2022

Author(s)
Zain Hasan

Word count
1455

Language
English

Hacker News points
None found.