The Sphere Dataset in Weaviate

What's this blog post about?

Meta has released Sphere, an open-source dataset of 134 million documents split into 906 million 100-word snippets. It is one of the largest knowledge bases that can help solve knowledge-intensive natural language tasks such as question answering and fact-checking. The dataset aims to act as a "universal, uncurated and unstructured source of knowledge." However, its sheer size makes Sphere difficult for the average developer to access and use in its original open-source format. To make the resource more accessible, Weaviate now offers Sphere as JSON or Parquet files that can be imported with Python and Spark.
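
As a rough illustration of why a file-based export is easier to work with, the sketch below streams a large JSON file in fixed-size batches using only the Python standard library. It assumes the JSON export stores one object per line (JSON Lines), which the summary above does not confirm. The function names and the `id` field in the usage note are hypothetical, and nothing here is Weaviate's own import code.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line (assumed JSON Lines layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(records: Iterable[dict], size: int) -> Iterator[list]:
    """Group records into lists of at most `size` items without loading everything."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def iter_file_batches(path: str, size: int = 100) -> Iterator[list]:
    """Stream a file from disk and yield batches of parsed records."""
    with open(path, encoding="utf-8") as handle:
        yield from batched(iter_records(handle), size)
```

A caller could loop over `iter_file_batches("sphere_sample.jsonl", size=100)` (a hypothetical file name) and hand each batch to whatever import routine it uses. Because the file is read lazily, memory use stays bounded by the batch size rather than the size of the dataset.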

Company
Weaviate

Date published
Dec. 6, 2022

Author(s)
Zain Hasan

Word count
1129

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.