Wikipedia and Weaviate

What's this blog post about?

This article outlines how to run semantic search queries at scale using a vector database. A backup of the complete English-language Wikipedia corpus, imported into Weaviate, has been open-sourced and can be reused for similar vector and semantic search solutions in other projects. The dataset contains 11,348,257 articles, 27,377,159 paragraphs, and 125,447,595 graph cross-references. The article gives step-by-step instructions for importing the data into Weaviate, creating a schema for semantic search, and querying the data using GraphQL. It also discusses implementation strategies for bringing semantic search solutions to production, emphasizing scalability and the need for data, ML models, and a vector database.
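
To make the GraphQL querying step concrete, here is a minimal sketch of how a semantic search request could be assembled in Python. It only builds the query string and JSON payload and sends nothing over the network. The class name `Paragraph`, the properties `title` and `content`, the search phrase, and the helper `build_near_text_query` are assumptions chosen for illustration; they are not taken from this summary.

```python
import json


def build_near_text_query(class_name, properties, concepts, limit=5):
    """Build a Weaviate-style GraphQL `Get` query with a `nearText` filter.

    Hypothetical helper: the class and property names passed in must match
    whatever schema was actually created during the import.
    """
    # A JSON array of strings is also a valid GraphQL list literal
    # for simple search phrases like the one used below.
    concept_list = json.dumps(list(concepts))
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concept_list}}}, limit: {int(limit)}) "
        f"{{ {fields} }} "
        "} }"
    )


# Assumed schema: a `Paragraph` class with `title` and `content` properties.
query = build_near_text_query(
    "Paragraph", ["title", "content"], ["Italian cuisine"], limit=3
)

# GraphQL requests are sent as a JSON body with a "query" key.
payload = json.dumps({"query": query})
print(payload)
```

The payload would typically be sent as an HTTP POST to the GraphQL endpoint of a running Weaviate instance; the host, port, and any authentication depend on the deployment.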

Company
Weaviate

Date published
Nov. 25, 2021

Author(s)
Bob van Luijt

Word count
1439

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.