Company
Date Published
Dec. 18, 2023
Author
Sebastian Estevez
Word count
2338
Language
English
Hacker News points
3

Summary

DataStax, working on Astra DB and Apache Cassandra, has released Neighborhood Watch (nw), a configurable, GPU-powered generator of ground truth K-Nearest Neighbors (KNN) datasets, to address limitations in existing KNN datasets. The tool generates ground truth datasets for high-dimensional embedding vectors that are more representative of what people actually use today. It uses GPU acceleration and supports multiple embedding models, both open source and proprietary. Neighborhood Watch can be used to test the quality of Approximate Nearest Neighbors (ANN) search by providing a large, representative ground truth KNN dataset against which ANN results can be compared.
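
To make the idea of a ground truth KNN dataset concrete, here is a minimal sketch of how exact neighbors can be computed by brute force and then used to score an ANN result with recall@k. It is an illustration only, not how Neighborhood Watch itself works: it runs on the CPU in plain Python, assumes cosine similarity as the metric, and uses random vectors in place of real embeddings. The function names `exact_knn` and `recall_at_k` are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def exact_knn(base, queries, k):
    """Brute-force ground truth: for each query, the indices of the k most similar base vectors."""
    base_n = [normalize(v) for v in base]
    results = []
    for q in queries:
        qn = normalize(q)
        # Compare the query against every base vector (no approximation).
        sims = [sum(a * b for a, b in zip(qn, bv)) for bv in base_n]
        ranked = sorted(range(len(base_n)), key=lambda i: sims[i], reverse=True)
        results.append(ranked[:k])
    return results


def recall_at_k(ground_truth, candidates):
    """Fraction of true neighbors that the candidate (e.g. ANN) results recovered."""
    hits = sum(len(set(gt) & set(c)) for gt, c in zip(ground_truth, candidates))
    total = sum(len(gt) for gt in ground_truth)
    return hits / total


# Assumption: random vectors stand in for real embeddings.
random.seed(0)
dim = 16
base = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
truth = exact_knn(base, queries, k=10)
```

In practice, `candidates` would be the index lists returned by the ANN index under test for the same queries. A recall close to 1.0 means the approximate search is finding nearly all of the true neighbors.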