Company
Date Published
Author
Connor Shorten
Word count
2957
Language
English
Hacker News points
None

Summary

The paper "Learning to Retrieve Passages without Supervision" by Ori Ram et al. explores Self-Supervised Learning as a way to train retrieval models without human labeling. The authors introduce the Span-based Unsupervised Dense Retriever (Spider), a recent advance in Self-Supervised representation learning for text retrieval in search. On unseen data distributions, Spider performs comparably to Supervised models, demonstrating strong Zero-Shot Generalization. The authors further show that Spider can be adapted to a particular data distribution with Transfer Learning, using only 128 labeled examples. This research opens up new possibilities for training custom retrieval models without large labeled datasets.