Late Chunking: Balancing Precision and Cost in Long Context Retrieval
Jina AI has introduced a methodology called late chunking to aid in long-context retrieval over large documents. The approach preserves contextual information by inverting the traditional order of embedding and chunking. Naive chunking splits a document into chunks and embeds each one independently, losing cross-chunk context, while ColBERT-style late interaction retains token-level context but requires significant storage. Late chunking instead embeds the entire document in one pass, so every token embedding reflects the full document's context, and only afterwards pools these contextually rich token embeddings into chunk embeddings. This helps mitigate issues associated with feeding very long documents to downstream models, such as expensive LLM calls, increased latency, and a higher chance of hallucination. Late chunking thus offers a cost-effective path for long-context retrieval while preserving much of the contextual information that late interaction provides.
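The core pooling step can be sketched as follows. This is an illustrative example, not Jina AI's implementation: the token embeddings here are random stand-ins for the output of a long-context embedding model, and `late_chunk` and the chunk spans are hypothetical names chosen for clarity.

```python
import numpy as np

def late_chunk(token_embeddings, chunk_spans):
    """Pool contextualized token embeddings into chunk embeddings.

    In late chunking the whole document is encoded first, so each
    token embedding already reflects the full document's context;
    chunking happens afterwards by mean-pooling over token spans.
    """
    return np.stack([
        token_embeddings[start:end].mean(axis=0)
        for start, end in chunk_spans
    ])

# Stand-in for a long-context model's output: 12 tokens with
# 4-dimensional embeddings (real models use hundreds of dimensions).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(12, 4))

# Chunk boundaries as token-index spans, e.g. from a sentence splitter.
chunk_spans = [(0, 5), (5, 9), (9, 12)]

chunk_embeddings = late_chunk(token_embeddings, chunk_spans)
print(chunk_embeddings.shape)  # one fixed-size embedding per chunk
```

Because pooling happens after the full-document forward pass, each chunk embedding carries information from tokens outside its own span, unlike naive chunking where each chunk is encoded in isolation.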
Company
Weaviate
Date published
Sept. 5, 2024
Author(s)
Charles Pierse, Connor Shorten, Akanksha Sharma
Word count
2517
Hacker News points
1
Language
English