Benchmarking RAG on tables
The LangSmith benchmark evaluates three approaches to retrieval augmented generation (RAG) over semi-structured documents that mix unstructured text with structured tables. Approach 1 passes documents containing tables directly into a long-context LLM's context window; Approach 2 uses targeted table extraction with tools such as Unstructured or Docugami; Approach 3 splits documents at a specified token limit, with performance improving as chunk size increases. An ensemble retriever that combines rankings from different retrievers can prioritize table-derived text chunks and improve overall performance, as sketched below.
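As a minimal sketch of the ensemble idea (illustrative, not the benchmark's actual code), the snippet below combines a BM25 keyword retriever with a FAISS vector-store retriever using LangChain's EnsembleRetriever, which fuses the two result rankings via Reciprocal Rank Fusion. The toy documents, weights, and query are assumptions for demonstration only.

```python
# Sketch: fuse a keyword retriever and a vector retriever so that
# table-derived chunks can surface alongside narrative text chunks.
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS

# Toy corpus: one table-derived chunk and one narrative chunk
# (placeholders, not data from the benchmark).
docs = [
    "Q3 revenue table: product A $1.2M, product B $0.8M.",
    "The company expanded into two new markets during Q3.",
]

# Keyword-based retriever (BM25).
bm25 = BM25Retriever.from_texts(docs)
bm25.k = 2

# Embedding-based retriever backed by a FAISS index.
vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())
faiss_retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# EnsembleRetriever merges the two ranked lists with Reciprocal Rank
# Fusion; the weights (illustrative) bias the final ordering.
ensemble = EnsembleRetriever(
    retrievers=[bm25, faiss_retriever],
    weights=[0.5, 0.5],
)

results = ensemble.get_relevant_documents("What was product A's revenue?")
```

Weighting the keyword retriever more heavily is one way to bias the fused ranking toward exact matches on table contents, such as figures and column headers.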
Company: LangChain
Date published: Dec. 13, 2023
Author(s): -
Word count: 1061
Language: English
Hacker News points: None found.