The LangSmith benchmark evaluates three approaches to retrieval augmented generation (RAG) over semi-structured data, i.e. documents that mix unstructured text with structured tables. Approach 1 passes documents, tables included, directly into a long-context LLM's context window. Approach 2 performs targeted table extraction with tools such as Unstructured or Docugami, so that tables can be indexed and retrieved as distinct units. Approach 3 splits documents into chunks at a specified token limit, with performance improving as chunk size increases. Finally, an ensemble retriever combines the rankings produced by the different retrievers, prioritizing table-derived text chunks and improving overall performance.
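The rank-combination step can be illustrated with weighted Reciprocal Rank Fusion (RRF), the scheme LangChain's `EnsembleRetriever` uses to merge ranked lists. The sketch below is a minimal, dependency-free illustration; the document IDs, weights, and rankings are hypothetical, not taken from the benchmark.

```python
def rrf_merge(rankings, weights=None, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking
    using weighted Reciprocal Rank Fusion (k=60 is a common default)."""
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking):
            # Earlier positions (lower ranks) contribute larger scores.
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: a table-focused retriever ranks "table_chunk"
# first, while a plain vector retriever ranks it second. Weighting the
# table retriever more heavily pushes table-derived chunks to the top.
vector_ranking = ["text_chunk_a", "table_chunk", "text_chunk_b"]
table_ranking = ["table_chunk", "text_chunk_b"]
fused = rrf_merge([vector_ranking, table_ranking], weights=[0.4, 0.6])
print(fused[0])  # the table-derived chunk wins the fused ranking
```

In practice the same effect is achieved by assigning a higher weight to the retriever built over extracted table summaries, so that table content is surfaced even when a plain similarity search would rank it lower.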