
Benchmarking RAG on tables

What's this blog post about?

The LangSmith benchmark evaluates three approaches to retrieval augmented generation (RAG) over semi-structured documents that mix unstructured text with structured tables. Approach 1 passes documents containing tables directly into a long-context LLM's context window. Approach 2 performs targeted table extraction using tools such as Unstructured or Docugami. Approach 3 splits documents into chunks at a specified token limit, with performance improving as chunk size increases. Finally, an ensemble retriever combines the rankings produced by different retrievers, prioritizing table-derived text chunks and improving overall performance.
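As a rough illustration of how an ensemble retriever can merge rankings from multiple retrievers, LangChain's `EnsembleRetriever` uses Reciprocal Rank Fusion (RRF). The following is a minimal, self-contained sketch of RRF over plain document IDs; the function name, the example rankings, and the `k` smoothing constant of 60 are illustrative assumptions, not taken from the post.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives a score of 1 / (k + rank) summed across all
    rankings it appears in; higher fused scores rank first. The constant
    k (60 here, a common default) dampens the influence of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings: one from a table-aware retriever, one from a
# plain text-chunk retriever.
table_ranking = ["table_summary_2", "chunk_5", "chunk_1"]
text_ranking = ["table_summary_2", "chunk_9", "chunk_5"]

fused = reciprocal_rank_fusion([table_ranking, text_ranking])
```

A document that appears near the top of both rankings (here `table_summary_2`) accumulates the highest fused score, which is how table-derived chunks can be promoted over chunks surfaced by only one retriever.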

Company
LangChain

Date published
Dec. 13, 2023

Author(s)
-

Word count
1061

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.