Benchmarking Query Analysis in High Cardinality Situations

What's this blog post about?

Large Language Models (LLMs) often struggle with high-cardinality categorical values because they have no built-in knowledge of the valid values for a field. The problem gets harder as the number of possible values grows, straining latency, cost, and context-window limits. The post benchmarks several approaches, including Context Stuffing, Pre-LLM Filtering, and Post-LLM Selection. The most effective was Post-LLM Selection via embedding similarity, which reached 83% accuracy while being faster and cheaper than the alternatives. The post concludes that further benchmarking on higher-cardinality data is needed before the problem can be considered solved for enterprise systems.
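
To make the winning approach concrete, here is a minimal sketch of Post-LLM Selection via embedding similarity, assuming OpenAI embeddings through LangChain; the example name list and the correct_value helper are illustrative assumptions, not code from the post.

    # Minimal sketch of post-LLM selection via embedding similarity.
    # Assumes OpenAI embeddings via langchain-openai; the value list and
    # the correct_value helper are hypothetical, not the post's code.
    import numpy as np
    from langchain_openai import OpenAIEmbeddings

    # The high-cardinality list of valid values for the field.
    valid_names = ["Harrison Chase", "Ankush Gola", "Will Fu-Hinthorn"]

    embeddings = OpenAIEmbeddings()

    # Embed every valid value once, up front, and normalize for cosine similarity.
    valid_vectors = np.array(embeddings.embed_documents(valid_names))
    valid_vectors /= np.linalg.norm(valid_vectors, axis=1, keepdims=True)

    def correct_value(llm_output: str) -> str:
        """Snap the LLM's (possibly invalid) value to the closest valid one."""
        query = np.array(embeddings.embed_query(llm_output))
        query /= np.linalg.norm(query)
        # Cosine similarity against all valid values; pick the best match.
        return valid_names[int(np.argmax(valid_vectors @ query))]

    # e.g. correct_value("Harison Chase") -> "Harrison Chase"

Because the valid-value vectors are computed and normalized once up front, each query costs only a single embedding call plus one matrix multiply, which is consistent with the post's finding that this method is faster and cheaper than approaches that put the full value list through the LLM.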

Company
LangChain

Date published
March 15, 2024

Author(s)
-

Word count
1441

Hacker News points
None found.

Language
English
