Content Deep Dive
Benchmarking Query Analysis in High Cardinality Situations
Blog post from LangChain
Post Details
Company: LangChain
Date Published: -
Author: -
Word Count: 1,441
Language: English
Hacker News Points: -
Source URL: -
Summary
Large Language Models (LLMs) often struggle with high-cardinality categorical fields because they cannot know the full set of valid values a field may take. The problem grows harder as the number of possible values increases, straining speed, cost, and context length. The post benchmarks three approaches to this problem: Context Stuffing, Pre-LLM Filtering, and Post-LLM Selection. The most effective method was Post-LLM Selection via embedding similarity, which reached 83% accuracy while being faster and cheaper than the alternatives. However, further benchmarking on higher-cardinality data is needed before the problem can be considered solved for enterprise systems.
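To make the winning approach concrete, here is a minimal sketch of post-LLM selection via embedding similarity: let the LLM emit a free-form value, then snap it to the nearest entry in the known list of valid values. This is an illustration, not the benchmark's actual code; a toy character-bigram embedding stands in for a real embedding model, and the names (`nearest_valid_value`, the example author list) are hypothetical.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: character-bigram counts stand in for a real
    # embedding model (the post would use a proper embedding API).
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity over sparse count vectors.
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_valid_value(llm_output: str, valid_values: list[str]) -> str:
    # Post-LLM selection: correct the model's guess to the closest
    # allowed categorical value by embedding similarity.
    return max(valid_values, key=lambda v: cosine(embed(llm_output), embed(v)))

# Hypothetical high-cardinality field: a list of valid author names.
authors = ["Harrison Chase", "Jacob Lee", "Nuno Campos"]
print(nearest_valid_value("harison chase", authors))  # → Harrison Chase
```

The key property is that the LLM never needs the full value list in its context; correction happens after generation, which is why this approach stays fast and cheap as cardinality grows.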