Benchmarking Query Analysis in High Cardinality Situations
Large Language Models (LLMs) often struggle with high-cardinality categorical values because they have no way of knowing the set of valid values for a field. The problem gets harder as the number of possible values grows, straining speed, cost, and context-window limits. Several approaches were benchmarked, including Context Stuffing, Pre-LLM Filtering, and Post-LLM Selection. The most effective was Post-LLM Selection via embedding similarity, which reached 83% accuracy while being faster and cheaper than the alternatives. Further benchmarking on higher-cardinality data is still needed before the problem can be considered solved for enterprise systems.
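The summary doesn't include code, but the winning approach can be sketched roughly: let the LLM extract a value freely, then "snap" its output to the nearest valid value by embedding similarity. Below is a minimal sketch under assumed details; the author list, the choice of OpenAIEmbeddings, and the correct_value helper are illustrative assumptions, not the post's exact benchmark setup.

```python
# Sketch of Post-LLM Selection via embedding similarity (assumptions noted above).
import numpy as np
from langchain_openai import OpenAIEmbeddings

# Hypothetical high-cardinality field: the set of valid author names.
valid_authors = ["Harrison Chase", "Ankush Gola", "Lance Martin"]

embeddings = OpenAIEmbeddings()

# Embed all valid values once, up front, and normalize for cosine similarity.
valid_vectors = np.array(embeddings.embed_documents(valid_authors))
valid_vectors /= np.linalg.norm(valid_vectors, axis=1, keepdims=True)

def correct_value(llm_output: str) -> str:
    """Map the LLM's (possibly misspelled) extracted value to the
    closest valid value by cosine similarity."""
    query_vector = np.array(embeddings.embed_query(llm_output))
    query_vector /= np.linalg.norm(query_vector)
    scores = valid_vectors @ query_vector
    return valid_authors[int(np.argmax(scores))]

# e.g. the LLM extracted a misspelled name from a user query:
print(correct_value("Harison Chse"))  # -> "Harrison Chase"
```

Because only the LLM's short output is embedded and compared after generation, the valid values never enter the prompt, which is why this approach scales in cost and latency as cardinality grows.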
Company
LangChain
Date published
March 15, 2024
Author(s)
-
Word count
1441
Hacker News points
None found.
Language
English