Company:
Date Published:
Author: Denys Linkov
Word count: 2787
Language: English
Hacker News points: None

Summary

The text discusses the development and evaluation of hybrid LLM classification systems in the context of conversational AI. Researchers experimented with combining an encoder NLU model with a large language model (LLM) to improve intent classification accuracy, reduce costs, and increase efficiency. The architecture uses a two-tier few-shot learning approach for structure and context, with the top 10 candidate intents retrieved using Voiceflow's NLU as the retriever. The hybrid system outperformed pure LLM methods on larger datasets while maintaining a simple user experience on smaller ones. Cost analysis revealed significant savings in token usage, particularly for larger projects, making the hybrid architecture considerably cheaper than purely LLM-based systems. Latency analysis showed that Gemini models had the lowest latency, followed by GPT and then Claude models. The study highlights the potential of hybrid LLM classification systems to create modular workflows, making conversational AI more accessible to a broader audience.
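The retrieve-then-classify flow described above can be sketched in Python. This is a minimal illustration, not Voiceflow's actual implementation: the token-overlap retriever stands in for the encoder NLU, and `stub_llm` stands in for a real call to GPT, Claude, or Gemini. All function names and the intent data are hypothetical.

```python
# Hypothetical sketch of the hybrid retrieve-then-classify architecture.
# The retriever and LLM here are stand-ins, not Voiceflow's real components.

def nlu_retrieve(utterance, intents, k=10):
    """Stand-in encoder NLU: rank intents by token overlap with the utterance."""
    tokens = set(utterance.lower().split())
    scored = sorted(
        intents,
        key=lambda i: len(tokens & set(i["example"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(utterance, candidates):
    """Few-shot prompt: list candidate intents, each with one example."""
    lines = ["Classify the user message into one of these intents:"]
    for c in candidates:
        lines.append(f'- {c["name"]}: e.g. "{c["example"]}"')
    lines.append(f"Message: {utterance}")
    lines.append("Intent:")
    return "\n".join(lines)

def classify(utterance, intents, llm):
    # Tier 1: cheap NLU retrieval narrows the label space to top-k candidates.
    candidates = nlu_retrieve(utterance, intents, k=10)
    # Tier 2: the LLM picks among only those candidates, cutting token cost.
    return llm(build_prompt(utterance, candidates), candidates)

def stub_llm(prompt, candidates):
    """Stub LLM that echoes the top-ranked candidate; a real system would
    send the prompt to an LLM API and parse the reply."""
    return candidates[0]["name"]

intents = [
    {"name": "book_flight", "example": "book a flight to paris"},
    {"name": "cancel_order", "example": "cancel my recent order"},
    {"name": "check_balance", "example": "what is my account balance"},
]
print(classify("please book me a flight", intents, stub_llm))  # book_flight
```

Because the LLM only sees the top-k candidate intents rather than the full intent catalog, the prompt stays short even as a project grows, which is where the summary's token-cost savings for larger projects come from.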