Anyscale Endpoints has made experimentation with LLMs more accessible, allowing researchers to compare the factual accuracy of different models, including open-source LLMs such as Llama 2. In this comparison, Llama-2-70b was almost as strong as gpt-4 on factuality and considerably better than gpt-3.5-turbo. However, Llama-2-7b and Llama-2-13b suffered from severe ordering bias, and gpt-3.5-turbo also showed significant ordering bias. On cost, Llama 2 proved roughly 30 times cheaper than gpt-4 for summarization while delivering a similar level of performance. The experiment highlights both the need to account for ordering bias when using LLMs to judge summaries and the potential benefits of open-source LLMs like Llama 2.
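One common way to detect the ordering bias mentioned above is a position-swap check: present each pair of summaries to the judge model in both orders and count how often the verdict tracks the slot rather than the content. The sketch below is a minimal, hypothetical illustration, not the evaluation harness used in the experiment; `judge` stands in for whatever LLM call returns "A" or "B", and is stubbed here with a deliberately biased toy judge.

```python
from collections import Counter

def position_swap_eval(judge, pairs):
    """Ask `judge` about each (x, y) pair in both presentation orders.

    A judge free of ordering bias should flip its answer when the order
    flips; counting how often it instead sticks with the same *position*
    gives a simple bias rate between 0.0 (unbiased) and 1.0 (fully biased).
    """
    position_picks = Counter()
    for x, y in pairs:
        first = judge(x, y)    # "A" means the first-listed summary won
        second = judge(y, x)   # same pair, order swapped
        position_picks[(first, second)] += 1
    # ("A", "A") or ("B", "B") means the verdict followed the slot, not the content
    biased = position_picks[("A", "A")] + position_picks[("B", "B")]
    return biased / max(1, sum(position_picks.values()))

def always_first(x, y):
    # Toy judge with maximal ordering bias: always prefers the first summary.
    return "A"

rate = position_swap_eval(always_first, [("s1", "s2"), ("s3", "s4")])
# rate == 1.0: every verdict followed position rather than content
```

Running each comparison in both orders doubles the number of judge calls, but it is the simplest way to make ordering bias visible before trusting a model's pairwise rankings.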