Company
Date Published
Author
Waleed Kadous
Word count
2933
Language
English
Hacker News points
143

Summary

Anyscale Endpoints makes it easier to experiment with LLMs, including open-source models such as Llama 2, and lets researchers compare the factual accuracy of different models. The comparison showed that Llama-2-70b is almost as strong as gpt-4 on factuality and considerably better than gpt-3.5-turbo. Llama-2-7b and Llama-2-13b, however, suffered from severe ordering bias, and gpt-3.5-turbo also showed significant ordering bias. On cost, Llama 2 is about 30 times cheaper than gpt-4 for summarization while performing at a similar level. The experiment shows that ordering bias must be accounted for when using LLMs for summaries, and it points to the potential benefits of open-source LLMs like Llama 2.
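Ordering bias means a model's answer depends on the position in which options are presented rather than on their content. The summary does not describe the exact test protocol. One common way to surface this bias is to show the same pair of summaries in both orders and check whether the model's choice follows the content or the position. The sketch below assumes a hypothetical `judge` callable that stands in for an LLM call and returns "A" or "B". The names `consistency_rate`, `always_a`, and `prefers_shorter` are illustrative only and do not come from the original post.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM call: given (article, option_a, option_b),
# return "A" or "B" for the option judged factually consistent.
Judge = Callable[[str, str, str], str]


def consistency_rate(judge: Judge, cases: List[Tuple[str, str, str]]) -> float:
    """Fraction of cases where the judge picks the same summary in both orders.

    Each case is (article, correct_summary, incorrect_summary). A judge with
    no ordering bias picks the same summary regardless of its position.
    """
    if not cases:
        return 0.0
    consistent = 0
    for article, good, bad in cases:
        first = judge(article, good, bad)   # correct summary shown first
        second = judge(article, bad, good)  # correct summary shown second
        picked_first = good if first == "A" else bad
        picked_second = bad if second == "A" else good
        if picked_first == picked_second:
            consistent += 1
    return consistent / len(cases)


def always_a(article: str, a: str, b: str) -> str:
    # Pure position bias: always picks whichever option comes first.
    return "A"


def prefers_shorter(article: str, a: str, b: str) -> str:
    # Content-based (if naive) rule: its choice does not depend on order.
    return "A" if len(a) <= len(b) else "B"
```

A purely position-driven judge such as `always_a` scores 0.0, because swapping the order always flips its pick. A content-driven rule such as `prefers_shorter` scores 1.0. Consistency alone does not measure factual accuracy, so a real evaluation would also check whether the consistently chosen summary is the correct one.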