Company
Date Published
July 31, 2024
Author
Together AI
Word count
5632
Language
English
Hacker News points
None

Summary

The release of Llama 3.1, an open model that rivals top models, sparked discussion on Twitter about how providers differ in their implementation decisions, optimizations, and quality testing processes. A quick evaluation of Llama-3.1-405B showed significant variation in quality across inference services: some providers ranked high on GSM8K, while others struggled on benchmarks such as AlpacaEval 2.0. These differences can be substantial in practice, since a single percentage point can decide whether an application task succeeds or fails. To address this, Together AI has developed a five-step quality testing approach: reference matching, perplexity, analytic capability testing, generative capability testing, and qualitative testing. Its flagship implementation, Together Turbo, currently uses FP8 quantization and delivers faster performance at lower cost, with near-negligible quality differences from the reference implementation.
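As an illustration of the perplexity step, the sketch below computes perplexity from per-token log-probabilities and compares a candidate implementation against a reference. It uses the standard definition (the exponential of the negative mean token log-probability). The summary does not describe how Together AI computes or thresholds this metric, so the function names `perplexity` and `relative_gap` and the sample log-probabilities are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean(natural-log probability per token))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_gap(reference_ppl: float, candidate_ppl: float) -> float:
    """Fractional increase of the candidate's perplexity over the reference."""
    return (candidate_ppl - reference_ppl) / reference_ppl


# Hypothetical log-probabilities that two implementations assign to the
# same held-out token sequence (illustrative values, not measurements).
reference_logprobs = [-0.25, -1.10, -0.05, -0.60]
candidate_logprobs = [-0.27, -1.15, -0.05, -0.63]

ref_ppl = perplexity(reference_logprobs)
cand_ppl = perplexity(candidate_logprobs)
print(f"reference perplexity: {ref_ppl:.4f}")
print(f"candidate perplexity: {cand_ppl:.4f}")
print(f"relative gap: {relative_gap(ref_ppl, cand_ppl):.2%}")
```

A lower perplexity means the model assigns higher probability to the evaluation text, so a candidate (for example, a quantized deployment) whose perplexity stays close to the reference's is less likely to have degraded. What counts as an acceptable gap is a judgment call that this sketch does not settle.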