The company has announced that three major new models, Llama 3.1 8B, Llama 3.1 70B, and GPT-4o mini, are now available on its OpenPipe platform. These models are extremely high quality, but they also saturate most standard evaluations, which makes them difficult to compare. The saturation occurs because the new models are stronger than previous generations and because they were trained on improved datasets relabeled with Mixture of Agents. As a result, all three models perform similarly well on tasks such as Resume Summarization and Data Extraction. Chatbot Responses is the exception: there the models' results diverge, and GPT-4o mini, unlike the other two, does not saturate the task. The company is developing better benchmarks for comparing the models and invites users to test them on their own tasks.