An Awesome Synthetic Multilingual Prompts Dataset

What's this blog post about?

Gretel has released the "Synthetic Multilingual LLM Prompts" dataset, which contains 1,250 synthetic prompts in seven languages. The dataset is designed for use with conversational LLMs such as ChatGPT and is available on GitHub and Hugging Face. Translation quality was assessed with the LLM-as-a-Judge method to check accuracy, fluency, and consistency across languages. The dataset is released under the Apache 2.0 license and can be used with proper attribution.

Company
Gretel.ai

Date published
July 3, 2024

Author(s)
Maarten Van Segbroeck

Word count
652

Hacker News points
None found.

Language
English
By Matt Makai. 2021-2024.