An Awesome Synthetic Multilingual Prompts Dataset

Company

Gretel.ai

Date Published

July 3, 2024

Author

Maarten Van Segbroeck

Word count

652

Language

English

Hacker News points

None

URL

gretel.ai/blog/awesome-synthetic-multilingual-prompts-dataset

Summary

Gretel has released the "Synthetic Multilingual LLM Prompts" dataset, which contains 1,250 synthetic prompts in seven languages. The dataset is designed for use with conversational LLMs such as ChatGPT and is available on GitHub and Hugging Face. Translation quality was assessed with the LLM-as-a-Judge method, which checked accuracy, fluency, and consistency across languages. The dataset is released under the Apache 2.0 license and can be used with proper attribution.
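
To make the LLM-as-a-Judge idea concrete, here is a minimal Python sketch of how a translation could be scored on the three criteria named in the summary. It is not Gretel's actual evaluation pipeline. The 1-to-5 scale, the prompt wording, the JSON reply format, and the names `build_judge_prompt`, `parse_scores`, `judge_translation`, and `stub_judge` are all assumptions made for illustration, and the stub stands in for a real judge model so the example runs offline.

```python
import json
from statistics import mean
from typing import Callable

# Criteria named in the summary; the 1-5 scale below is an assumption.
CRITERIA = ("accuracy", "fluency", "consistency")


def build_judge_prompt(source: str, translation: str, language: str) -> str:
    """Assemble a rubric prompt asking a judge model for JSON scores."""
    return (
        f"You are evaluating a translation from English into {language}.\n"
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        f"Rate each of {', '.join(CRITERIA)} from 1 to 5. "
        "Reply with a JSON object keyed by criterion."
    )


def parse_scores(reply: str) -> dict[str, int]:
    """Parse the judge's JSON reply and validate every criterion score."""
    data = json.loads(reply)
    scores = {}
    for name in CRITERIA:
        value = int(data[name])
        if not 1 <= value <= 5:
            raise ValueError(f"{name} score out of range: {value}")
        scores[name] = value
    return scores


def judge_translation(
    source: str,
    translation: str,
    language: str,
    call_judge: Callable[[str], str],
) -> dict[str, int]:
    """Score one translation; call_judge wraps whichever LLM acts as judge."""
    prompt = build_judge_prompt(source, translation, language)
    return parse_scores(call_judge(prompt))


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns fixed scores."""
    return json.dumps({"accuracy": 5, "fluency": 4, "consistency": 5})


# Illustrative sentence pair, not taken from the dataset.
scores = judge_translation(
    "Describe your ideal weekend.",
    "Décris ton week-end idéal.",
    "French",
    stub_judge,
)
overall = mean(scores.values())
```

In practice, `stub_judge` would be replaced by a function that sends the prompt to a judge model and returns its text reply. Validating the parsed scores guards against malformed or out-of-range judge output before any averaging.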