Building Datasets to Enable Safer AI Responses
Gretel's Synthetic Safety Dataset is a resource designed to help align large language models (LLMs) toward safe and ethical responses. The dataset contains 8,361 triplets, each consisting of a "prompt", a "response", and a "safe response", spanning significant risk categories including discrimination, harassment, propaganda, religious intolerance, and gender bias, among others. It was created using Gretel Navigator's Data Designer toolkit and is available on HuggingFace. The dataset aims to give the AI community a transparent, modular resource for aligning models toward secure, public-interest-focused interactions. The post also highlights that prompt generation benefits from human expertise in jailbreaking (attempts to bypass model restrictions) and red teaming (simulated attacks to test system security). The dataset can be used for pre-training and fine-tuning guardrails, stress-testing model robustness, supporting rapid iteration and refinement, and benchmarking ethical and safety maturity.
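As a rough illustration of how one triplet might feed a fine-tuning workflow, the sketch below maps a triplet onto a preference-style record in which the safe response is preferred over the original one. The class name `SafetyTriplet`, the function `to_preference_pair`, the output keys `chosen` and `rejected`, and the placeholder strings are all assumptions made for this example; they are not taken from the dataset's actual schema or from Gretel's tooling.

```python
from dataclasses import dataclass


@dataclass
class SafetyTriplet:
    """One record, mirroring the three fields named in the summary (hypothetical names)."""
    prompt: str
    response: str        # original, potentially unsafe response
    safe_response: str   # revised, safer response


def to_preference_pair(triplet: SafetyTriplet) -> dict:
    """Map a triplet to a preference record: the safe response is preferred."""
    return {
        "prompt": triplet.prompt,
        "chosen": triplet.safe_response,
        "rejected": triplet.response,
    }


# Placeholder content only; no real dataset rows are reproduced here.
example = SafetyTriplet(
    prompt="<placeholder prompt>",
    response="<placeholder original response>",
    safe_response="<placeholder safe response>",
)
pair = to_preference_pair(example)
```

Records in this shape are one common input to preference-based fine-tuning; whether that format suits a given training setup depends on the tooling in use.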
Company
Gretel.ai
Date published
Dec. 13, 2024
Author(s)
Lipika Ramaswamy, Maarten Van Segbroeck, Dhruv Nathawani
Word count
1792
Language
English
Hacker News points
None found.