LLM red teaming is a process to test and evaluate Large Language Models (LLMs) for potential vulnerabilities and risks, such as disclosing personal information or generating harmful content. This can be done by simulating adversarial attacks on the LLM through intentional prompting, with techniques like prompt injection, probing, gray box attacks, and jailbreaking. To effectively red team an LLM at scale, a sufficiently large dataset of adversarial prompts is needed, which can be constructed using data evolution techniques. The LLM responses to these prompts can be evaluated using metrics such as toxicity, bias, or exact match, with tools like DeepEval providing a comprehensive framework for evaluating and testing LLMs, including generating synthetic datasets and custom G-eval metrics.