Rebuff: Detecting Prompt Injection Attacks

Company

LangChain

Date Published

May 14, 2023

Author

Word count

983

Language

English

Hacker News points

None

URL

blog.langchain.dev/rebuff

Summary

Rebuff is an open source self-hardening prompt injection detection framework designed to protect AI applications from malicious inputs that can manipulate outputs, expose sensitive data, and allow unauthorized actions. It uses multiple layers of defense including heuristics, LLM-based detection, vectorDB, and canary tokens. The integration process involves setting up Rebuff, installing LangChain, detecting prompt injection with Rebuff, and setting up LangChain to detect prompt leakage by detecting a canary word in the output. Limitations include incomplete defense, being in its alpha stage, potential false positives/negatives, and treating LLM outputs as untrusted.