Rebuff: Detecting Prompt Injection Attacks
Rebuff is an open source self-hardening prompt injection detection framework designed to protect AI applications from malicious inputs that can manipulate outputs, expose sensitive data, and allow unauthorized actions. It uses multiple layers of defense including heuristics, LLM-based detection, vectorDB, and canary tokens. The integration process involves setting up Rebuff, installing LangChain, detecting prompt injection with Rebuff, and setting up LangChain to detect prompt leakage by detecting a canary word in the output. Limitations include incomplete defense, being in its alpha stage, potential false positives/negatives, and treating LLM outputs as untrusted.
Company
LangChain
Date published
May 14, 2023
Author(s)
-
Word count
983
Hacker News points
None found.
Language
English