AI Jailbreak & Roleplay-Persona

Detects overt jailbreak / roleplay-persona framings in AI/LLM input (OWASP LLM01): persona-override framings (DAN, developer mode), restriction-removal roleplay, and "for research/educational purposes" pretexts. Curated from public jailbreak taxonomies; no novel jailbreaks authored.

Type
keyword_list
Confidence
low
Confidence justification
Low by design. The library classifies this as trainable because regex/keyword detection collapses under paraphrase and multi-turn (Crescendo) attacks. Two tiers: a 65 phrase-only seed floor, and a 75 tier when the regex co-occurs with a persona keyword (Evidence_personas). Both remain low-confidence overall: this seed catches only overt persona framings and is explicitly necessary-not-sufficient; pair with a trainable classifier.
Jurisdictions
global
Regulations
OWASP LLM Top 10 2025, NIST AI RMF GenAI Profile
Frameworks
ISO 27001
Data categories
emerging, security
Risk rating
6

Pattern

(?i)\b(?:DAN|do anything now|developer mode|jailbreak|for (?:research|educational) purposes only|pretend you (?:are|have no)|roleplay as|act as an unrestricted)\b

Corroborative evidence keywords

persona, roleplay, restrictions, mode, AI, artificial intelligence, LLM, large language model, Copilot, chatbot, assistant, agent, prompt, system prompt, tool call, completion, model

Proximity: 300 characters

Should match

Should not match

Collections