AI System Prompt / Policy / Tool Schema Disclosure

Detects overt attempts to elicit disclosure of an assistant's system prompt, policies, guidelines, or tool schema (OWASP LLM02/LLM07): "reveal/print/show/repeat your system prompt/instructions" and similar extraction phrasings.

Type
keyword_list
Confidence
low
Confidence justification
Low by design. The library classifies this as trainable because regex/keyword detection collapses under paraphrase and multi-turn attacks. This seed catches only overt disclosure-elicitation phrasings and is explicitly necessary-not-sufficient; pair with a trainable classifier.
Jurisdictions
global
Regulations
OWASP LLM Top 10 2025, NIST AI RMF GenAI Profile
Frameworks
ISO 27001
Data categories
emerging, security
Risk rating
6

Pattern

(?i)\b(?:reveal|print|show|repeat|output)\s+(?:your|the)\s+(?:system\s+prompt|instructions|guidelines|rules)\b|what (?:is|are) your (?:system )?(?:prompt|instructions)

Corroborative evidence keywords

system prompt, instructions, guidelines, tool schema, reveal your instructions, print your system prompt, show me your system prompt, repeat your instructions, what are your instructions, what is your system prompt, list your tools, show your tool schema, reveal your guidelines

Proximity: 300 characters

Should match

Should not match

Collections