AI Prompt Injection & Goal Hijack

Detects direct prompt-injection / goal-hijack attempts in AI/LLM input: imperative phrases that try to override higher-priority (system) instructions, change the model's role or goal, or bypass approval guardrails (OWASP LLM01). Phrase detection is corroborated by AI-context markers.

Type
keyword_list
Confidence
medium
Confidence justification
Medium: overt injection phrases are a strong signal but appear verbatim in security-training material and QA/test content (the named FP traps). Confidence is raised only with AI-context corroboration and template/sample exclusion; paraphrased or multi-turn injection will evade this seed and is covered by the hybrid detection approach.
Jurisdictions
global
Regulations
OWASP LLM Top 10 2025, NIST AI RMF GenAI Profile
Frameworks
ISO 27001
Data categories
emerging, security
Risk rating
8

Pattern

(?i)\b(?:ignore (?:all )?(?:previous|prior|above) instructions|disregard (?:the )?(?:system )?(?:prompt|instructions)|forget (?:everything|your instructions)|you are now|new instructions:|override (?:your )?(?:rules|guidelines))\b

Corroborative evidence keywords

instructions, system prompt, override, role, AI, artificial intelligence, LLM, large language model, Copilot, chatbot, assistant, agent, prompt, tool call, completion, model

Proximity: 300 characters

Should match

Should not match

Collections