AI Threat Classifiers
The Aairii AI-attack classifier library (AI-SIT-001..014) as TestPattern definitions, mapped to OWASP LLM Top 10 / NIST AI RMF GenAI Profile / MITRE ATLAS. Besides the 14 AI-SIT patterns, the collection includes three global credential sensitive information types (SITs): global-openai-key, global-aws-access-key, and global-gcp-api-key. These are the delegated detectors that the SIT-re-scope classifiers (004/005/010/011) rely on, so the 17-member export is intentional, not an error.
- Jurisdictions: global
- Regulations: OWASP LLM Top 10 2025, NIST AI RMF GenAI Profile
- Patterns: 17
Patterns in this collection
Detects agent tool-misuse / high-risk action instructions in AI/agent context: destructive or irreversible action language coupled with a high-impact target or an approval-bypass clause (OWASP LLM06 / Agentic Top 10). The action verb and target must co-occur to suppress IT-runbook false positives.
- Type: keyword_list
- Confidence: low
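The co-occurrence rule described for this pattern can be illustrated with a minimal sketch. The term lists, the 80-character window, and the function name are all hypothetical stand-ins chosen for illustration; the library's actual keywords and proximity logic are not shown in this listing.

```python
import re

# Hypothetical term lists for illustration; not the library's real keywords.
ACTION_VERBS = ("delete", "drop", "wipe", "purge", "disable")
COMPANIONS = (
    "production database", "audit log", "all users", "backups",  # high-impact targets
    "without approval", "without asking", "skip confirmation",    # approval-bypass clauses
)
WINDOW = 80  # assumed proximity window, in characters


def verb_with_companion(text: str) -> bool:
    """Return True only when an action verb has a target or bypass clause nearby."""
    lowered = text.lower()
    for verb in ACTION_VERBS:
        for match in re.finditer(rf"\b{re.escape(verb)}\b", lowered):
            nearby = lowered[max(0, match.start() - WINDOW): match.end() + WINDOW]
            if any(term in nearby for term in COMPANIONS):
                return True
    return False
```

Requiring both halves is what keeps an ordinary runbook line such as "delete temporary files older than 7 days" from matching: the verb is present, but no high-impact target or bypass clause sits near it.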
Detects data-exfiltration carriers in AI/LLM output: markdown images or links and HTML img tags whose URL points off-tenant and carries an encoded data payload in the path or query. This is an EchoLeak-style (cf. CVE-2025-32711) generic markdown/HTML exfiltration-carrier marker - it flags the carrier shape, not the full CVE mechanism (which also involves XPIA evasion and a Teams proxy/CSP bypass).
- Type: regex
- Confidence: high
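As a rough illustration of the carrier shape described above, the sketch below looks for markdown images or links and HTML img tags, then keeps only URLs that are off-tenant and carry a long encoded-looking run in the path or query. The tenant host list, the 40-character threshold, and the function name are assumptions made for this example; the collection's actual regex is not reproduced here.

```python
import re
from urllib.parse import urlparse

# Hypothetical tenant allow-list; a real deployment would supply its own hosts.
TENANT_HOSTS = {"contoso.sharepoint.com"}

URL_CARRIER = re.compile(
    r"!?\[[^\]]*\]\((https?://[^\s)]+)\)"                # markdown image or link
    r"|<img\b[^>]*\bsrc=[\"'](https?://[^\"']+)[\"']",   # HTML img tag
    re.IGNORECASE,
)
# Assumed payload heuristic: a long URL-safe base64 run or a long percent-encoded run.
ENCODED_PAYLOAD = re.compile(r"[A-Za-z0-9_-]{40,}={0,2}|(?:%[0-9A-Fa-f]{2}){10,}")


def find_exfil_carriers(text: str) -> list[str]:
    """Return off-tenant carrier URLs whose path or query looks like encoded data."""
    hits = []
    for match in URL_CARRIER.finditer(text):
        url = match.group(1) or match.group(2)
        parsed = urlparse(url)
        if (parsed.hostname or "").lower() in TENANT_HOSTS:
            continue  # on-tenant URLs are out of scope for this marker
        if ENCODED_PAYLOAD.search(parsed.path + "?" + parsed.query):
            hits.append(url)
    return hits
```

Like the pattern it illustrates, this flags only the carrier shape; it makes no attempt to model the other stages of the EchoLeak chain.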
Detects hidden or encoded content carriers used for indirect prompt injection (OWASP LLM01): zero-width and bidirectional Unicode control characters or homoglyphs that conceal instructions in otherwise benign prose, and abnormally long base64 strings embedded inline in text. These carriers smuggle attacker instructions into documents, emails, and web content that an LLM later ingests.
- Type: regex
- Confidence: low
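Two of the carriers named above, invisible Unicode characters and long inline base64, lend themselves to a short sketch. The character set and the 200-character threshold are assumptions chosen for illustration, and homoglyph detection is deliberately left out because it needs a confusables table rather than a simple character class.

```python
import re

HIDDEN_CHARS = re.compile(
    "[\u200b\u200c\u200d\u2060\ufeff"   # zero-width characters
    "\u202a-\u202e\u2066-\u2069]"        # bidirectional embedding/override/isolate controls
)
# Assumed threshold: treat 200+ contiguous base64 characters as "abnormally long".
LONG_BASE64 = re.compile(r"(?<![A-Za-z0-9+/])[A-Za-z0-9+/]{200,}={0,2}")


def scan_hidden_carriers(text: str) -> dict[str, int]:
    """Count hidden control characters and long base64 runs in the text."""
    return {
        "hidden_chars": len(HIDDEN_CHARS.findall(text)),
        "long_base64": len(LONG_BASE64.findall(text)),
    }
```

Both signals also occur in benign content (byte-order marks, right-to-left text, embedded attachments), which is consistent with the low confidence assigned to this pattern.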
Detects overt jailbreak / roleplay-persona framings in AI/LLM input (OWASP LLM01): persona-override framings (DAN, developer mode), restriction-removal roleplay, and "for research/educational purposes" pretexts. Curated from public jailbreak taxonomies; no novel jailbreaks authored.
- Type: keyword_list
- Confidence: low
Detects risky MCP / plugin / connector descriptors: tool manifests requesting over-broad or unexplained scopes, or encouraging autonomous action that bypasses user approval (OWASP LLM06 / Agentic Top 10). Phrase detection is corroborated by AI-context markers.
- Type: keyword_list
- Confidence: low
Detects directives intended to persist across sessions or to be written into model memory that change the assistant's role, priority or policy (OWASP LLM08 persistent-context poisoning). Phrase detection is corroborated by AI-context markers.
- Type: keyword_list
- Confidence: low
Marks AI-generated output context (responses, generated files, chat exports) so existing PII/secret SITs can be re-scoped to it. Detection of the sensitive data itself is delegated to those SITs via the ai-threat-classifiers collection; this pattern supplies the AI-output context signal.
- Type: keyword_list
- Confidence: low
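The re-scoping idea described above, where a context marker is combined with a delegated detector, might look roughly like the following. The marker list, the simplified email detector standing in for an existing PII SIT, and the function names are all hypothetical; how the collection actually wires context patterns to delegated SITs is not shown in this listing.

```python
import re
from typing import Callable, Iterable

# Hypothetical AI-output context markers, for illustration only.
AI_OUTPUT_MARKERS = ("ai-generated", "generated by copilot", "assistant response")

# Simplified stand-in for an existing PII detector (not a real SIT definition).
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def email_detector(text: str) -> bool:
    return EMAIL.search(text) is not None


def rescoped_match(text: str, detectors: Iterable[Callable[[str], bool]]) -> bool:
    """Match only when the AI-output context AND a delegated detector both fire."""
    lowered = text.lower()
    in_ai_output = any(marker in lowered for marker in AI_OUTPUT_MARKERS)
    return in_ai_output and any(detect(text) for detect in detectors)
```

The context pattern contributes no sensitive-data logic of its own; it only narrows where the delegated detectors are considered relevant.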
Detects direct prompt-injection / goal-hijack attempts in AI/LLM input: imperative phrases that try to override higher-priority (system) instructions, change the model's role or goal, or bypass approval guardrails (OWASP LLM01). Phrase detection is corroborated by AI-context markers.
- Type: keyword_list
- Confidence: medium
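Corroboration by AI-context markers, as mentioned for this pattern, can be sketched as a two-part check: an override phrase alone is not enough, and a marker indicating an AI surface must also be present. The phrases and markers below are hypothetical examples, not the collection's actual lists.

```python
import re

# Hypothetical phrase and marker lists, for illustration only.
INJECTION_PHRASES = (
    "ignore all previous instructions",
    "override your instructions",
    "your new goal is",
)
AI_CONTEXT_MARKERS = re.compile(
    r"\b(?:assistant|system prompt|llm|chatbot|copilot)\b", re.IGNORECASE
)


def corroborated_injection(text: str) -> bool:
    """Require an override phrase plus at least one AI-context marker."""
    lowered = text.lower()
    phrase_hit = any(phrase in lowered for phrase in INJECTION_PHRASES)
    return phrase_hit and AI_CONTEXT_MARKERS.search(text) is not None
```

The marker requirement trades some recall for precision: the same imperative phrase in unrelated prose no longer matches on its own.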
Detects instruction-like content embedded inside reference / retrieval material (RAG sources): directives addressed to a retrieval AI, high-imperative density, or sudden authority claims that attempt to steer downstream model behaviour (OWASP LLM08 retrieval poisoning). Phrase detection is corroborated by AI-context markers.
- Type: keyword_list
- Confidence: low
Detects instructions embedded in repository text (README, AGENTS.md, comments, issues) that target an AI coding agent and direct it to take risky action - skipping review/tests, exfiltrating data, or committing secrets (OWASP LLM01 indirect injection via source control).
- Type: keyword_list
- Confidence: low
Detects secrets and credentials surfaced in AI/LLM context: private-key headers (RSA / EC / OpenSSH / generic) and connection-string passwords (password= / pwd=) pasted into prompts, chat exports, or AI-generated output. Reuses the intent of built-in credential SITs, re-scoped to the AI surface.
- Type: regex
- Confidence: medium
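The two marker families named above, private-key headers and connection-string passwords, can be approximated with a pair of regular expressions. This is a sketch under the assumption that PEM-style header lines and `password=` / `pwd=` assignments are the targets; the collection's real expressions may differ, and the function name is hypothetical.

```python
import re

# PEM-style headers: RSA, EC, OpenSSH, or generic "PRIVATE KEY".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")
# Connection-string password assignments such as "Password=..." or "pwd=...".
CONN_STRING_PASSWORD = re.compile(r"\b(?:password|pwd)\s*=\s*[^;\s]+", re.IGNORECASE)


def find_secret_markers(text: str) -> list[str]:
    """Return labels for the secret markers present in the text."""
    found = []
    if PRIVATE_KEY_HEADER.search(text):
        found.append("private_key_header")
    if CONN_STRING_PASSWORD.search(text):
        found.append("connection_string_password")
    return found
```

Requiring the `=` assignment keeps ordinary mentions of the word "password" from matching.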
Supplies the AI-prompt context signal (typed prompts, Copilot/Copilot Chat/Studio, uploaded files) so existing PII/regulated-data SITs can be re-scoped to AI prompts via the ai-threat-classifiers collection. Known coverage gap (from the library): in the Copilot location, native DLP scans typed prompt text but not the contents of uploaded files.
- Type: keyword_list
- Confidence: low
Marks the context of data being submitted to external / unapproved AI services (shadow AI destination domains) so existing PII/secret SITs can be re-scoped to it. Detection of the sensitive data itself is delegated to those SITs via the ai-threat-classifiers collection; this pattern supplies the external-AI destination context signal.
- Type: keyword_list
- Confidence: low
Detects overt attempts to elicit disclosure of an assistant's system prompt, policies, guidelines, or tool schema (OWASP LLM02/LLM07): "reveal/print/show/repeat your system prompt/instructions" and similar extraction phrasings.
- Type: keyword_list
- Confidence: low
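The quoted extraction phrasings combine a small set of verbs with a small set of objects, which can be sketched as a single expression. The verbs and objects are taken from the description above; the optional "me", the exact word order, and the function name are assumptions for this example rather than the collection's actual keyword list.

```python
import re

# Verbs and objects from the description; the surrounding structure is assumed.
EXTRACTION_REQUEST = re.compile(
    r"\b(?:reveal|print|show|repeat)\s+(?:me\s+)?your\s+"
    r"(?:system\s+prompt|instructions)\b",
    re.IGNORECASE,
)


def asks_for_system_prompt(text: str) -> bool:
    """Return True for overt 'reveal your system prompt'-style requests."""
    return EXTRACTION_REQUEST.search(text) is not None
```

A fixed phrase shape like this only catches overt requests, and paraphrased or indirect extraction attempts would pass through.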
Detects AWS access key patterns.
- Type: regex
- Confidence: high
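For orientation, AWS access key IDs have a widely documented shape: 20 uppercase alphanumeric characters beginning with `AKIA` (long-term keys) or `ASIA` (temporary keys). The sketch below matches that shape; it is an illustration, not the regex shipped in this collection, and the function name is hypothetical.

```python
import re

# 4-character prefix plus 16 uppercase letters or digits, bounded on both sides.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_aws_access_key_ids(text: str) -> list[str]:
    """Return substrings shaped like AWS access key IDs."""
    return AWS_ACCESS_KEY_ID.findall(text)


# AWS's own documentation placeholder, not a live credential.
print(find_aws_access_key_ids("aws_access_key_id=AKIAIOSFODNN7EXAMPLE"))
# → ['AKIAIOSFODNN7EXAMPLE']
```

The fixed prefix and length make this format far less ambiguous than free-text phrases, which fits the high confidence assigned to the pattern.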
Detects GCP API key patterns. This pattern is based on a Microsoft Purview built-in sensitive information type. Users already running Purview may prefer to enable the built-in SIT directly, or use this version as a starting point for customisation.
- Type: regex
- Confidence: low
Detects OpenAI key patterns.
- Type: regex
- Confidence: low