AI Threat Classifiers

This collection packages the Aairii AI-attack classifier library (AI-SIT-001 to AI-SIT-014) as TestPattern definitions, mapped to the OWASP LLM Top 10, the NIST AI RMF GenAI Profile, and MITRE ATLAS. Besides the 14 AI-SIT patterns, it includes three global credential SITs (global-openai-key, global-aws-access-key, and global-gcp-api-key) that serve as the delegated detectors the SIT-re-scope classifiers (004/005/010/011) rely on. The export therefore has 17 members by design.

Jurisdictions
global
Regulations
OWASP LLM Top 10 2025, NIST AI RMF GenAI Profile
Patterns
17

Patterns in this collection

AI Agent Action Misuse

Detects agent tool-misuse / high-risk action instructions in AI/agent context: destructive or irreversible action language coupled with a high-impact target or an approval-bypass clause (OWASP LLM06 / Agentic Top 10). The action verb and target must co-occur to suppress IT-runbook false positives.

Type
keyword_list
Confidence
low
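
The co-occurrence rule described above can be sketched as follows. This is a minimal illustration, not the library's keyword list: the three term tuples and the function names are hypothetical, and "co-occur" is simplified here to "appear in the same text".

```python
import re

# Hypothetical term lists for illustration only; the pattern's real keywords
# are not shown in this catalogue.
ACTION_VERBS = ("delete", "drop", "wipe", "purge", "transfer")
HIGH_IMPACT_TARGETS = ("production database", "all users", "backups", "funds")
APPROVAL_BYPASS = ("without approval", "without asking", "skip confirmation")


def _contains_any(text, terms):
    """True if any term appears in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)


def flags_action_misuse(text):
    """Require a destructive verb AND (a high-impact target OR a bypass clause)."""
    lowered = text.lower()
    has_verb = _contains_any(lowered, ACTION_VERBS)
    has_companion = (
        _contains_any(lowered, HIGH_IMPACT_TARGETS)
        or _contains_any(lowered, APPROVAL_BYPASS)
    )
    return has_verb and has_companion
```

Requiring both halves is what keeps an ordinary runbook line such as "delete the temp file" from matching on the verb alone.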

AI Data-Exfiltration Marker

Detects data-exfiltration carriers in AI/LLM output: markdown images or links and HTML img tags whose URL points off-tenant and carries an encoded data payload in the path or query. This is a generic markdown/HTML exfiltration-carrier marker in the style of EchoLeak (cf. CVE-2025-32711): it flags the carrier shape, not the full CVE mechanism (which also involves XPIA evasion and a Teams proxy/CSP bypass).

Type
regex
Confidence
high
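
A rough sketch of the carrier shape, under stated assumptions: the tenant allow-list, the 32-character payload threshold, and all names below are hypothetical, and the collection's actual regex is not reproduced here.

```python
import re
from urllib.parse import urlsplit

# Hypothetical tenant allow-list; a real deployment would supply its own hosts.
TENANT_HOSTS = {"contoso.sharepoint.com"}

# Markdown image or link: ![alt](url) or [text](url)
MARKDOWN_URL = re.compile(r"!?\[[^\]]*\]\((https?://[^\s)]+)")
# HTML img tag with a quoted src attribute
HTML_IMG_URL = re.compile(
    r"<img\b[^>]*?\bsrc\s*=\s*[\"'](https?://[^\"'\s>]+)", re.IGNORECASE
)
# Assumed heuristic for an encoded payload: a long base64/base64url-like run.
ENCODED_PAYLOAD = re.compile(r"[A-Za-z0-9+_\-]{32,}")


def find_exfil_carriers(text, tenant_hosts=TENANT_HOSTS):
    """Return off-tenant carrier URLs whose path or query holds a long encoded run."""
    hits = []
    for pattern in (MARKDOWN_URL, HTML_IMG_URL):
        for match in pattern.finditer(text):
            url = match.group(1)
            parts = urlsplit(url)
            host = (parts.hostname or "").lower()
            if host in tenant_hosts:
                continue  # on-tenant URLs are not treated as carriers
            if ENCODED_PAYLOAD.search(parts.path) or ENCODED_PAYLOAD.search(parts.query):
                hits.append(url)
    return hits
```

Splitting the URL before testing for a payload keeps a long hostname from being mistaken for encoded data.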

AI Indirect Injection - Hidden Content

Detects hidden or encoded content carriers used for indirect prompt injection (OWASP LLM01): zero-width and bidirectional Unicode control characters and homoglyphs that conceal instructions in otherwise benign prose, and abnormally long base64 strings embedded inline in text. These carriers smuggle attacker instructions into documents, emails, and web content that an LLM later ingests.

Type
regex
Confidence
low
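
The two carrier families can be illustrated with a short sketch. It covers only a representative subset of zero-width and bidirectional code points, does not attempt homoglyph detection, and assumes a 120-character base64 threshold; none of these choices is taken from the pattern itself.

```python
import re

# Representative subset (an assumption): zero-width characters U+200B to U+200D,
# U+2060 and U+FEFF, plus bidirectional controls U+202A to U+202E and
# U+2066 to U+2069.
HIDDEN_CHARS = re.compile(
    "[\u200b\u200c\u200d\u2060\ufeff\u202a-\u202e\u2066-\u2069]"
)

# Assumed threshold: 120 or more consecutive base64 characters.
LONG_BASE64 = re.compile(r"(?<![A-Za-z0-9+/])[A-Za-z0-9+/]{120,}={0,2}")


def hidden_content_signals(text):
    """Return the names of the hidden-content signals found in text."""
    signals = []
    if HIDDEN_CHARS.search(text):
        signals.append("invisible_unicode")
    if LONG_BASE64.search(text):
        signals.append("long_base64")
    return signals
```

Both signals are weak on their own (zero-width joiners occur in legitimate emoji and some scripts), which is consistent with the low confidence assigned to this pattern.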

AI Jailbreak & Roleplay-Persona

Detects overt jailbreak / roleplay-persona framings in AI/LLM input (OWASP LLM01): persona-override framings (DAN, developer mode), restriction-removal roleplay, and "for research/educational purposes" pretexts. Curated from public jailbreak taxonomies; no novel jailbreaks authored.

Type
keyword_list
Confidence
low

AI MCP / Connector Risk

Detects risky MCP / plugin / connector descriptors: tool manifests requesting over-broad or unexplained scopes, or encouraging autonomous action that bypasses user approval (OWASP LLM06 / Agentic Top 10). Phrase detection is corroborated by AI-context markers.

Type
keyword_list
Confidence
low

AI Memory / Context Poisoning

Detects directives intended to persist across sessions or to be written into model memory that change the assistant's role, priority or policy (OWASP LLM08 persistent-context poisoning). Phrase detection is corroborated by AI-context markers.

Type
keyword_list
Confidence
low

AI Output Sensitive Disclosure

Marks AI-generated output context (responses, generated files, chat exports) so existing PII/secret SITs can be re-scoped to it. Detection of the sensitive data itself is delegated to those SITs via the ai-threat-classifiers collection; this pattern supplies the AI-output context signal.

Type
keyword_list
Confidence
low
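
The re-scoping idea (a context signal gating a delegated detector) might look like the sketch below. The marker phrases are hypothetical, and the single delegated regex is only an approximation of an AWS access key ID shape standing in for the global-aws-access-key SIT; the real definitions are not shown in this catalogue.

```python
import re

# Hypothetical AI-output context markers; the pattern's real keyword list
# is not reproduced here.
AI_OUTPUT_MARKERS = re.compile(
    r"\b(?:ai-generated|generated by copilot|assistant response|chat export)\b",
    re.IGNORECASE,
)

# Stand-in for one delegated SIT (approximate AWS access key ID shape).
DELEGATED_SITS = {
    "global-aws-access-key": re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"),
}


def rescoped_matches(text):
    """Return delegated SIT names that match, but only in AI-output context."""
    if not AI_OUTPUT_MARKERS.search(text):
        return []  # no context signal, so the delegated SITs are not consulted
    return [name for name, rx in DELEGATED_SITS.items() if rx.search(text)]
```

The context pattern never inspects the sensitive data itself; it only decides whether the delegated detectors apply.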

AI Prompt Injection & Goal Hijack

Detects direct prompt-injection / goal-hijack attempts in AI/LLM input: imperative phrases that try to override higher-priority (system) instructions, change the model's role or goal, or bypass approval guardrails (OWASP LLM01). Phrase detection is corroborated by AI-context markers.

Type
keyword_list
Confidence
medium
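
Corroboration by AI-context markers can be sketched as a proximity check. The phrase list, the marker list, and the 200-character window are all assumptions made for illustration.

```python
import re

# Hypothetical override phrasings.
INJECTION_PHRASES = re.compile(
    r"\b(?:ignore|disregard|override)\s+(?:all\s+)?"
    r"(?:previous|prior|above|system)\s+(?:instructions|rules|prompts?)\b",
    re.IGNORECASE,
)

# Hypothetical AI-context markers.
AI_CONTEXT = re.compile(
    r"\b(?:system prompt|assistant|chatbot|llm|copilot|ai model)\b",
    re.IGNORECASE,
)


def flags_prompt_injection(text, window=200):
    """Flag an override phrase only when an AI-context marker appears nearby."""
    for match in INJECTION_PHRASES.finditer(text):
        start = max(0, match.start() - window)
        nearby = text[start:match.end() + window]
        if AI_CONTEXT.search(nearby):
            return True
    return False
```

Without the context check, an everyday sentence such as "please ignore previous instructions about the parking permit" would match on the phrase alone.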

AI RAG Source Poisoning

Detects instruction-like content embedded inside reference / retrieval material (RAG sources): directives addressed to a retrieval AI, high-imperative density, or sudden authority claims that attempt to steer downstream model behaviour (OWASP LLM08 retrieval poisoning). Phrase detection is corroborated by AI-context markers.

Type
keyword_list
Confidence
low

AI Repo-Borne Instruction Risk

Detects instructions embedded in repository text (README, AGENTS.md, comments, issues) that target an AI coding agent and direct it to take risky action - skipping review/tests, exfiltrating data, or committing secrets (OWASP LLM01 indirect injection via source control).

Type
keyword_list
Confidence
low

AI Secrets & Credentials in AI Context

Detects secrets and credentials surfaced in AI/LLM context: private-key headers (RSA / EC / OpenSSH / generic) and connection-string passwords (password= / pwd=) pasted into prompts, chat exports, or AI-generated output. Reuses the intent of built-in credential SITs, re-scoped to the AI surface.

Type
regex
Confidence
medium
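
A minimal sketch of the two detector families named above. The expressions are illustrative approximations rather than the collection's regex, and the function returns labels only so that matched secrets are not echoed back.

```python
import re

# PEM-style private-key headers: RSA, EC, OpenSSH, or generic.
PRIVATE_KEY_HEADER = re.compile(
    r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
)

# Connection-string password assignments such as "Password=..." or "pwd=...".
CONN_STRING_PASSWORD = re.compile(
    r"\b(?:password|pwd)\s*=\s*[^;\s]+", re.IGNORECASE
)


def detect_ai_context_secrets(text):
    """Return labels for the secret shapes found, without the secret values."""
    labels = []
    if PRIVATE_KEY_HEADER.search(text):
        labels.append("private_key_header")
    if CONN_STRING_PASSWORD.search(text):
        labels.append("connection_string_password")
    return labels
```

Scoping this check to the AI surface (prompts, chat exports, generated output) is assumed to happen upstream of the function.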

AI Sensitive Data in Prompts

Supplies the AI-prompt context signal (typed prompts, Copilot/Copilot Chat/Studio, uploaded files) so existing PII/regulated-data SITs can be re-scoped to AI prompts via the ai-threat-classifiers collection. Known coverage gap (from the library): in the Copilot location, native DLP scans typed prompt text but not the contents of uploaded files.

Type
keyword_list
Confidence
low

AI Shadow / External AI Data Share

Marks the context of data being submitted to external / unapproved AI services (shadow AI destination domains) so existing PII/secret SITs can be re-scoped to it. Detection of the sensitive data itself is delegated to those SITs via the ai-threat-classifiers collection; this pattern supplies the external-AI destination context signal.

Type
keyword_list
Confidence
low

AI System Prompt / Policy / Tool Schema Disclosure

Detects overt attempts to elicit disclosure of an assistant's system prompt, policies, guidelines, or tool schema (OWASP LLM02/LLM07): "reveal/print/show/repeat your system prompt/instructions" and similar extraction phrasings.

Type
keyword_list
Confidence
low
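
The quoted extraction phrasings translate into a small expression like the one below. This is a sketch built from the verbs and objects listed in the description, not the pattern's actual keyword list; the optional "me" and "system" tokens are assumptions.

```python
import re

# Verb + "your" + disclosure target, following the phrasings in the description.
DISCLOSURE_REQUEST = re.compile(
    r"\b(?:reveal|print|show|repeat)\s+(?:me\s+)?your\s+"
    r"(?:(?:system\s+)?(?:prompt|instructions|policies|guidelines)|tool\s+schema)\b",
    re.IGNORECASE,
)


def requests_disclosure(text):
    """True if text contains an overt request to disclose prompt or policy."""
    return bool(DISCLOSURE_REQUEST.search(text))
```

Anchoring on "your" rather than "the" is a deliberate simplification to avoid matching requests such as "show the instructions for assembly".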

AWS Access Key

Detects AWS Access Key patterns.

Type
regex
Confidence
high

GCP API Key

Detects GCP API key patterns. This pattern is based on a Microsoft Purview built-in sensitive information type. Users already running Purview may prefer to enable the built-in SIT directly, or use this version as a starting point for customisation.

Type
regex
Confidence
low

OpenAI Key

Detects OpenAI key patterns.

Type
regex
Confidence
low