AI RAG Source Poisoning

Detects instruction-like content embedded inside reference / retrieval material (RAG sources): directives addressed to a retrieval AI, high-imperative density, or sudden authority claims that attempt to steer downstream model behaviour (OWASP LLM08 retrieval poisoning). Phrase detection is corroborated by AI-context markers.

Type
keyword_list
Confidence
low
Confidence justification
Low by design. Instruction-like phrasing inside reference material overlaps heavily with legitimate runbooks, training manuals and policy docs (the named FP traps). The hybrid detection approach pairs this seed with semantic/imperative-density analysis; the keyword phrase alone is necessary-not-sufficient.
Jurisdictions
global
Regulations
OWASP LLM Top 10 2025, NIST AI RMF GenAI Profile
Frameworks
ISO 27001
Data categories
emerging, security
Risk rating
7

Pattern

(?i)\b(?:if you are an? (?:AI|assistant|language model)|when (?:answering|retrieving)|to the (?:AI|assistant)(?: \w+)? reading this)\b[^.\n]{0,60}\b(?:always|must|ignore|instead|prioritise|recommend)\b

Corroborative evidence keywords

source, document, retrieval, reference, AI, artificial intelligence, LLM, large language model, Copilot, chatbot, assistant, agent, prompt, system prompt, tool call, completion, model

Proximity: 300 characters

Should match

Should not match

Collections