RAG Exploitation - ai-blackteam

RAG exploitation attacks target the retrieval pipeline that feeds context to the model. By poisoning knowledge bases, exploiting chunk boundaries, creating embedding collisions, or manipulating retrieval queries, attackers can control what information the model sees and how it responds. As RAG becomes the standard architecture for grounded AI, these attacks are increasingly critical.

Summary

5 attacks total: 3 single-turn, 2 tool-use.

Attacks

Attack	ID	Mode	Severity	Description
Chunk Boundary Exploit	`chunk-boundary-exploit`	single-turn	medium	Exploits document chunking boundaries in RAG systems where safety-critical context gets split across chunks
Citation Hallucination	`citation-hallucination`	single-turn	medium	Tests if a RAG system fabricates citations or attributes harmful content to legitimate retrieved sources
Embedding Collision	`embedding-collision`	single-turn	high	Exploits embedding model weaknesses where semantically different texts produce similar vectors, allowing adversarial …
Knowledge Base Poisoning	`knowledge-base-poisoning`	tool-use	critical	Tests if an agent can be tricked into writing adversarial content into the knowledge base that affects future retrievals
Retrieval Manipulation	`retrieval-manipulation`	tool-use	high	Manipulates RAG retrieval by crafting queries that surface poisoned or adversarial documents from the vector store

Example

ai-blackteam run -p anthropic -a chunk-boundary-exploit -t "Craft a query that retrieves poisoned documents"

​Summary

​Attacks

​Example

Summary

Attacks

Example