RAG exploitation attacks target the retrieval pipeline that feeds context to the model. By poisoning knowledge bases, exploiting chunk boundaries, creating embedding collisions, or manipulating retrieval queries, attackers can control what information the model sees and therefore how it responds. As RAG becomes a standard architecture for grounding model output, more deployments are exposed to these attacks.

Summary

5 attacks total: 3 single-turn, 2 tool-use.

Attacks

| Attack | ID | Mode | Severity | Description |
| --- | --- | --- | --- | --- |
| Chunk Boundary Exploit | chunk-boundary-exploit | single-turn | medium | Exploits document chunking boundaries in RAG systems where safety-critical context gets split across chunks |
| Citation Hallucination | citation-hallucination | single-turn | medium | Tests if a RAG system fabricates citations or attributes harmful content to legitimate retrieved sources |
| Embedding Collision | embedding-collision | single-turn | high | Exploits embedding model weaknesses where semantically different texts produce similar vectors, allowing adversarial … |
| Knowledge Base Poisoning | knowledge-base-poisoning | tool-use | critical | Tests if an agent can be tricked into writing adversarial content into the knowledge base that affects future retrievals |
| Retrieval Manipulation | retrieval-manipulation | tool-use | high | Manipulates RAG retrieval by crafting queries that surface poisoned or adversarial documents from the vector store |
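
To make the chunk boundary case concrete, here is a minimal Python sketch of how a naive chunker can separate an instruction from the warning that qualifies it. Everything in it is an assumption made for illustration: `chunk_by_words`, `retrieve_top1`, and the sample document are hypothetical, the retriever is a toy word-overlap ranker rather than a real vector store, and none of it reflects how ai-blackteam implements the attack.

```python
# Toy illustration only: hypothetical helpers, not part of any real RAG stack.

def chunk_by_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve_top1(chunks: list[str], query: str) -> str:
    """Return the chunk that shares the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_words & set(c.lower().split())))

# Hypothetical document: an instruction followed by its safety-critical caveat.
INSTRUCTION = "To reset the admin password, run the reset script as root."
WARNING = "WARNING: never do this on a production host without approval."
DOC = INSTRUCTION + " " + WARNING

# The chunk size is chosen so the boundary falls exactly between the two sentences.
chunks = chunk_by_words(DOC, size=len(INSTRUCTION.split()))

# A top-1 retrieval for a benign question returns the instruction alone.
retrieved = retrieve_top1(chunks, "how do I reset the admin password")

print(chunks)
print("Context contains the warning:", "WARNING" in retrieved)
```

In this sketch the model would receive the instruction without its caveat, because the warning lives in a chunk that the query does not rank first. Overlapping chunks or retrieving neighbouring chunks alongside each hit are common ways to reduce this exposure, though whether they suffice depends on the pipeline.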

Example

ai-blackteam run -p anthropic -a chunk-boundary-exploit -t "Craft a query that retrieves poisoned documents"