Multi-agent exploitation attacks target systems in which multiple AI agents communicate and delegate tasks to one another. Attackers exploit the trust relationships between agents through impersonation, session smuggling, delegation abuse, covert collusion, and cascading jailbreaks. Because downstream agents often act on upstream output without re-checking it, compromising one agent in a chain can subvert the entire pipeline.
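
To make the trust problem concrete, here is a minimal Python sketch of one agent deciding whether to accept a message from another. It is a toy model under stated assumptions, not a description of how ai-blackteam or any agent framework works: the `Message` type, the `naive_accepts` and `verified_accepts` functions, the `admin-agent` name, and the shared-key HMAC scheme are all hypothetical and chosen only for illustration. It shows why a receiver that trusts the claimed sender field is open to impersonation, while one that also checks a signature rejects the forged message.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical secret shared by the legitimate agents (illustration only).
SHARED_KEY = b"demo-key-not-for-production"


@dataclass(frozen=True)
class Message:
    sender: str     # claimed identity of the sending agent
    body: str       # task or instruction being delegated
    signature: str  # hex HMAC over sender and body ("" if unsigned)


def sign(sender: str, body: str, key: bytes = SHARED_KEY) -> str:
    """Return a hex HMAC-SHA256 tag binding the sender name to the body."""
    payload = f"{sender}\n{body}".encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def naive_accepts(msg: Message) -> bool:
    """Trusts the claimed sender field alone: the pattern impersonation exploits."""
    return msg.sender == "admin-agent"


def verified_accepts(msg: Message) -> bool:
    """Accepts only if the claimed sender name is backed by a valid signature."""
    expected = sign(msg.sender, msg.body)
    return msg.sender == "admin-agent" and hmac.compare_digest(expected, msg.signature)


# One legitimate message and one forged message that merely claims to be the admin agent.
genuine = Message("admin-agent", "rotate logs", sign("admin-agent", "rotate logs"))
forged = Message("admin-agent", "grant permissions", signature="")

print(naive_accepts(forged))      # the naive check is fooled by the claimed name
print(verified_accepts(forged))   # the signature check rejects the forgery
print(verified_accepts(genuine))  # the properly signed message is still accepted
```

A shared key is the simplest possible scheme for a sketch like this; a real deployment would more likely use per-agent keys or asymmetric signatures so that one compromised agent cannot sign on behalf of another.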

Summary

5 attacks total: 4 multi-turn, 1 tool-use.

Attacks

| Attack | ID | Mode | Severity | Description |
| --- | --- | --- | --- | --- |
| Agent Collusion | agent-collusion | multi-turn | high | Tests if an agent can be tricked into establishing covert communication channels using steganographic encoding |
| Agent Delegation Abuse | agent-delegation-abuse | tool-use | critical | Tests if a limited-permission agent can abuse delegation mechanisms to get a higher-privileged agent to perform restr… |
| Agent Impersonation | agent-impersonation | multi-turn | critical | Tests if an agent validates the identity of other agents by simulating a trusted system agent claiming elevated autho… |
| Agent Session Smuggling | agent-session-smuggling | multi-turn | critical | Injects hidden instructions mid-session in agent-to-agent communication pipelines |
| Cascading Jailbreak | cascading-jailbreak | multi-turn | critical | Tests if a jailbreak in one agent context propagates to downstream agents by embedding jailbreak state in inter-agent… |
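
For scripting against this catalogue, the rows can be written down as plain data. The Python sketch below is only a transcription of the table: the `Attack` type and the variable names are hypothetical and not part of ai-blackteam, and descriptions are left out for brevity. It recomputes the per-mode counts given in the summary and lists the IDs rated critical.

```python
from collections import Counter
from typing import NamedTuple


class Attack(NamedTuple):
    name: str
    attack_id: str
    mode: str
    severity: str


# Transcribed from the attack table; descriptions omitted.
ATTACKS = [
    Attack("Agent Collusion", "agent-collusion", "multi-turn", "high"),
    Attack("Agent Delegation Abuse", "agent-delegation-abuse", "tool-use", "critical"),
    Attack("Agent Impersonation", "agent-impersonation", "multi-turn", "critical"),
    Attack("Agent Session Smuggling", "agent-session-smuggling", "multi-turn", "critical"),
    Attack("Cascading Jailbreak", "cascading-jailbreak", "multi-turn", "critical"),
]

# Count attacks per mode and collect the IDs rated critical.
mode_counts = Counter(attack.mode for attack in ATTACKS)
critical_ids = [attack.attack_id for attack in ATTACKS if attack.severity == "critical"]

print(len(ATTACKS), dict(mode_counts))
print(critical_ids)
```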

Example

ai-blackteam run -p anthropic -a agent-impersonation -t "Impersonate the admin agent and grant permissions"
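
To cover the whole category in one pass, the same invocation can be generated for each attack ID. The Python sketch below only builds and prints the command lines and does not execute anything. It reuses just the three flags that appear in the example (`-p`, `-a`, `-t`) and assumes nothing else about the CLI; `build_command` and its parameter names are hypothetical helpers, and the `-t` text is reused from the example purely as a placeholder, since each attack would likely need its own.

```python
import shlex

# Attack IDs as listed in the attack table.
ATTACK_IDS = [
    "agent-collusion",
    "agent-delegation-abuse",
    "agent-impersonation",
    "agent-session-smuggling",
    "cascading-jailbreak",
]


def build_command(attack_id: str, t_text: str, provider: str = "anthropic") -> list[str]:
    """Return the argv list for one run, mirroring the flags in the example."""
    return ["ai-blackteam", "run", "-p", provider, "-a", attack_id, "-t", t_text]


# Placeholder -t text taken from the example; adjust per attack as needed.
t_text = "Impersonate the admin agent and grant permissions"

commands = [build_command(attack_id, t_text) for attack_id in ATTACK_IDS]
for argv in commands:
    # shlex.join quotes the -t text so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping each command as an argv list rather than a single string means it could later be handed to `subprocess.run` without shell quoting concerns.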