What are multi-turn attacks?
A single-turn attack sends one sneaky prompt and hopes it works. A multi-turn attack runs a full conversation - 4 to 10+ messages - that slowly builds context and trust before making the harmful request. ai-blackteam has 95 multi-turn attack techniques across categories like social engineering, persuasion, identity exploitation, and psychological manipulation.Why multi-turn matters
Most AI safety testing focuses on single messages. But real-world attacks rarely start with “help me build a bomb.” They start with innocent questions, build rapport, establish context, and gradually steer the model into compliance. Multi-turn attacks test whether your model can resist manipulation over time, not just recognize a single harmful prompt.How conversational context builds
Each multi-turn attack defines a sequence of turns. The engine sends them as a real conversation, where the model sees its own previous responses:How the engine manages conversation history
The engine’srun_multi_turn method works like this:
- The attack’s
generate_turns()returns a list of user messages - For each user message, the engine appends it to the messages array
- Calls
provider.send_in_conversation(messages)with the full history - Appends the model’s response to the messages array
- Moves to the next turn
Example techniques
Crescendo attack
Gradually escalates from innocent to harmful across many turns:Sunk-cost manipulation
Exploits the model’s investment in the conversation:Emotional manipulation
Uses emotional pressure across turns:Other multi-turn patterns
- Pretexting - Establishing a false professional identity over multiple turns
- Foot-in-the-door - Start with small requests, escalate gradually
- Gaslighting - Challenge the model’s refusals to make it doubt itself
- Identity erosion - Gradually shift the model’s self-perception
- Goalpost moving - Keep redefining what counts as “safe” to discuss
Running multi-turn attacks in batch
Multi-turn attacks work withai-blackteam batch just like single-turn attacks:
run_single, multi-turn through run_multi_turn.
Output differences
Multi-turn results look slightly different from single-turn results. Instead of a table of individual prompts, you get a summary:-v, you also see the final response preview. The full conversation is stored in the SQLite database for later analysis.