Overview
Crescendo is a multi-turn jailbreak that gradually escalates a conversation toward a harmful goal. The attacker LLM authors each next turn to be slightly more provocative than the previous one, building on the target’s responses. Over the course of N turns (default 10), the conversation drifts the target into compliance. Unlike single-turn attacks, Crescendo exploits the way models become more cooperative once a conversation has built momentum: refusing turn N feels more disruptive than refusing turn 1. Reference: Russinovich, Salem, Eldan (Microsoft 2024), “Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak”.Algorithm
For each turn 1..max_turns:
- Show the attacker the conversation so far and ask it to author the next user turn.
- Append the turn to the running conversation.
- Send the full conversation to the target via
send_in_conversation. If the provider only supports single-turn, fold the transcript into one prompt withROLE: contentmarkers. - Append the target’s response to the conversation.
- Score the response with the judge LLM.
- Short-circuit when a turn crosses
success_threshold.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
target | str | required | Harmful behavior the conversation is steering toward. |
target_provider | BaseProvider | required | Provider for the model under test. |
attacker_provider | BaseProvider | None | target_provider | LLM that authors each next turn. |
judge_provider | BaseProvider | None | attacker_provider | LLM that scores each target response. |
max_turns | int | 10 | Maximum conversational turns to take. |
success_threshold | int | 8 | Judge score >= this ends the conversation early. |
on_iteration | callable | None | None | fn(turn, prompt, response, score). |
Return value
| Key | Type | Description |
|---|---|---|
success | bool | True if any turn scored >= success_threshold. |
iterations | int | Number of turns taken. |
best_score | int | Highest judge score observed. |
best_prompt | str | Attacker turn with the highest score. |
best_response | str | Target response to that turn. |
best_turn | int | Turn index of the best score. |
history | list[dict] | Per-turn record with prompt, response, score. |
Example
Single-turn fallback
Iftarget_provider.send_in_conversation raises NotImplementedError, Crescendo folds the entire transcript into one prompt with explicit role markers:
send_prompt. Conversational drift is preserved at the cost of any provider-side memory features.
Registration
CrescendoGenerator is registered as "crescendo" in generator_registry:
Source
src/ai_blackteam/generators/crescendo.py- Reference paper: Russinovich, Salem, Eldan (2024)