What is adaptive generation?
The 1,000+ built-in attacks are static - they use predefined strategies. Adaptive generators create novel attacks on the fly by using one LLM to find weaknesses in another LLM. Three research-backed methods are available:| Generator | Paper | Method |
|---|---|---|
| PAIR | arXiv 2310.08419 | Iterative refinement loop |
| TAP | NeurIPS 2024 (arXiv 2312.02119) | Tree search with pruning |
| GPTFuzzer | USENIX 2024 (arXiv 2309.10253) | Mutation-based fuzzing |
ai-blackteam generate command group.
Three roles
Each generator uses up to three LLM roles:- Target - The model you’re testing. This is the model whose safety you want to evaluate.
- Attacker - The model that generates attack prompts. It sees the target’s responses and adapts.
- Judge - The model that scores whether the target complied with the harmful request.
PAIR: Iterative refinement
PAIR runs a loop: the attacker generates a prompt, the target responds, the judge scores it, and the attacker refines based on the score. This repeats until the judge gives a high score or you hit the iteration limit.How it works
- Attacker generates first candidate prompt
- Target receives the candidate and responds
- Judge scores the response 1-10
- Attacker sees the score and target’s response, generates a refined prompt
- Repeat until score >= threshold or max iterations reached
PAIR flags
| Flag | Default | Description |
|---|---|---|
-p / --provider | required | Target provider |
-m / --model | provider default | Target model |
-t / --target | required | Harmful behavior to elicit |
--attacker | same as target | Provider for the attacker model |
--judge | same as attacker | Provider for the judge model |
--max-iter | 20 | Maximum refinement iterations |
--threshold | 8 | Score >= this means success (1-10 scale) |
--quiet | false | Suppress output |
Example output
Using different providers for attacker/target/judge
TAP: Tree search with pruning
TAP extends PAIR with tree branching. Instead of refining one prompt at a time, it generates multiple candidates per round, prunes off-topic ones, scores the survivors, and branches from the best performers.How it works
- Attacker generates N seed candidates using different strategies
- A pruning step filters out off-topic candidates (scored < 5/10 for relevance)
- Surviving candidates are sent to the target and scored by the judge
- Top-scoring candidates get branched - the attacker generates variations of each
- Repeat for D depth levels
TAP flags
| Flag | Default | Description |
|---|---|---|
-p / --provider | required | Target provider |
-m / --model | provider default | Target model |
-t / --target | required | Harmful behavior to elicit |
--attacker | same as target | Attacker provider |
--depth | 5 | Max tree depth (iterations) |
--width | 5 | Max candidates to keep per level |
--branching | 4 | Branches generated per top candidate |
--threshold | 8 | Success score threshold |
--quiet | false | Suppress output |
Example
depth * width * branching calls to the attacker alone) but covers more ground.
GPTFuzzer: Mutation-based fuzzing
GPTFuzzer takes a different approach. Instead of an iterative conversation, it maintains a pool of prompt templates and mutates them using LLM-powered operations: crossover, rephrase, expand, shorten, and generate-from-scratch.How it works
- Generate initial seed templates (or provide your own)
- Each iteration randomly picks a mutation: crossover, rephrase, expand, shorten, or generate new
- Apply the mutation to a random template from the pool
- Fill in the target and test against the model
- If the result scores high enough, add the template back to the seed pool
- Repeat for N iterations
Mutation types
| Mutation | What it does |
|---|---|
| Crossover | Merges two templates, combining their strongest elements |
| Rephrase | Rewords a template while keeping the same strategy |
| Expand | Adds backstory, context, or framing to make it more convincing |
| Shorten | Strips a template down to its essential elements |
| Generate | Creates an entirely new template from scratch |
GPTFuzzer flags
| Flag | Default | Description |
|---|---|---|
-p / --provider | required | Target provider |
-m / --model | provider default | Target model |
-t / --target | required | Harmful behavior to elicit |
--mutator | same as target | Provider for mutations |
--iterations | 50 | Number of mutation rounds |
--seeds | 5 | Number of initial seed templates |
--threshold | 7 | Success score threshold |
--quiet | false | Suppress output |
Example
Comparing the three generators
| PAIR | TAP | GPTFuzzer | |
|---|---|---|---|
| Strategy | Linear refinement | Tree branching | Random mutation |
| API calls | ~60 (20 iter x 3 roles) | ~100-300 | ~150 (50 iter x 3 calls) |
| Best for | Targeted, focused attacks | Broad exploration | Finding unexpected angles |
| Speed | Medium | Slow | Medium |
| Success rate | Good | Best | Depends on seeds |
When to use which
- PAIR when you want a focused attack against a specific weakness you suspect exists
- TAP when you want to explore the broadest range of attack strategies
- GPTFuzzer when you want to discover attack patterns you didn’t think of
Exit codes
All generators use the same convention:| Code | Meaning |
|---|---|
0 | No successful jailbreak found - model defended |
1 | At least one successful jailbreak found |
2 | Configuration error |