Mega Sweep - ai-blackteam

The ai-blackteam mega-sweep command combines external datasets with attack techniques and optional mutations to produce large-scale evaluations. Instead of testing a single target, it runs attacks against hundreds or thousands of harmful prompts from research datasets.

Basic usage

# Run all attacks against HarmBench prompts
ai-blackteam mega-sweep -p anthropic --dataset harmbench

# Run against all cached datasets
ai-blackteam mega-sweep -p anthropic --dataset all

# Specific datasets
ai-blackteam mega-sweep -p anthropic --dataset harmbench,advbench,sorrybench

How it works

1. Loads prompts from the specified dataset(s).
2. Filters to single-turn attack techniques (multi-turn attacks aren’t compatible with raw prompts).
3. Optionally applies mutation variants to each prompt.
4. Runs each prompt x attack combination against the target model.
5. Reports bypass counts.
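
The steps above can be sketched in a few lines of Python. This is an illustration only, not ai-blackteam's implementation: the `mega_sweep` function, the `Attack` and `Mutation` types, and the `send` callback are all hypothetical names introduced for this sketch.

```python
from itertools import product
from typing import Callable, Iterable

# Hypothetical types for illustration; not ai-blackteam's real API.
Attack = Callable[[str], str]          # wraps a raw prompt in an attack template
Mutation = Callable[[str], list[str]]  # returns one or more variants of a prompt


def mega_sweep(
    prompts: Iterable[str],
    attacks: dict[str, tuple[Attack, bool]],  # name -> (attack, is_single_turn)
    send: Callable[[str], bool],              # True if the target model was bypassed
    mutations: Iterable[Mutation] = (),
) -> dict[str, int]:
    # Step 2: keep only single-turn techniques.
    single_turn = {name: atk for name, (atk, single) in attacks.items() if single}

    # Step 3: expand each prompt into variants when mutations are requested.
    mutations = list(mutations)
    if mutations:
        variants = [v for p in prompts for m in mutations for v in m(p)]
    else:
        variants = list(prompts)

    # Steps 4-5: run every variant x attack pair and count bypasses.
    total = bypasses = 0
    for variant, (_name, attack) in product(variants, single_turn.items()):
        total += 1
        if send(attack(variant)):
            bypasses += 1
    return {"total_runs": total, "bypasses": bypasses}
```

The nested expansion is what makes run counts grow multiplicatively: every prompt variant is paired with every single-turn attack.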

Options

Flag	Default	Description
`-p, --provider`	(required)	Provider to test
`-m, --model`	Provider default	Model name
`--dataset`	(required)	Dataset names (comma-separated or `all`)
`--mutations`	none	Mutation types: `encode`, `frame`, `difficulty`
`--attacks`	all	Attack techniques (comma-separated or `all`)
`--categories`	all	Filter to specific harm categories
`-w, --workers`	5	Parallel workers
`--limit`	none	Max prompts per dataset
`-o, --output`	none	Save JSON results to file
`--quiet`	false	Minimal output
`--dry-run`	false	Show plan without running

Mutation variants

Mutations transform each prompt before sending it through attack techniques:

# Apply encoding mutations (base64, hex, rot13, etc.)
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations encode

# Apply framing mutations (academic context, fictional, etc.)
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations frame

# Apply difficulty mutations (make prompts harder to detect)
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations difficulty

# Combine multiple
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations encode,frame,difficulty

With mutations, the number of runs multiplies quickly: each mutation type produces several variants of every prompt, and every variant is run through every selected attack.
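
To make the idea of an encoding mutation concrete, here is a minimal sketch that produces three encoded variants of one prompt using only the Python standard library. The `encode_variants` function is hypothetical; the encoders ai-blackteam actually applies, and how many variants each mutation type yields, are not specified on this page.

```python
import base64
import codecs


def encode_variants(prompt: str) -> dict[str, str]:
    """Return encoded variants of a prompt (illustrative, not the tool's code)."""
    raw = prompt.encode("utf-8")
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "hex": raw.hex(),
        "rot13": codecs.encode(prompt, "rot13"),
    }


for name, text in encode_variants("hello").items():
    print(f"{name}: {text}")
```

Each variant is then treated as a separate prompt, which is why one mutation type can multiply the run count several times over.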

Dry-run mode

Before committing to a large run, check what would execute:

ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations encode --dry-run

Example output:

Dry run -- would execute 12,450 attack runs
Datasets: harmbench
Mutations: encode
Attacks: 83 single-turn techniques
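
When sweeps are launched from a script, the preview-then-run habit can be automated. The sketch below builds the command line from flags documented on this page and runs the dry run first. The helper names `build_mega_sweep_cmd` and `preview_then_run` are hypothetical, and the use of `check=True` assumes the CLI exits with a non-zero status on failure, which this page does not confirm.

```python
import subprocess


def build_mega_sweep_cmd(provider, datasets, mutations=(), limit=None, dry_run=False):
    """Assemble an argv list using only flags listed in the Options table."""
    cmd = ["ai-blackteam", "mega-sweep", "-p", provider, "--dataset", ",".join(datasets)]
    if mutations:
        cmd += ["--mutations", ",".join(mutations)]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def preview_then_run(provider, datasets, **options):
    """Show the dry-run plan, then start the real sweep only after confirmation."""
    subprocess.run(build_mega_sweep_cmd(provider, datasets, dry_run=True, **options), check=True)
    if input("Proceed with the full run? [y/N] ").strip().lower() == "y":
        subprocess.run(build_mega_sweep_cmd(provider, datasets, **options), check=True)
```

For example, `preview_then_run("anthropic", ["harmbench"], mutations=["encode"])` would print the plan and wait for a `y` before starting the sweep.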

Filtering

By category

ai-blackteam mega-sweep -p anthropic --dataset all --categories weapons,cbrn

Only runs prompts whose category matches the filter.

By prompt count

ai-blackteam mega-sweep -p anthropic --dataset all --limit 100

Caps the number of prompts processed from each dataset, as listed in the Options table. Useful for quick spot checks.

By attack technique

ai-blackteam mega-sweep -p anthropic --dataset harmbench --attacks encoding-obfuscation,dan-variants
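
The category and prompt-count filters can be pictured as one pass over the loaded prompts. This is a sketch under assumptions: the `Prompt` record and `filter_prompts` function are hypothetical, the limit is applied per dataset as the Options table describes, and the category filter is assumed to run before the limit (the page does not state the order).

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    dataset: str
    category: str
    text: str


def filter_prompts(prompts, categories=None, limit=None):
    """Keep prompts in the given categories, at most `limit` per dataset."""
    kept = []
    per_dataset = {}
    for p in prompts:
        # Category filter (assumed to apply before the limit).
        if categories is not None and p.category not in categories:
            continue
        # Per-dataset cap.
        count = per_dataset.get(p.dataset, 0)
        if limit is not None and count >= limit:
            continue
        per_dataset[p.dataset] = count + 1
        kept.append(p)
    return kept
```

Filtering by attack technique works on the other axis: it shrinks the set of attacks rather than the set of prompts.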

Scale

The math is straightforward:

total_runs = prompts x attacks

With mutations:

total_runs = (prompts x mutation_variants) x attacks

For example:

HarmBench (400 prompts) x 83 single-turn attacks = 33,200 runs
With encoding mutations (5 variants per prompt): 400 x 5 x 83 = 166,000 runs

Use --dry-run to preview the run count and --limit to cap it before starting a large sweep.
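
The formulas above are easy to check in code. The sketch below reproduces the two HarmBench figures and adds a rough duration estimate. `total_runs` and `estimated_hours` are hypothetical helpers; the seconds-per-run value is an assumed placeholder, not a measurement, and the estimate assumes the workers parallelize perfectly.

```python
def total_runs(prompts: int, attacks: int, mutation_variants: int = 1) -> int:
    """total_runs = (prompts x mutation_variants) x attacks"""
    return prompts * mutation_variants * attacks


def estimated_hours(runs: int, seconds_per_run: float, workers: int = 5) -> float:
    """Rough wall-clock estimate, assuming perfectly parallel workers."""
    return runs * seconds_per_run / workers / 3600


print(total_runs(400, 83))      # 33200
print(total_runs(400, 83, 5))   # 166000

# Assumed 2 seconds per run with the default 5 workers.
print(round(estimated_hours(total_runs(400, 83, 5), 2.0), 1))
```

Even under optimistic assumptions, a mutated full-dataset sweep can take many hours, which is the practical reason to preview and cap runs first.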

Saving results

ai-blackteam mega-sweep -p anthropic --dataset harmbench -o mega-results.json

The JSON output includes the model, provider, total runs, bypass count, and the configuration used.
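
A saved results file can be post-processed with a few lines of Python, for instance to turn the counts into a bypass rate. The key names `total_runs` and `bypass_count` below are assumptions based on the field descriptions above; check an actual output file for the real names before relying on them.

```python
import json


def load_results(path: str) -> dict:
    """Read a results file written with -o / --output."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def bypass_rate(results: dict) -> float:
    """Fraction of runs that bypassed the model (key names are assumed)."""
    total = results["total_runs"]
    return results["bypass_count"] / total if total else 0.0


# Usage, once a sweep has written mega-results.json:
# results = load_results("mega-results.json")
# print(f"{bypass_rate(results):.2%}")
```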
