The ai-blackteam mega-sweep command combines external datasets with attack techniques and optional mutations to produce large-scale evaluations. Instead of testing a single target, it runs attacks against hundreds or thousands of harmful prompts from research datasets.

Basic usage

# Run all attacks against HarmBench prompts
ai-blackteam mega-sweep -p anthropic --dataset harmbench

# Run against all cached datasets
ai-blackteam mega-sweep -p anthropic --dataset all

# Specific datasets
ai-blackteam mega-sweep -p anthropic --dataset harmbench,advbench,sorrybench

How it works

  1. Loads prompts from the specified dataset(s)
  2. Filters to single-turn attack techniques (multi-turn attacks aren’t compatible with raw prompts)
  3. Optionally applies mutation variants to each prompt
  4. Runs each prompt x attack combination against the target model
  5. Reports bypass counts
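
The steps above can be sketched in a few lines of Python. This is only an illustration of the prompt x attack cross product, not the tool's implementation: `Attack`, `plan_runs`, and `count_bypasses` are hypothetical names, and the `is_bypass` callback stands in for whatever judgment the tool applies to a model response.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Attack:
    """Hypothetical stand-in for an attack technique."""
    name: str
    single_turn: bool


def plan_runs(
    prompts: Iterable[str],
    attacks: Iterable[Attack],
    mutate: Optional[Callable[[str], list[str]]] = None,
) -> list[tuple[str, str]]:
    """Return the (prompt_variant, attack_name) pairs a sweep would execute."""
    # Step 2: keep only single-turn techniques.
    usable = [a for a in attacks if a.single_turn]
    # Step 3: optionally expand each prompt into its mutation variants.
    variants: list[str] = []
    for prompt in prompts:
        variants.extend(mutate(prompt) if mutate else [prompt])
    # Step 4: every variant is paired with every usable attack.
    return [(v, a.name) for v, a in product(variants, usable)]


def count_bypasses(
    runs: Iterable[tuple[str, str]],
    is_bypass: Callable[[str, str], bool],
) -> int:
    """Step 5: count the runs that the (assumed) judge flags as a bypass."""
    return sum(1 for prompt, attack in runs if is_bypass(prompt, attack))
```

With two prompts and three attacks, one of which is multi-turn, `plan_runs` yields 2 x 2 = 4 runs; a mutation function that returns two variants per prompt doubles that to 8.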

Options

Flag            Default           Description
-p, --provider  (required)        Provider to test
-m, --model     Provider default  Model name
--dataset       (required)        Dataset names (comma-separated or all)
--mutations     none              Mutation types: encode, frame, difficulty
--attacks       all               Attack techniques (comma-separated or all)
--categories    all               Filter to specific harm categories
-w, --workers   5                 Parallel workers
--limit         none              Max prompts per dataset
-o, --output    none              Save JSON results to file
--quiet         false             Minimal output
--dry-run       false             Show plan without running

Mutation variants

Mutations transform each prompt before sending it through attack techniques:
# Apply encoding mutations (base64, hex, rot13, etc.)
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations encode

# Apply framing mutations (academic context, fictional, etc.)
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations frame

# Apply difficulty mutations (make prompts harder to detect)
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations difficulty

# Combine multiple
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations encode,frame,difficulty
Mutations multiply the number of runs: each mutation type produces several variants of every prompt, and every variant is run against every attack.
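
To make "variants per prompt" concrete, here is a minimal sketch of what an encoding mutation could produce, using only the Python standard library. The function name `encode_variants` is hypothetical, and the tool's actual variant set may differ; only base64, hex, and rot13 are named in the comment above, so those three are shown.

```python
import base64
import codecs


def encode_variants(prompt: str) -> dict[str, str]:
    """Return one encoded variant of the prompt per encoding (assumed set)."""
    raw = prompt.encode("utf-8")
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "hex": raw.hex(),
        "rot13": codecs.encode(prompt, "rot13"),
    }


variants = encode_variants("hello")
print(variants["base64"])  # aGVsbG8=
print(variants["hex"])     # 68656c6c6f
print(variants["rot13"])   # uryyb
```

Under this sketch one prompt becomes three variants, so a sweep using it would run three times as many attacks as the unmutated dataset.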

Dry-run mode

Before committing to a large run, check what would execute:
ai-blackteam mega-sweep -p anthropic --dataset harmbench --mutations encode --dry-run
Output:
Dry run -- would execute 12,450 attack runs
Datasets: harmbench
Mutations: encode
Attacks: 83 single-turn techniques
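
In a script, the dry-run summary can be used as a guard before launching a long sweep. The sketch below assumes the first line of the output has exactly the wording shown in the sample above; `parse_dry_run_total` is a hypothetical helper, not part of the tool.

```python
import re


def parse_dry_run_total(output: str) -> int:
    """Extract the planned run count from dry-run output (assumed format)."""
    match = re.search(r"would execute ([\d,]+) attack runs", output)
    if match is None:
        raise ValueError("no run count found in dry-run output")
    # Strip the thousands separators before converting.
    return int(match.group(1).replace(",", ""))


sample = "Dry run -- would execute 12,450 attack runs\nDatasets: harmbench\n"
print(parse_dry_run_total(sample))  # 12450
```

A wrapper could refuse to start the real run when the parsed total exceeds a budget you set.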

Filtering

By category

ai-blackteam mega-sweep -p anthropic --dataset all --categories weapons,cbrn
Only runs prompts whose category matches the filter.

By prompt count

ai-blackteam mega-sweep -p anthropic --dataset all --limit 100
Caps the number of prompts taken from each dataset. Useful for quick spot checks.
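
The two prompt filters can be pictured as follows. This is a sketch under stated assumptions: prompts are modeled as dicts with hypothetical `dataset` and `category` keys, the limit is applied per dataset (as the options table states), and the category filter is assumed to run before the limit. The tool may order these differently.

```python
from collections import defaultdict
from typing import Iterable, Optional


def select_prompts(
    prompts: Iterable[dict],
    categories: Optional[Iterable[str]] = None,
    limit: Optional[int] = None,
) -> list[dict]:
    """Keep prompts in the wanted categories, at most `limit` per dataset."""
    wanted = set(categories) if categories else None
    taken: dict[str, int] = defaultdict(int)
    selected = []
    for prompt in prompts:
        if wanted is not None and prompt["category"] not in wanted:
            continue  # category filter
        if limit is not None and taken[prompt["dataset"]] >= limit:
            continue  # per-dataset cap already reached
        taken[prompt["dataset"]] += 1
        selected.append(prompt)
    return selected
```

Because the cap is counted per dataset in this sketch, `--dataset all --limit 100` would keep up to 100 prompts from each cached dataset rather than 100 overall.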

By attack technique

ai-blackteam mega-sweep -p anthropic --dataset harmbench --attacks encoding-obfuscation,dan-variants

Scale

The math is straightforward:
total_runs = prompts x attacks
With mutations:
total_runs = (prompts x mutation_variants) x attacks
For example:
  • HarmBench (400 prompts) x 83 single-turn attacks = 33,200 runs
  • Add encoding mutations (5 variants per prompt): 33,200 x 5 = 166,000 runs
Use --dry-run to check the run count before starting, and --limit to reduce it.
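
The formulas above are easy to check by hand or in a few lines of Python. `estimate_runs` is a hypothetical helper; `mutation_variants` is the total number of variants per prompt, and how variants from several mutation types combine is not specified here, so pass whatever total applies to your configuration.

```python
def estimate_runs(prompts: int, attacks: int, mutation_variants: int = 1) -> int:
    """total_runs = (prompts x mutation_variants) x attacks."""
    return prompts * mutation_variants * attacks


print(estimate_runs(400, 83))     # 33200
print(estimate_runs(400, 83, 5))  # 166000
```

These reproduce the two HarmBench figures in the example.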

Saving results

ai-blackteam mega-sweep -p anthropic --dataset harmbench -o mega-results.json
The JSON output includes model, provider, total runs, bypass count, and configuration used.
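
A saved results file can be post-processed to get a bypass rate. The sketch below is an assumption about the file's shape: the key names `total_runs` and `bypass_count` are hypothetical, chosen to match the fields listed above, so inspect a real output file and adjust them before relying on this.

```python
import json


def load_results(path: str) -> dict:
    """Read a results file written with -o."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def bypass_rate(results: dict) -> float:
    """Fraction of runs that bypassed, using assumed key names."""
    total = results["total_runs"]      # hypothetical key
    if total == 0:
        return 0.0
    return results["bypass_count"] / total  # hypothetical key
```

For example, `bypass_rate(load_results("mega-results.json"))` would return a value between 0 and 1 if the file uses these keys.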