ai-blackteam mega-sweep command combines external datasets with attack techniques and optional mutations to produce large-scale evaluations. Instead of testing a single target, it runs attacks against hundreds or thousands of harmful prompts from research datasets.
Basic usage
How it works
- Loads prompts from the specified dataset(s)
- Filters to single-turn attack techniques (multi-turn attacks aren’t compatible with raw prompts)
- Optionally applies mutation variants to each prompt
- Runs each prompt x attack combination against the target model
- Reports bypass counts
Options
| Flag | Default | Description |
|---|---|---|
-p, --provider | (required) | Provider to test |
-m, --model | Provider default | Model name |
--dataset | (required) | Dataset names (comma-separated or all) |
--mutations | none | Mutation types: encode, frame, difficulty |
--attacks | all | Attack techniques (comma-separated or all) |
--categories | all | Filter to specific harm categories |
-w, --workers | 5 | Parallel workers |
--limit | none | Max prompts per dataset |
-o, --output | none | Save JSON results to file |
--quiet | false | Minimal output |
--dry-run | false | Show plan without running |
Mutation variants
Mutations transform each prompt before sending it through attack techniques:Dry-run mode
Before committing to a large run, check what would execute:Filtering
By category
By prompt count
By attack technique
Scale
The math is straightforward:- HarmBench (400 prompts) x 83 single-turn attacks = 33,200 runs
- Add encoding mutations (5 variants) = 166,000 runs
--limit and --dry-run to manage scale.