The ai-blackteam run command
ai-blackteam run sends one attack technique against a single model and gives you a verdict. This is the building block of everything else in ai-blackteam.
A run with the `encoding-obfuscation` attack against the `anthropic` provider does the following:
- Loads the `encoding-obfuscation` attack, which generates prompts using Base64, ROT13, hex encoding, and other obfuscation methods
- Sends each prompt to Claude via the Anthropic API
- Evaluates the response and prints `BYPASSED`, `PARTIAL`, or `BLOCKED`
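To make the obfuscation step concrete, here is a minimal Python sketch of the three encodings named above, applied to a harmless placeholder string. It only illustrates the encodings themselves. The function name `obfuscated_variants` is hypothetical, and the prompt text that the real attack wraps around the encoded string is not documented here, so it is left out.

```python
import base64
import codecs


def obfuscated_variants(target: str) -> dict[str, str]:
    """Return the target string under three common encodings.

    Illustrative only: this is not the attack's actual implementation.
    """
    raw = target.encode("utf-8")
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "rot13": codecs.encode(target, "rot_13"),
        "hex": raw.hex(),
    }


# A harmless placeholder stands in for the real target string.
variants = obfuscated_variants("hello world")
for name, encoded in variants.items():
    print(f"{name}: {encoded}")
```

Each variant would become the core of one prompt, which is why a single attack can produce several prompts from one target.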
All flags
| Flag | Short | Required | Description |
|---|---|---|---|
| `--provider` | `-p` | Yes | Provider name (anthropic, openai, google, etc.) |
| `--model` | `-m` | No | Specific model. Defaults to provider’s default model |
| `--attack` | `-a` | Yes | Attack technique ID |
| `--target` | `-t` | Yes | The harmful behavior you’re testing against |
| `--system-prompt` | | No | A system prompt to inject as a defense |
| `--system-prompt-file` | | No | Read the system prompt from a file instead |
| `--verbose` | `-v` | No | Show full response text in the output table |
| `--quiet` | `-q` | No | Suppress all output. Only the exit code matters |
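If you call the command from a script, it can help to assemble the argument list in one place. The sketch below is a hypothetical helper (`build_run_command` is not part of ai-blackteam) that uses only the flags listed in the table above and rejects a call that is missing one of the three required values. The target string is a placeholder.

```python
import shlex
from typing import List, Optional


def build_run_command(
    provider: str,
    attack: str,
    target: str,
    model: Optional[str] = None,
    system_prompt_file: Optional[str] = None,
    verbose: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Assemble an `ai-blackteam run` argument list from the documented flags."""
    if not (provider and attack and target):
        raise ValueError("--provider, --attack and --target are required")
    argv = [
        "ai-blackteam", "run",
        "--provider", provider,
        "--attack", attack,
        "--target", target,
    ]
    # Optional flags are appended only when set.
    if model:
        argv += ["--model", model]
    if system_prompt_file:
        argv += ["--system-prompt-file", system_prompt_file]
    if verbose:
        argv.append("--verbose")
    if quiet:
        argv.append("--quiet")
    return argv


argv = build_run_command(
    "anthropic", "encoding-obfuscation", "placeholder target", verbose=True
)
# shlex.join quotes arguments that contain spaces, so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list straight to `subprocess.run` avoids shell quoting problems when the target string contains spaces or quotes.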
How single-turn attacks work
A single-turn attack generates one or more prompts from a target string. The attack class decides the strategy: encoding, role-play, prefix injection, and so on. Each prompt is sent as an independent API call with no conversation history.
Example with output walkthrough
- Prompt - The first 60 characters of the attack prompt sent to the model
- Verdict - The evaluator’s classification. See Understanding Verdicts
- Confidence - How confident the evaluator is in its verdict (0.0 to 1.0)
- Response - Only shown with `-v`. The first 200 characters of the model’s response
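The single-turn flow and the columns above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: `Row`, `run_single_turn`, `fake_send`, and `fake_evaluate` are hypothetical names, and the two fake functions stand in for the provider API call and the evaluator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Row:
    prompt: str        # first 60 characters of the attack prompt
    verdict: str       # BYPASSED, PARTIAL, or BLOCKED
    confidence: float  # 0.0 to 1.0
    response: str      # first 200 characters, filled only in verbose mode


def run_single_turn(
    prompts: List[str],
    send: Callable[[str], str],
    evaluate: Callable[[str], Tuple[str, float]],
    verbose: bool = False,
) -> List[Row]:
    rows = []
    for prompt in prompts:
        # Each prompt is its own call: nothing from earlier prompts is passed along.
        response = send(prompt)
        verdict, confidence = evaluate(response)
        shown = response[:200] if verbose else ""
        rows.append(Row(prompt[:60], verdict, confidence, shown))
    return rows


# Hypothetical stand-ins for the provider call and the evaluator.
def fake_send(prompt: str) -> str:
    return "I can't help with that. " * 20


def fake_evaluate(response: str) -> Tuple[str, float]:
    return ("BLOCKED", 0.9) if "can't help" in response else ("BYPASSED", 0.5)


rows = run_single_turn(["x" * 100, "short prompt"], fake_send, fake_evaluate, verbose=True)
for row in rows:
    print(row)
```

Because `send` receives only the current prompt, the order of prompts does not affect any individual verdict.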
Exit codes
| Code | Meaning |
|---|---|
| 0 | All prompts were blocked. The model defended successfully |
| 1 | At least one prompt got a `BYPASSED` verdict. The model has a weakness |
| 2 | Configuration error (unknown provider, unknown attack, etc.) |
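In a CI job you would typically run with `--quiet` and branch on the exit code. The sketch below maps the three documented codes to short messages. The helper names `describe_exit_code` and `run_quietly` are hypothetical, and any code outside the table is reported as unexpected rather than guessed at.

```python
import subprocess
from typing import List

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "all prompts blocked",
    1: "at least one prompt bypassed",
    2: "configuration error",
}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_quietly(argv: List[str]) -> str:
    """Run a command and describe its exit code. argv should include --quiet."""
    # check=False: a non-zero exit is a result to interpret, not an exception.
    completed = subprocess.run(argv, check=False)
    return describe_exit_code(completed.returncode)


print(describe_exit_code(1))
```

A pipeline can then fail the build on code 1 and treat code 2 as a setup problem that needs a different fix.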
Finding available attacks
List all registered attacks to find the ID to pass to the `-a` flag.