Red Team GPT, Claude, and Gemini
Run safety red team tests against GPT-5.5, Claude, Gemini, Grok, and 13 more providers with one command.
ai-blackteam tests the safety of commercial and open models through a single interface. Point it at any of 17 providers and run the same 1,020 attacks against each.
Use model aliases so you do not have to track exact version strings.
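To illustrate the idea behind aliases, here is a minimal sketch of alias resolution, assuming a simple lookup table that maps a short name to a pinned version string and passes exact identifiers through unchanged. The alias names, version strings, and the `resolve_model` function are hypothetical placeholders, not ai-blackteam's actual identifiers or code.

```python
# Hypothetical alias table: names and version strings are placeholders,
# not ai-blackteam's real model identifiers.
MODEL_ALIASES = {
    "gpt": "openai/gpt-example-2025-01-01",
    "claude": "anthropic/claude-example-2025-01-01",
    "gemini": "google/gemini-example-2025-01-01",
}


def resolve_model(name: str) -> str:
    """Return the pinned version string for an alias.

    Names that are not aliases are assumed to already be exact
    identifiers and are returned unchanged.
    """
    return MODEL_ALIASES.get(name, name)


if __name__ == "__main__":
    for name in ["gpt", "claude", "vendor/some-exact-model-id"]:
        print(f"{name} -> {resolve_model(name)}")
```

Keeping the table in one place means updating a pinned version is a single change, while scripts and CI jobs keep referring to the short alias.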
You get a side-by-side scorecard showing each model's result (bypassed, partially complied, or blocked) with a sample response from each. The command exits with code 1 if any model is bypassed, so it works as a CI gate.
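The CI-gate behavior can be sketched as follows, assuming each model's overall result is the worst outcome across all attacks run against it. The `Verdict` enum, the severity ordering, and the function names are illustrative assumptions, not ai-blackteam's internals; only the rule "exit code 1 if any model is bypassed" comes from the text.

```python
from enum import Enum


class Verdict(str, Enum):
    BYPASSED = "bypassed"
    PARTIAL = "partial"
    BLOCKED = "blocked"


# Assumed severity order used to pick each model's overall result.
SEVERITY = {Verdict.BLOCKED: 0, Verdict.PARTIAL: 1, Verdict.BYPASSED: 2}


def model_verdict(attack_verdicts: list[Verdict]) -> Verdict:
    """Worst outcome across all attacks run against one model (list must be non-empty)."""
    return max(attack_verdicts, key=SEVERITY.__getitem__)


def scorecard(results: dict[str, list[Verdict]]) -> dict[str, Verdict]:
    """Collapse per-attack verdicts into one verdict per model."""
    return {model: model_verdict(verdicts) for model, verdicts in results.items()}


def exit_code(card: dict[str, Verdict]) -> int:
    """1 if any model was bypassed, else 0; a CI wrapper would pass this to sys.exit()."""
    return 1 if any(v is Verdict.BYPASSED for v in card.values()) else 0


if __name__ == "__main__":
    # Hypothetical per-attack results for three placeholder models.
    results = {
        "model-a": [Verdict.BLOCKED, Verdict.BLOCKED],
        "model-b": [Verdict.BLOCKED, Verdict.PARTIAL],
        "model-c": [Verdict.PARTIAL, Verdict.BYPASSED],
    }
    card = scorecard(results)
    for model, verdict in card.items():
        print(f"{model:10} {verdict.value}")
    print("exit code:", exit_code(card))
```

Because a partial compliance alone does not fail the gate in this sketch, a pipeline would only break when at least one model is fully bypassed.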
Responses are scored by multiple judge models, and the report gives the median score plus an agreement ratio. This reduces reliance on any single judge, whose unreliability affects most red team tools.
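A minimal sketch of multi-judge aggregation, assuming each judge returns a score in [0, 1] (higher meaning more harmful compliance) and defining the agreement ratio as the share of judges whose pass/fail call matches the majority call. The score scale, the 0.5 threshold, and this definition of agreement are assumptions for illustration; ai-blackteam's actual scoring may differ.

```python
from collections import Counter
from statistics import median


def aggregate(judge_scores: list[float], threshold: float = 0.5) -> tuple[float, float]:
    """Return (median score, agreement ratio) for one response.

    threshold is a hypothetical cutoff that turns each judge's score
    into a pass/fail call; the agreement ratio is the fraction of
    judges that share the most common call.
    """
    if not judge_scores:
        raise ValueError("need at least one judge score")
    med = median(judge_scores)
    calls = [score >= threshold for score in judge_scores]
    majority_count = Counter(calls).most_common(1)[0][1]
    return med, majority_count / len(judge_scores)


if __name__ == "__main__":
    # Three hypothetical judges: two flag the response, one does not.
    med, agreement = aggregate([0.9, 0.8, 0.2])
    print(f"median={med}, agreement={agreement:.2f}")
```

The median is used here because a single outlier judge cannot pull it far, unlike a mean; a low agreement ratio flags responses where the judges disagree and a human review may be worthwhile.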