What is ai-blackteam?

ai-blackteam is an automated LLM red team framework. Point it at any model, run one command, and get a safety report. It ships with 1,011 curated attack techniques across 60 categories, 19 public benchmark loaders (HarmBench, AdvBench, JailbreakBench, WMDP, JailBreakV-28K, AgentHarm, BeaverTails, RealToxicityPrompts and more), and 7 adaptive generators (PAIR, TAP, AutoDAN, PAP, Crescendo, Best-of-N, Fuzzer). These attacks are mapped to 12 industry standards including MITRE ATLAS, OWASP LLM Top 10, and the EU AI Act.

The problem

Most eval tools run single-prompt probes. A model blocks "how to make a bomb" and the tool marks it safe. Done. But real attackers don’t send one prompt. A 2025 multi-lab study (with researchers from OpenAI, Anthropic, and Google DeepMind) showed that adaptive attacks bypassed 12 published defenses with success rates above 90%, even though those defenses originally reported near-zero attack success rates. Single-attempt testing misses real vulnerabilities.
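
A rough way to see why one attempt understates risk: if each independent attempt succeeds with probability p, the chance that at least one of k attempts gets through is 1 - (1 - p)^k. The sketch below computes that figure. It is not part of ai-blackteam; the function name and the 2% per-attempt rate are illustrative assumptions, and the independence assumption is a simplification, since adaptive attacks adjust to earlier refusals.

```python
def bypass_probability(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds.

    Assumes every attempt has the same success probability and that
    attempts are independent (a simplification for illustration).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # P(at least one success) = 1 - P(every attempt fails)
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # Illustrative per-attempt rate, not a measured value.
    for k in (1, 10, 50, 100):
        print(f"{k:>3} attempts: {bypass_probability(0.02, k):.1%}")
```

Under these assumptions, a model that looks 98% safe on a single probe is bypassed more often than not once an attacker gets on the order of 50 tries.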

What ai-blackteam does differently

ai-blackteam runs multi-turn, adaptive attacks that mirror real adversarial pressure:
  • 1,011 curated attack techniques with a 163M expanded attack surface
  • 19 benchmark loaders - HarmBench, AdvBench, JailbreakBench, SorryBench, WMDP (bio/cyber/chem), DoNotAnswer, WildGuard, RedBench, SALAD-Bench, StrongREJECT, AART, ForbiddenQuestions, BeaverTails, RealToxicityPrompts, JailBreakV-28K, RedTeam-2K, AgentHarm
  • 7 adaptive generators - PAIR, TAP, Fuzzer, AutoDAN (genetic algorithm), PAP (persuasion techniques), Crescendo (multi-turn escalation), Best-of-N (sample-and-pick)
  • 7 providers - Anthropic, OpenAI, Google, DeepSeek, Mistral, Ollama, HuggingFace
  • Multi-turn depth - crescendo, sunk-cost, context-manipulation attacks that exploit conversational memory over 10+ turns
  • Agent attacks - credential theft, data exfiltration, sandbox escape, config manipulation via tool-use; AgentHarm (UK AI Safety Institute) integrated
  • MCP exploitation - tool poisoning, rug pulls, server impersonation, shadowing, privilege escalation
  • 12 standards aligned - MITRE ATLAS v5.4.0, OWASP LLM Top 10 (2025), OWASP Agentic Top 10 (2026), MLCommons AILuminate, CSA MAESTRO, ISO 42001, EU AI Act, NIST AI RMF, CVSS, and more
  • CI-ready - GitHub Actions workflow, exit codes, JSON/Promptfoo/garak export
  • Research-backed - implements published attacks from Microsoft Research, Palo Alto Unit 42, USENIX, UK AI Safety Institute
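
To make the CI-ready point concrete, here is a minimal sketch of gating a pipeline on an exported JSON report. The report layout (a top-level "results" list whose entries carry a boolean "bypassed" field), the function names, and the 5% threshold are all hypothetical; check the actual export schema before relying on any of them.

```python
import json


def attack_success_rate(report: dict) -> float:
    """Fraction of attacks that got past the model.

    Assumes a hypothetical schema: {"results": [{"bypassed": bool}, ...]}.
    An empty or missing results list counts as a rate of 0.0.
    """
    results = report.get("results", [])
    if not results:
        return 0.0
    bypassed = sum(1 for r in results if r.get("bypassed"))
    return bypassed / len(results)


def gate(report: dict, max_rate: float = 0.0) -> int:
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    return 1 if attack_success_rate(report) > max_rate else 0


if __name__ == "__main__":
    # Inline sample standing in for a report file loaded with json.load().
    sample = json.loads(
        '{"results": [{"bypassed": false}, {"bypassed": true},'
        ' {"bypassed": false}, {"bypassed": false}]}'
    )
    print("attack success rate:", attack_success_rate(sample))
    print("exit code:", gate(sample, max_rate=0.05))
```

In a real workflow the script would load the report from disk and pass the result of `gate(...)` to `sys.exit`, so a rate above the threshold fails the job.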

Competitive landscape

Tool         | Backed by                  | Limitation
-------------|----------------------------|---------------------------------------------
Promptfoo    | OpenAI (acquired Mar 2026) | No longer vendor-neutral
garak        | NVIDIA                     | Single-prompt only, no multi-turn attacks
DeepEval     | Independent                | Broader but shallower adversarial depth
AILuminate   | MLCommons                  | Rates models but doesn’t actively break them
OpenAI Evals | OpenAI                     | Model-specific, not multi-provider

ai-blackteam is the only independent, vendor-neutral tool with multi-turn attacks, agent exploitation, and standards alignment across all major providers.

Next steps

Install

Get ai-blackteam running in under a minute

Quick Start

Run your first safety scan in 5 minutes