ai-blackteam can export results in Promptfoo’s EvaluateSummaryV3 format, so you can import safety scan results into Promptfoo’s dashboard and compare them with your existing evaluations.

Exporting to Promptfoo

ai-blackteam report --export promptfoo -o promptfoo-results.json

The EvaluateSummaryV3 schema

The export produces a JSON file matching Promptfoo’s expected structure:
{
  "evalId": null,
  "results": {
    "version": 3,
    "timestamp": "2025-03-15T10:30:00",
    "results": [...],
    "stats": {
      "successes": 971,
      "failures": 12,
      "errors": 0,
      "tokenUsage": {
        "total": 150000,
        "prompt": 100000,
        "completion": 50000,
        "cached": 0,
        "numRequests": 1017
      },
      "durationMs": 120000
    }
  },
  "config": {
    "description": "ai-blackteam safety evaluation"
  },
  "metadata": {
    "exportedAt": "2025-03-15T10:30:00",
    "author": "ai-blackteam"
  }
}
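
Before importing, it can help to sanity-check an exported file. The following is a minimal Python sketch, not part of ai-blackteam or Promptfoo: the helper names `load_export` and `summarize_export` are hypothetical, and the code assumes only the fields shown in the example above (`results.version`, `results.stats`, `tokenUsage.total`, `durationMs`).

```python
import json
from pathlib import Path


def load_export(path: str) -> dict:
    """Read an exported JSON file (e.g. promptfoo-results.json) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_export(summary: dict) -> dict:
    """Return headline numbers from a dict shaped like the example above.

    Assumes the layout shown in this section; raises ValueError if the
    inner "version" field is not 3.
    """
    results = summary["results"]
    if results.get("version") != 3:
        raise ValueError(f"expected version 3, got {results.get('version')!r}")

    stats = results["stats"]
    total = stats["successes"] + stats["failures"] + stats["errors"]
    return {
        "total": total,
        # Share of results whose "success" flag was true.
        "success_rate": stats["successes"] / total if total else 0.0,
        "total_tokens": stats["tokenUsage"]["total"],
        "duration_s": stats["durationMs"] / 1000,
    }
```

A typical call would be `summarize_export(load_export("promptfoo-results.json"))`, using the file name from the export command above.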

Per-result structure

Each result entry includes:
| Field | Description |
| --- | --- |
| provider.id | Provider and model (e.g., anthropic:claude-sonnet-4-6) |
| prompt.raw | The actual attack prompt sent |
| prompt.label | Attack technique name |
| response.output | Model’s full response text |
| response.tokenUsage | Token counts for the request |
| success | true if BLOCKED, false if BYPASSED |
| score | 1.0 (BLOCKED), 0.5 (PARTIAL), 0.0 (BYPASSED) |
| gradingResult | Detailed evaluation with metadata |

The gradingResult.metadata includes:
  • pluginId - the ai-blackteam attack technique ID
  • strategyId - single-turn or multi-turn
  • severity - critical, high, medium, or low
  • tags - OWASP LLM Top 10 tags (e.g., owasp:llm01)
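
The per-result fields above can also be inspected outside Promptfoo. The sketch below is illustrative only: `verdict` and `bypasses_by_tag` are hypothetical helper names, and the code assumes each entry carries `success`, `score`, and `gradingResult.metadata.tags` exactly as described in this section.

```python
from collections import Counter

# Score-to-verdict mapping as documented in the per-result fields above.
_VERDICTS = {1.0: "BLOCKED", 0.5: "PARTIAL", 0.0: "BYPASSED"}


def verdict(score: float) -> str:
    """Translate a result score into its verdict label."""
    return _VERDICTS.get(score, "UNKNOWN")


def bypasses_by_tag(results: list[dict]) -> Counter:
    """Count results with success == false, grouped by OWASP tag.

    Entries with no gradingResult or no tags are skipped rather than
    treated as errors.
    """
    counts: Counter = Counter()
    for entry in results:
        if entry.get("success"):
            continue
        metadata = (entry.get("gradingResult") or {}).get("metadata") or {}
        for tag in metadata.get("tags", []):
            counts[tag] += 1
    return counts
```

With an export loaded as a dict named `summary`, the entries live under `summary["results"]["results"]`, so `bypasses_by_tag(summary["results"]["results"])` gives a quick per-category count of the results that were not marked successful.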

Importing into Promptfoo

After exporting, load the file into Promptfoo’s web UI or use it programmatically:
# View in Promptfoo's web dashboard
npx promptfoo view promptfoo-results.json
You can also use the Promptfoo CLI to compare ai-blackteam results with your own evals:
npx promptfoo eval --output ai-blackteam-compare.json
npx promptfoo view ai-blackteam-compare.json promptfoo-results.json

Why use this

If your team already runs Promptfoo for LLM evaluation, the ai-blackteam export lets you see safety results in the same interface. You get one dashboard showing both functional quality and safety coverage, without switching tools. The OWASP tags in the grading metadata also make it easy to filter by vulnerability category directly in Promptfoo’s UI.