Promptfoo Export - ai-blackteam

ai-blackteam can export results in Promptfoo’s EvaluateSummaryV3 format, so you can import safety scan results into Promptfoo’s dashboard and view them alongside your existing evaluations.

Exporting to Promptfoo

ai-blackteam report --export promptfoo -o promptfoo-results.json

The EvaluateSummaryV3 schema

The export produces a JSON file matching Promptfoo’s expected structure:

{
  "evalId": null,
  "results": {
    "version": 3,
    "timestamp": "2025-03-15T10:30:00",
    "results": [...],
    "stats": {
      "successes": 971,
      "failures": 12,
      "errors": 0,
      "tokenUsage": {
        "total": 150000,
        "prompt": 100000,
        "completion": 50000,
        "cached": 0,
        "numRequests": 1017
      },
      "durationMs": 120000
    }
  },
  "config": {
    "description": "ai-blackteam safety evaluation"
  },
  "metadata": {
    "exportedAt": "2025-03-15T10:30:00",
    "author": "ai-blackteam"
  }
}
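As a quick sanity check before importing, the `stats` block can be summarized with a few lines of Python. This is a minimal sketch that assumes the file has exactly the layout shown above. The helper names `load_export` and `summarize` are illustrative and not part of ai-blackteam or Promptfoo. Treating `successes` as the count of blocked attacks is also an assumption, based on the `success` field described under "Per-result structure".

```python
import json
from pathlib import Path


def load_export(path):
    """Read an exported file and return the parsed JSON document."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(export):
    """Pull headline numbers out of results.stats (layout assumed from the sample above)."""
    stats = export["results"]["stats"]
    graded = stats["successes"] + stats["failures"]
    return {
        "graded": graded,
        # Assumption: a "success" means the attack was blocked.
        "block_rate": stats["successes"] / graded if graded else 0.0,
        "errors": stats["errors"],
        "total_tokens": stats["tokenUsage"]["total"],
        "duration_s": stats["durationMs"] / 1000,
    }


# Inline sample mirroring the stats values from the schema example,
# so the sketch runs without a real export file.
sample = {
    "results": {
        "stats": {
            "successes": 971,
            "failures": 12,
            "errors": 0,
            "tokenUsage": {"total": 150000, "prompt": 100000, "completion": 50000},
            "durationMs": 120000,
        }
    }
}

summary = summarize(sample)
print(f"{summary['graded']} graded, {summary['block_rate']:.1%} blocked, "
      f"{summary['errors']} errors, {summary['duration_s']:.0f}s")
```

With a real export, replace `sample` with `load_export("promptfoo-results.json")`.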

Per-result structure

Each result entry includes:

Field	Description
`provider.id`	Provider and model (e.g., `anthropic:claude-sonnet-4-6`)
`prompt.raw`	The actual attack prompt sent
`prompt.label`	Attack technique name
`response.output`	Model’s full response text
`response.tokenUsage`	Token counts for the request
`success`	`true` if BLOCKED, `false` if BYPASSED
`score`	1.0 (BLOCKED), 0.5 (PARTIAL), 0.0 (BYPASSED)
`gradingResult`	Detailed evaluation with metadata
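The score mapping in the table can be turned back into verdict labels when post-processing an export. The sketch below assumes each result entry is a JSON object with a numeric `score` key. The function names and the sample entries are hypothetical.

```python
from collections import Counter


def verdict(result):
    """Translate the numeric score into the verdict label from the table above."""
    score = result["score"]
    if score == 1.0:
        return "BLOCKED"
    if score == 0.5:
        return "PARTIAL"
    if score == 0.0:
        return "BYPASSED"
    return "UNKNOWN"  # any score the table does not list


def tally(results):
    """Count how many results fall under each verdict."""
    return Counter(verdict(r) for r in results)


# Hypothetical entries, reduced to the one field this sketch reads.
sample_results = [{"score": 1.0}, {"score": 1.0}, {"score": 0.5}, {"score": 0.0}]

counts = tally(sample_results)
for label in ("BLOCKED", "PARTIAL", "BYPASSED"):
    print(label, counts[label])
```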

The `gradingResult.metadata` object includes:

`pluginId` - the ai-blackteam attack technique ID
`strategyId` - single-turn or multi-turn
`severity` - critical, high, medium, or low
`tags` - OWASP LLM Top 10 tags (e.g., `owasp:llm01`)
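The metadata fields above are enough to triage results outside the dashboard. The following sketch groups every result whose `success` is not `true` by OWASP tag and orders each group by severity. It assumes the dotted field names correspond to nested JSON objects and that `tags` is a list of strings. The function name, the sample `pluginId` values, and the second tag are hypothetical. How PARTIAL results map to `success` is not stated above, so the sketch keys on `success` alone.

```python
from collections import defaultdict

# Lower rank sorts first; severities not listed here sort last.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def failures_by_tag(results):
    """Map each tag to (severity, pluginId) pairs for non-successful results."""
    grouped = defaultdict(list)
    for result in results:
        if result.get("success"):
            continue  # blocked attack, nothing to triage
        meta = (result.get("gradingResult") or {}).get("metadata") or {}
        entry = (meta.get("severity", "unknown"), meta.get("pluginId", "unknown"))
        for tag in meta.get("tags", []):
            grouped[tag].append(entry)
    return {
        tag: sorted(entries, key=lambda e: SEVERITY_RANK.get(e[0], len(SEVERITY_RANK)))
        for tag, entries in grouped.items()
    }


def _entry(success, plugin_id, strategy_id, severity, tags):
    """Build a hypothetical result entry with only the fields used here."""
    metadata = {"pluginId": plugin_id, "strategyId": strategy_id,
                "severity": severity, "tags": tags}
    return {"success": success, "gradingResult": {"metadata": metadata}}


sample_results = [
    _entry(True, "example-a", "single-turn", "high", ["owasp:llm01"]),
    _entry(False, "example-b", "multi-turn", "medium", ["owasp:llm01"]),
    _entry(False, "example-c", "single-turn", "critical", ["owasp:llm01", "owasp:llm06"]),
]

by_tag = failures_by_tag(sample_results)
for tag in sorted(by_tag):
    print(tag, by_tag[tag])
```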

Importing into Promptfoo

After exporting, load the file into Promptfoo’s web UI or use it programmatically:

# View in Promptfoo's web dashboard
npx promptfoo view promptfoo-results.json

You can also use the Promptfoo CLI to compare ai-blackteam results with your own evals:

npx promptfoo eval --output ai-blackteam-compare.json
npx promptfoo view ai-blackteam-compare.json promptfoo-results.json

Why use this

If your team already runs Promptfoo for LLM evaluation, the ai-blackteam export lets you see safety results in the same interface. You get one dashboard showing both functional quality and safety coverage, without switching tools. The OWASP tags in the grading metadata also make it easy to filter by vulnerability category directly in Promptfoo’s UI.
