BaseAttack. This abstract class defines the contract - what fields an attack must have and what methods it must implement.
The Full Class
Fields
| Field | Type | Description |
|---|---|---|
name | str | Human-readable name (e.g., “Encoding Obfuscation”) |
technique_id | str | Unique kebab-case identifier (e.g., “encoding-obfuscation”) |
mode | str | One of "single-turn", "multi-turn", "tool-use" |
category | str | Attack category (e.g., “encoding”, “prompt-injection”) |
severity | str | One of "critical", "high", "medium", "low" |
cvss_score | float | CVSS 0-10 score. If 0, auto-calculated from severity |
description | str | Short description of the attack technique |
owasp_llm | list[str] | OWASP LLM Top 10 2025 codes (e.g., ["LLM01 Prompt Injection"]) |
owasp_agentic | list[str] | OWASP Agentic Top 10 2026 codes (e.g., ["ASI01 Prompt Injection"]) |
mitre_atlas | list[str] | MITRE ATLAS technique IDs (e.g., ["AML.T0051"]) |
references | list[str] | URLs to papers, blog posts, or other sources |
Methods
generate_prompts(target, **kwargs) -> list[str]
Required for single-turn attacks. Abstract method - you must implement this.
Takes a target string (the harmful behavior to test) and returns a list of prompt variants. Each prompt is sent independently to the model.
generate_turns(target, **kwargs) -> list[str]
Required for multi-turn attacks. Returns a list of conversation turns. Each string becomes a user message, sent sequentially with the model’s responses accumulated as context.
NotImplementedError.
generate_tool_messages(target, tools=None, **kwargs) -> list[str]
Required for tool-use attacks. Returns a list of user messages that try to trick the model into misusing its tools.
NotImplementedError.
get_tools() -> list[dict] | None
Returns tool/function definitions for tool-use attacks. Each dict follows the standard tool schema with name, description, and input_schema.
None.
get_tool_responses() -> dict | None
Returns custom simulated tool responses for XPIA (cross-plugin injection attacks). When present, the Engine uses these instead of generic "[simulated]" responses.
None.
metadata() -> dict
Returns all fields as a dictionary. Auto-calculates CVSS score from severity if cvss_score is 0:
When Each Method Gets Called
| Attack Mode | Engine Calls |
|---|---|
single-turn | generate_prompts() - each prompt sent independently |
multi-turn | generate_turns() - sent sequentially as a conversation |
tool-use | get_tools() + generate_tool_messages() + optionally get_tool_responses() |
attack.mode and dispatches to the right execution path. You only need to implement the methods for your attack’s mode.
Source
src/ai-blackteam/attacks/base.py