The Layers
Layer 1: Interface
Two ways in - the CLI and the Python API. Both do the same thing: accept parameters, resolve providers and attacks from the registry, and call the Engine. CLI (for humans and CI pipelines):Engine.run().
Layer 2: Processing
Two components here: Engine - The orchestrator. It takes a provider, an attack, and a target. It calls the attack to generate prompts, sends them through the provider, and passes responses to the Evaluator. It handles three modes (single-turn, multi-turn, tool-use) and supports parallel batch execution. Evaluator - The judge. It takes a response and decides: was the model tricked? Three scoring methods:- Keyword matching (fast, free)
- Regex pattern detection (fast, free)
- LLM-as-judge (slow, accurate, costs ~$0.001/call)
BYPASSED, PARTIAL, BLOCKED, or UNCLEAR.
Layer 3: Plugins
Attacks and providers are plugins. They register themselves using decorators:attacks/ and providers/ directories and loads anything decorated with @register_attack or @register_provider. You never maintain a list - plugins self-discover.
Storage
SQLite with WAL mode sits at the bottom. Every attack result, every conversation turn, and every tool call gets saved. WAL mode allows concurrent reads during writes, so parallel batch runs work without locking issues.How They Connect
- User types a command or calls the Python API
- Layer 1 resolves the provider and attack(s) from the registry
- Layer 1 creates an Engine with a database path
- Engine calls the attack’s
generate_prompts()/generate_turns()/generate_tool_messages() - Engine sends each prompt through the provider’s
send_prompt()/send_in_conversation()/send_with_tools() - Engine passes each response to the Evaluator
- Evaluator returns a verdict
- Engine saves everything to SQLite
- Layer 1 formats and displays the results