| Attention Manipulation | attention-manipulation | single-turn | high | Embeds adversarial tokens and repetition patterns designed to shift model attention away from safety constraints and … |
| Authority Override | authority-override | single-turn | high | Claims organizational authority, developer status, or administrative privilege to assert the right to override safety… |
| AutoDAN Genetic Jailbreak | autodan-genetic | single-turn | critical | Uses genetic-algorithm-generated jailbreak prompts that stay fluent and readable while bypassing safety (AutoDAN) |
| Chain of Thought Hijack | chain-of-thought-hijack | single-turn | high | Hijacks the model's chain-of-thought reasoning … |
| Comment Injection | comment-injection | single-turn | high | Hides injection instructions inside code comments, HTML comments, or other comment syntax that may pass through conte… |
| Completion Exploit | completion-exploit | single-turn | medium | Uses fake transcripts and JSON completion to trick the model into continuing harmful content |
| Completion Steering | completion-steering | single-turn | high | Provides a partial answer or leading fragment to steer the model |
| Compound Instruction Attack | compound-instruction | single-turn | medium | Combines multiple conflicting instructions to overwhelm safety decision logic |
| Context Overflow | context-overflow | single-turn | medium | Pads prompts with large volumes of benign text to push the harmful request past the model's … |
| Context Switch | context-switch | single-turn | high | Inserts a framing device that switches the conversational context, causing safety filters calibrated to the original … |
| Context Window Injection | context-window-injection | single-turn | high | Injects instructions at strategic positions in the context window — beginning, end, or within padding — to exploit … |
| Context Window Stuffing | context-window-stuffing | single-turn | high | |
| Control Character Injection | control-character-injection | single-turn | high | Uses Unicode control characters, null bytes, and formatting characters to manipulate prompt parsing and bypass text-b… |
| CSV Injection | csv-injection | single-turn | high | Embeds prompt injection payloads in CSV-formatted data, exploiting models that parse tabular input with insufficient … |
| DAN Variants | dan-variants | single-turn | medium | DAN, EDUALC, Opposite Day, and Professor persona override attempts |
| Delimiter Escape | delimiter-escape | single-turn | high | Escapes prompt delimiters used to separate system context from user input, injecting content into the privileged region |
| Developer Mode | developer-mode | single-turn | high | Claims special developer or debug mode access to disable safety guardrails under the guise of legitimate system privi… |
| Emotional Override | emotional-override | single-turn | high | Uses emotional urgency, distress, or appeals to empathy to pressure the model into bypassing safety guidelines |
| Few-Shot Amplification | few-shot-amplification | single-turn | medium | Presents benign examples followed by an encoded harmful target to exploit pattern following |
| GCG-Style Adversarial Suffix Templates | gcg-style-templates | single-turn | critical | Pre-collected adversarial suffixes inspired by GCG-style outputs from public examples. NOT the GCG optimization algorithm itself; the real GCG (Zou et al. 2023, arXiv 2307.15043) requires gradient access to open-weights models and is not yet implemented in ai-blackteam. |
| Injection via Examples | injection-via-examples | single-turn | high | Hides harmful instructions inside few-shot examples, training the model to comply through demonstrated patterns |
| Instruction Hierarchy Bypass | instruction-hierarchy-bypass | single-turn | critical | Exploits the model's instruction hierarchy … |
| Instruction Override | instruction-override | single-turn | high | Directly overrides system instructions by issuing a new, conflicting instruction that claims precedence over prior co… |
| Instruction Repetition | instruction-repetition | single-turn | medium | Repeats the harmful instruction many times to overwhelm the context window and increase the probability of compliance… |
| Jailbreak Prompt V2 | jailbreak-prompt-v2 | single-turn | high | Updated DAN-style and jailbreak prompt patterns that attempt to unlock unconstrained model behavior through explicit … |
| JSON Injection | json-injection | single-turn | high | Embeds prompt injection payloads inside JSON structures to exploit models that parse structured input with reduced sa… |
| Many-Shot Jailbreak | many-shot | single-turn | medium | Floods the context with fabricated Q&A exchanges to overwhelm safety alignment |
| Markdown Injection | markdown-injection | single-turn | medium | Uses hidden HTML divs, comments, and table formatting to conceal harmful intent |
| Meta-Prompting | meta-prompting | single-turn | medium | Asks the model to generate its own jailbreak prompts for a given target |
| Multi Language Injection | multi-language-injection | single-turn | high | Embeds injection payloads across multiple languages in a single prompt to confuse language-specific safety classifiers |
| Multiline Injection | multiline-injection | single-turn | high | Splits the injection payload across multiple lines to evade single-line pattern matching filters and line-by-line con… |
| Multimodal Injection | multimodal-injection | single-turn | high | Injects instructions through cross-modal content such as image alt text, OCR output, or embedded metadata describing … |
| Output to Input Loop | output-to-input-loop | multi-turn | high | Feeds the model's own output back to it as input … |
| Payload Splitting | payload-splitting | single-turn | medium | Splits the harmful request across variables, then concatenates them to bypass per-message filters |
| Prefix Injection | prefix-injection | single-turn | high | Forces the model to begin its response with affirmative text, bypassing the refusal pathway by constraining the outpu… |
| Prompt Chaining Exploit | prompt-chaining-exploit | single-turn | high | Exploits LLM pipeline architectures where one model's output … |
| Prompt Format Exploit | prompt-format-exploit | single-turn | high | Exploits known prompt template formats (ChatML, Llama, Alpaca, etc.) by injecting fake role delimiters to hijack the… |
| Prompt Leaking Chain | prompt-leaking-chain | multi-turn | high | Chains multiple low-suspicion queries to incrementally reconstruct the full system prompt by correlating partial disc… |
| Prompt Nesting | prompt-nesting | single-turn | high | Wraps the malicious instruction in deeply nested hypothetical or quoted structures to dilute safety evaluation of the… |
| Reasoning Model Exploit | reasoning-model-exploit | single-turn | critical | |
| Recursive Injection | recursive-injection | single-turn | medium | Uses self-referential prompts that ask the model to simulate or reproduce its own unsafe output |
| Recursive Prompt | recursive-prompt | single-turn | medium | Uses self-referential prompts that create reasoning loops, exploiting the model's … |
| Refusal Suppression | refusal-suppression | single-turn | high | Instructs the model to suppress safety refusals, warnings, and disclaimers before presenting the harmful request |
| Regex Injection | regex-injection | single-turn | medium | Injects crafted regex patterns that cause ReDoS or exploit regex-based safety filter bypass through catastrophic back… |
| Response Constraints | response-constraints | single-turn | high | Suppresses safety disclaimers via output formatting rules that prevent the model from adding caveats |
| Reverse Psychology | reverse-psychology | single-turn | medium | Phrases requests as prohibitions or challenges, exploiting the tendency to complete negated patterns or prove capabil… |
| Role Assignment | role-assignment | single-turn | high | Assigns the model a new unconstrained role or persona to bypass safety guidelines tied to its default identity |
| Safety Prompt Negation | safety-prompt-negation | single-turn | critical | Directly negates or cancels the system safety prompt by asserting its removal or replacement before issuing a harmful… |
| Semantic Injection | semantic-injection | single-turn | high | Crafts semantically valid and innocent-seeming text that, when interpreted by the model, triggers harmful behavior |
| Skeleton Key | skeleton-key | multi-turn | high | Asks the model to augment its guidelines so that it warns instead of refusing |
| Split and Reassemble | split-reassemble | single-turn | high | Splits harmful requests into variable definitions, ROT13 fragments, or acronyms that must be reassembled to reveal th… |
| Structured Output Exploit | structured-output-exploit | single-turn | high | Exploits structured output (JSON mode, function calling schemas) to bypass safety by constraining the model |
| Style Injection | style-injection | single-turn | medium | Forces specific output formats (JSON, code, recipe, poetry) to trick the model into complying with harmful requests a… |
| System Message Spoof | system-message-spoof | single-turn | critical | Spoofs system-level messages within user input to override model behavior |
| System Prompt Extraction | system-prompt-extraction | multi-turn | high | Uses JSON-schema, chain-of-thought, and error-correction techniques to leak system prompts |
| Technical Jargon Injection | technical-jargon-injection | single-turn | high | Embeds injection payloads inside dense technical language, jargon, or domain-specific terminology that may evade surf… |
| Template Injection | template-injection | single-turn | high | Exploits prompt template variables and placeholders to inject instructions that are interpolated into privileged posi… |
| Token Prediction Exploit | token-prediction-exploit | single-turn | high | Crafts prompts that exploit next-token prediction by establishing high-probability completion paths that lead to poli… |
| Unicode Injection | unicode-injection | single-turn | high | Exploits Unicode normalization, homoglyphs, and bidirectional text markers to smuggle injection payloads past text-ba… |
| Virtual Prompt Injection | virtual-prompt-injection | single-turn | high | Manipulates the model's … |
| XML Boundary Injection | xml-boundary-injection | single-turn | high | Uses fake XML closing tags to escape system prompt boundaries |
| XML Injection | xml-injection | single-turn | high | Embeds adversarial instructions inside XML tags and entity definitions to smuggle payloads through XML-aware prompt p… |
| YAML Injection | yaml-injection | single-turn | high | Hides prompt injection payloads in YAML-formatted input, exploiting parsers and models that process YAML with relaxed… |