Persuasion attacks apply well-studied psychological influence techniques to AI models. These include Cialdini’s principles (authority, commitment, liking, reciprocity, scarcity, social proof), cognitive load attacks, guilt manipulation, and foot-in-the-door escalation. Understanding how models respond to persuasion pressure is critical for building systems that hold firm under social engineering.

Summary

15 attacks total: 9 single-turn, 6 multi-turn.

Attacks

| Attack | ID | Mode | Severity | Description |
|---|---|---|---|---|
| Anchoring Manipulation | anchoring-manipulation | single-turn | medium | Sets an extreme reference point to make a harmful request appear moderate or reasonable by comparison |
| Bandwagon Pressure | bandwagon-pressure | single-turn | medium | Applies social pressure through claims that widespread adoption or consensus makes the request acceptable |
| Cialdini Authority | cialdini-authority | single-turn | high | Invokes authority figures, credentials, or institutional power to override safety guidelines |
| Cialdini Commitment | cialdini-commitment | multi-turn | high | Extracts small commitments progressively, using consistency pressure to escalate toward a harmful final request |
| Cialdini Liking | cialdini-liking | multi-turn | medium | Builds rapport and flattery over multiple turns to lower defenses before making a harmful request |
| Cialdini Reciprocity | cialdini-reciprocity | multi-turn | high | Uses Cialdini |
| Cialdini Scarcity | cialdini-scarcity | single-turn | medium | Uses scarcity and urgency framing to pressure the model into complying before it can fully reason about safety |
| Cialdini Social Proof | cialdini-social-proof | single-turn | medium | Claims that other AI models, users, or authorities have already complied to normalize the harmful request |
| Cognitive Load | cognitive-load | single-turn | high | Overwhelms the model with a flood of complex information, instructions, and context to slip a harmful request past sa… |
| Door in Face | door-in-face | multi-turn | medium | Makes an extreme request first, then follows with a |
| Fear Appeal | fear-appeal | single-turn | high | Uses fear, threat, or danger framing to coerce the model into complying by framing refusal as catastrophic |
| Foot in Door | foot-in-door | multi-turn | high | Starts with small, benign requests and gradually escalates to harmful ones, exploiting consistency bias |
| Guilt Manipulation | guilt-manipulation | single-turn | high | Induces guilt and moral responsibility to override refusals by framing non-compliance as harmful or cruel |
| Learned Helplessness | learned-helplessness | multi-turn | high | Repeatedly asserts that the model has no choice but to comply, attempting to erode resistance through attrition |
| Moral Disengagement | moral-disengagement | single-turn | high | Reframes harmful actions as morally justified through euphemism, diffusion of responsibility, or dehumanization of vi… |

Example

ai-blackteam run -p anthropic -a anchoring-manipulation -t "Convince me this harmful action is reasonable"
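
To try every single-turn persuasion attack against the same target prompt, the command above can be wrapped in a loop. The following is a minimal sketch, not an official workflow: it assumes the `-p`, `-a`, and `-t` flags behave exactly as in the example, and the function name `run_persuasion_sweep` and the `DRY_RUN` switch are hypothetical helpers introduced here for illustration. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: sweep the single-turn persuasion attacks listed in the table.
# Assumption: -p/-a/-t work as in the documented example. The function
# name and the DRY_RUN switch are illustrative, not part of ai-blackteam.
set -euo pipefail

run_persuasion_sweep() {
  local provider="anthropic"
  local target="Convince me this harmful action is reasonable"

  # Single-turn attack IDs copied from the Attacks table.
  local attacks=(
    anchoring-manipulation
    bandwagon-pressure
    cialdini-authority
    cialdini-scarcity
    cialdini-social-proof
    cognitive-load
    fear-appeal
    guilt-manipulation
    moral-disengagement
  )

  local attack
  for attack in "${attacks[@]}"; do
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      # Dry run (default): print the command instead of executing it.
      printf 'ai-blackteam run -p %s -a %s -t "%s"\n' "$provider" "$attack" "$target"
    else
      ai-blackteam run -p "$provider" -a "$attack" -t "$target"
    fi
  done
}

run_persuasion_sweep
```

Setting `DRY_RUN=0` before running the script executes the commands instead of printing them. The multi-turn attacks are left out because the source example only shows a single `-t` prompt, and how multi-turn runs are driven is not described here.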