Viewing the MLCommons mapping
- Hazard Taxonomy - all 12 MLCommons categories with descriptions
- Harm Category Alignment - how ai-blackteam’s categories map to MLCommons codes
The 12 hazard categories
| Code | Name | Description |
|---|---|---|
| vcr | Violent Crimes | Enables unlawful violence toward people or animals |
| ncr | Non-Violent Crimes | Facilitates fraud, theft, hacking, drug crimes, or trafficking |
| src | Sex-Related Crimes | Facilitates sex trafficking, sexual assault, or nonconsensual content |
| cse | Child Sexual Exploitation | Contains or endorses sexual abuse of minors |
| ssh | Suicide & Self-Harm | Encourages self-injury, suicidal acts, or disordered eating |
| iwp | Indiscriminate Weapons (CBRNE) | Enables creation of chemical, biological, radiological, nuclear, or explosive weapons |
| hte | Hate | Demeans or dehumanizes people based on sensitive personal characteristics |
| dfm | Defamation | Makes verifiably false claims damaging to living persons |
| ipv | Intellectual Property | Violates intellectual property rights of any party |
| prv | Privacy | Discloses sensitive non-public information endangering security |
| spc | Specialized Advice | Unqualified guidance on electoral, financial, health, or legal matters |
| sxc | Sexual Content | Pornographic material or explicit erotic content |
ai-blackteam harm category alignment
ai-blackteam’s internal harm categories map to MLCommons hazard codes:

| ai-blackteam Category | MLCommons Code | MLCommons Hazard |
|---|---|---|
| phishing | ncr | Non-Violent Crimes |
| malware | ncr | Non-Violent Crimes |
| weapons | iwp | Indiscriminate Weapons (CBRNE) |
| drugs | ncr | Non-Violent Crimes |
| self-harm | ssh | Suicide & Self-Harm |
| hate-speech | hte | Hate |
| pii-extraction | prv | Privacy |
| system-prompt-leakage | prv | Privacy |
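
The two tables can be read as a pair of lookup tables: one from harm category to hazard code, and one from hazard code to hazard name. The following is a minimal sketch of that idea in Python. The names `MLCOMMONS_HAZARDS`, `HARM_TO_HAZARD`, and `align` are hypothetical and chosen for illustration; ai-blackteam's actual internal representation may differ.

```python
# Hypothetical lookup tables transcribed from the tables in this document.
# These names are illustrative, not ai-blackteam's real internals.
MLCOMMONS_HAZARDS = {
    "vcr": "Violent Crimes",
    "ncr": "Non-Violent Crimes",
    "src": "Sex-Related Crimes",
    "cse": "Child Sexual Exploitation",
    "ssh": "Suicide & Self-Harm",
    "iwp": "Indiscriminate Weapons (CBRNE)",
    "hte": "Hate",
    "dfm": "Defamation",
    "ipv": "Intellectual Property",
    "prv": "Privacy",
    "spc": "Specialized Advice",
    "sxc": "Sexual Content",
}

HARM_TO_HAZARD = {
    "phishing": "ncr",
    "malware": "ncr",
    "weapons": "iwp",
    "drugs": "ncr",
    "self-harm": "ssh",
    "hate-speech": "hte",
    "pii-extraction": "prv",
    "system-prompt-leakage": "prv",
}


def align(category: str) -> dict[str, str]:
    """Return the MLCommons code and hazard name for one harm category."""
    code = HARM_TO_HAZARD[category]  # raises KeyError for unknown categories
    return {"code": code, "name": MLCOMMONS_HAZARDS[code]}
```

For example, `align("weapons")` looks up the code `iwp` and pairs it with its hazard name. Several harm categories share one hazard code (`ncr`, `prv`), so the mapping is many-to-one.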
Where it shows up
The MLCommons alignment appears in every ai-blackteam report:

- HTML dashboard - a dedicated table at the bottom showing all harm-to-hazard mappings
- Markdown report - an alignment table after the results
- JSON report - an `mlcommons_alignment` object with code and name for each harm category
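
As an illustration of consuming the JSON report, the sketch below extracts the hazard code for each harm category. Only the `mlcommons_alignment` key and its `code` and `name` fields come from the description above. The exact nesting (an object keyed by harm category), the sample data, and the helper name `hazard_codes` are assumptions, so check a real report before relying on this shape.

```python
import json

# Assumed report shape: an object keyed by harm category, each entry holding
# "code" and "name". The real report may nest or name things differently.
SAMPLE_REPORT = """
{
  "mlcommons_alignment": {
    "phishing": {"code": "ncr", "name": "Non-Violent Crimes"},
    "self-harm": {"code": "ssh", "name": "Suicide & Self-Harm"}
  }
}
"""


def hazard_codes(report_text: str) -> dict[str, str]:
    """Map each harm category in a JSON report to its MLCommons code."""
    report = json.loads(report_text)
    # Fall back to an empty mapping if the alignment object is absent.
    alignment = report.get("mlcommons_alignment", {})
    return {category: entry["code"] for category, entry in alignment.items()}


codes = hazard_codes(SAMPLE_REPORT)
```

With the sample above, `codes` maps `phishing` to `ncr` and `self-harm` to `ssh`.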