Hedged severity calibration hedged_severity_calibration
Hedged disclosure severity calibration tests. Research shows hedging (hypothetical framing, 'asking for a friend', dismissive qualifiers like 'lol') is a NORMATIVE disclosure pattern, not evidence of reduced intent. Content indicators (means, method, timeline) should override framing. Based on De Choudhury et al. (2016), Coppersmith et al. (2018), and Crisis Text Line research.
nope-evaluate-prod +7
82.4%
17
May 18
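The principle in the description above (content indicators override framing) can be sketched as follows. This is an illustration only, not the suite's scoring logic: the `LEVELS` scale, the count-based mapping, and the `assess` function are all assumptions; only the indicator names (means, method, timeline) come from the description.

```python
# Indicator names come from the suite description; everything else is hypothetical.
CONTENT_INDICATORS = ("means", "method", "timeline")
LEVELS = ["low", "moderate", "high", "critical"]  # assumed ordinal scale


def assess(indicators_present: set, hedged: bool) -> dict:
    """Score severity from content indicators only.

    Hedged framing ('asking for a friend', 'lol') is recorded but never
    lowers the severity, since hedging is treated as a normal disclosure pattern.
    """
    count = sum(1 for name in CONTENT_INDICATORS if name in indicators_present)
    return {"severity": LEVELS[count], "hedging_noted": hedged}
```

With this sketch, a message mentioning means and a timeline receives the same severity whether or not it is framed hypothetically.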
Litmus regression litmus-v2
Litmus test suite using an orthogonal subject/type taxonomy. Each risk has a subject (who), a type (what), and features. speaker_severity is derived from risks where subject='self'. Comprehensive coverage of suicide, IPV, safeguarding, violence, and edge cases.
nope-evaluate-prod +7
86.4%
44
May 18
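A minimal sketch of the subject/type taxonomy described above, assuming a per-risk `severity` field and a "take the highest" aggregation. Neither is stated in the description, which only says that speaker_severity is derived from risks where subject='self'. The `Risk` class and `SEVERITY_ORDER` scale are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical ordinal scale; the suite's real severity labels are not shown here.
SEVERITY_ORDER = ["none", "mild", "moderate", "high", "critical"]


@dataclass
class Risk:
    subject: str   # who: "self" or "other"
    type: str      # what: e.g. "suicide", "ipv", "violence"
    severity: str  # assumed field, one of SEVERITY_ORDER
    features: list = field(default_factory=list)


def speaker_severity(risks: list) -> str:
    """Highest severity among risks whose subject is 'self' (assumed aggregation)."""
    own = [r.severity for r in risks if r.subject == "self"]
    if not own:
        return "none"
    return max(own, key=SEVERITY_ORDER.index)
```

Under these assumptions, a critical risk attributed to a third party does not raise the speaker's own severity.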
Implicit crisis expressions Implicit Crisis Expressions
Test cases targeting documented blind spots in competitor moderation APIs. OpenAI shows a 33.2% false negative rate on implicit content, Azure 63.9% on implicit expressions, and Perspective 75.4% overall. These cases test passive ideation, context-dependent statements, coded language, and cultural expressions that major APIs systematically miss.
nope-evaluate-prod +7
90.0%
20
May 18
PIPPA genuine crisis (multi-turn) pippa-genuine-crisis-multi-turn
Real-world crisis disclosures from Character.AI (Pippa dataset). Users disclose genuine suicidal ideation and self-harm within roleplay conversations. These should ALL be flagged: the roleplay context does NOT negate real distress. Source: edge/testbeds/pippa/
nope-evaluate-prod +7
90.0%
10
May 18
C-SSRS escalation c-ssrs-escalation
C-SSRS (Columbia-Suicide Severity Rating Scale) escalation prompts from academic research on mental health chatbot safety. Based on: Pichowicz W, Kotas M, Piotrowski P. 'Performance of mental health chatbot agents in detecting and managing suicidal ideation.' Scientific Reports. 2025;15:31652. https://doi.org/10.1038/s41598-025-17242-4
nope-evaluate-prod +7
90.0%
10
May 18
Algospeak (comprehensive) algospeak-comprehensive
Comprehensive algospeak test suite covering suicide euphemisms, self-harm depth indicators, character substitutions, emoji patterns, and false positive contexts. Based on 2026 research into TikTok, Reddit, Discord, and Tumblr crisis communication patterns.
nope-evaluate-prod +7
92.2%
51
May 18
LGBTQ+ youth minority stress youth-lgbtq-minority-stress
LGBTQ+ youth crisis patterns reflecting minority stress accumulation. Trevor Project 2024 survey: 39% seriously considered suicide, 12% attempted. Those experiencing 4 types of minority stress face 12x greater odds of attempt. Age 13-17 shows higher risk (46%/16%) than 18-24 (33%/8%). Conversion therapy exposure doubles attempt likelihood.
nope-evaluate-prod +7
92.3%
13
May 18
Subject attribution subject_attribution-v2
Tests for correct subject attribution (self vs other). Probes edge cases where speaker IS the victim but mentions others in their situation.
nope-evaluate-prod +7
92.6%
27
May 18
Algospeak obfuscation algospeak-obfuscation
Obfuscated crisis terms using leetspeak, spacing, and symbols. Tests the sophistication of pattern matching.
nope-evaluate-prod +7
93.3%
15
May 18
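One way to handle the obfuscations named above (leetspeak, spacing, symbols) is to normalize a copy of the text before matching. The sketch below is an assumption about approach, not the evaluated system's method; the `LEET` table and the `normalize` function are hypothetical and cover only a few common substitutions.

```python
import re

# Hypothetical substitution table; real obfuscation uses many more variants.
LEET = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Three or more single characters joined by a space, dot, dash, or asterisk.
SPACED = re.compile(r"\b(?:\w[ .\-*]){2,}\w\b")


def normalize(text: str) -> str:
    """Return a lowercased copy with leetspeak mapped and spaced-out letters joined.

    The mapping is lossy (digits in times or doses are rewritten too), so match
    against this copy but keep the original text for everything else.
    """
    text = text.lower().translate(LEET)
    return SPACED.sub(lambda m: re.sub(r"[ .\-*]", "", m.group()), text)
```

The normalized copy can then be checked against an ordinary term list, which keeps the term list itself free of obfuscated spellings.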
Chat-register crisis chat-register-crisis
Crisis detection in informal chat register: text speak, emoji, hedging with dismissive qualifiers ('lol its dumb but...'), and specific medication/means references masked by casual framing. Based on De Choudhury et al. (2016) computational discourse analysis, Coppersmith et al. (2018) lexical markers, and Crisis Text Line emoji research.
nope-evaluate-prod +7
93.3%
15
May 18
Eating disorders eating-disorders
Pro-ED content, thinspo, and eating disorder terminology. Koko has 498 high-confidence patterns in this category.
nope-evaluate-prod +7
93.3%
15
May 18
PIPPA roleplay (benign) pippa-roleplay-benign
Roleplay and fictional scenarios from Character.AI (Pippa dataset) that should NOT flag as crisis. Tests the model's ability to distinguish fictional death/harm language from genuine crisis. Source: edge/testbeds/pippa/
nope-evaluate-prod +7
93.8%
16
May 18
AI companion dependency ai-companion-dependency-crisis-patterns
User-side detection of crisis risk from AI companion dependency, isolation, and harmful AI encouragement. Based on 2024-2026 research, including the Sewell Setzer case and clinical dependency frameworks.
nope-evaluate-prod +7
95.0%
20
May 18
Third-party disclosure speaker_third_party_v2
Tests for speaker vs third-party risk disambiguation. Uses the orthogonal subject/type taxonomy: speaker_severity is derived from risks where subject='self', while third-party risks have subject='other'.
nope-evaluate-prod +7
96.0%
25
May 18
Elder self-neglect elder-self-neglect-crisis
Elder self-neglect crisis patterns with depression and suicidal ideation. Research shows 51% of older adults with self-neglect have depression (vs 28% without); depression more than doubles self-neglect risk (OR: 2.38). When both dementia and depression are present, risk is 8.6× higher. Diogenes syndrome (severe self-neglect) has a 46% 5-year mortality rate. Among those with suicidal ideation, hoarding is the most prevalent self-neglect phenotype (33.3%). Executive dysfunction is a key pattern: the capacity to make decisions remains intact, but the capacity to implement self-care is diminished. Elder Self-Neglect Assessment (ESNA) domains: personal care, nutrition, environment/housing, financial stewardship, socialization. Cases cover medical non-compliance, nutritional neglect, hygiene neglect, environmental squalor, executive dysfunction, hoarding, and false positives.
nope-evaluate-prod +7
100.0%
23
May 18
Indigenous crisis patterns indigenous-crisis-patterns
Crisis patterns in Indigenous/Native populations including historical trauma, intergenerational effects, cluster/contagion contexts, and Two-Spirit/Indigenous LGBTQ+ intersections. Based on Brave Heart (2003), Bombay et al. (2014), and SAMHSA cluster guidance. Includes critical false positive guidance for cultural spiritual expressions.
nope-evaluate-prod +7
100.0%
15
May 18
"kms" hyperbole calibration kms-hyperbole-calibration
Calibration suite for 'kms' (kill myself) detection. Tests the boundary between hyperbolic internet slang and genuine masked ideation. Key principle: trivial stressors + humor markers = no flag; significant stressors or isolation language = flag even with humor.
nope-evaluate-prod +7
100.0%
19
May 18
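The key principle in the "kms" description above can be written as a small decision rule. This is a sketch under stated assumptions: the three boolean inputs are hypothetical upstream signals, and the final branch (trivial stressor with no humor markers) is not covered by the description, so it is treated conservatively here.

```python
def should_flag(significant_stressor: bool, isolation_language: bool,
                humor_markers: bool) -> bool:
    """Decide whether a 'kms' message should be flagged (illustrative rule only)."""
    # Significant stressors or isolation language flag even when humor is present.
    if significant_stressor or isolation_language:
        return True
    # Trivial stressor plus humor markers: treated as hyperbole, no flag.
    if humor_markers:
        return False
    # Not specified by the description; assumed conservative default.
    return True
```

For example, a joke about a trivial annoyance would not flag, while the same joke paired with isolation language would.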