
Harassment Complaint Handling — AI Simulation

The Report


Brief

Build the missing tier of harassment training: a free-text simulation that scores improvisational disclosure handling against statute, runs in any SCORM-compatible LMS, and is credible enough for a Head of People to defend at audit. Three complainants with different complaint shapes (first-time reporter, third-party harassment, delayed disclosure with a retaliation pattern). Three jurisdictions with calibrated legal evaluation. One conversation, no scripts, no multiple choice.

Discovery & Analysis

The off-the-shelf harassment-training market trains recognition: "which of these four responses follows policy?" But the failure mode that drives most real escalations is improvisational — the first three minutes of a disclosure, where a complainant is testing whether HR is safe to talk to. No multiple-choice course can train that, because the skill is *generative*, not selective. Until LLM-as-judge became reliable enough to evaluate free-text against a legal rubric, this skill simply wasn't trainable at scale. The discovery brief was therefore unusual: build a sandbox where HR coordinators type their actual responses, an AI plays the complainant in character, and a second AI pass scores every message on a rubric calibrated to the jurisdiction the employer operates under — UK Worker Protection Act, US EEOC / Title VII, or EU Directive 2006/54. The harder part: keep it credible enough that a Head of People would let it onto their LMS.

Design & Development

Two LLMs run in tandem inside a single API call:

• **The character.** A scenario-specific system prompt encodes personality, biographical context, reactivity rules ("if HR seems dismissive, withdraw; if HR is warm, open up"), and a five-exchange narrative arc. The character stays in role for the full conversation and reacts to the *shape* of the HR response, not just its content.

• **The judge.** A scoring guide appended to every system prompt requires structured-JSON output: per-turn scores of 0–10 on four axes, one-sentence feedback, and the character's emotional-state shift. The judge is calibrated against the jurisdiction text — an 8 on the Policy axis in the UK is not the same as an 8 in the US.

Design decisions:

• **WhatsApp UI as the format.** First disclosures in 2026 don't happen in conference rooms. They happen on Teams, in DMs, on WhatsApp. The interface meets HR coordinators where the high-stakes conversations now actually occur.

• **Trust meter as the live signal.** It updates every turn from the rubric output, ahead of the final scorecard, so learners can feel the conversation drifting in real time — the cognitive feedback loop that MCQ-based training lacks.

• **Six archetypes, not pass/fail.** The final synthesis is character analysis, not a grade. "You scored 72%" is forgettable; "You operated like a Proceduralist — process without warmth" sticks, because it names a pattern the learner recognises in themselves.

• **Production reliability built in.** The free-tier Gemini API regularly returns 503 under load, so the function retries the primary model, falls back to a lighter model, and surfaces a graceful in-character error if both fail. Buyers see a working demo, not an error page.
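The judge's per-turn output described above can be sketched as a small validator. The field names (`empathy`, `feedback`, `emotionalShift`, and so on) are illustrative stand-ins, since the shipped schema isn't published:

```javascript
// Validate one turn of judge output. Field names are hypothetical,
// chosen to mirror the four rubric axes described in the text.
function validateTurnScore(raw) {
  const axes = ["empathy", "confidentiality", "transparency", "policyAlignment"];
  const score = JSON.parse(raw);
  for (const axis of axes) {
    const v = score[axis];
    // Each axis must be an integer score in the 0–10 range.
    if (!Number.isInteger(v) || v < 0 || v > 10) {
      throw new Error(`Axis "${axis}" must be an integer 0-10, got ${v}`);
    }
  }
  // The judge also returns one-sentence feedback and the character's
  // emotional-state shift, both as strings.
  if (typeof score.feedback !== "string" || typeof score.emotionalShift !== "string") {
    throw new Error("Judge output missing feedback or emotionalShift");
  }
  return score;
}
```

Validating before use matters because a malformed judge response would otherwise corrupt the trust meter mid-conversation rather than failing loudly.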
Stack: native HTML/CSS/JS with Alpine for state, Netlify Functions calling Gemini 2.5 Flash with a Flash-Lite fallback, structured-JSON mode for scoring output, SCORM 1.2 packaging for LMS deployment, optional company-policy upload as eval-context override.
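The retry-then-fallback chain can be sketched as a small async helper. `callModel` is a hypothetical wrapper around the Gemini API, and the in-character error line is invented for illustration, not the shipped copy:

```javascript
// Try the primary model twice, then the lighter fallback twice; if every
// attempt fails, return a graceful in-character error instead of a fail page.
async function generateWithFallback(callModel, prompt) {
  const models = ["gemini-2.5-flash", "gemini-2.5-flash-lite"];
  for (const model of models) {
    for (let attempt = 0; attempt < 2; attempt++) {
      try {
        return await callModel(model, prompt);
      } catch (err) {
        // Free-tier 503s land here; retry once, then drop to the next model.
      }
    }
  }
  return {
    inCharacterError: "Sorry — I lost signal for a second. Can you say that again?",
  };
}
```

Keeping the failure message in character preserves the simulation's credibility even when the API is degraded.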

Evaluation

The shipped product covers what multiple-choice training cannot:

• **Free-text input, scored by AI in real time.** The HR coordinator types their own responses into a WhatsApp-style chat. An LLM returns a structured JSON rubric: 0–10 on Empathy, Confidentiality, Transparency, and Policy Alignment, with a one-sentence feedback line and the complainant's emotional-state shift.

• **Three complainants, three failure modes.** Sofia (a nervous first-time reporter testing whether HR is safe). Kwame (third-party harassment by an £820k client; his manager has already told him not to make it complicated). Daniela (a composed senior PM reporting historical harassment plus a retaliation pattern over the seven months since). Same rubric, three completely different conversational dynamics.

• **Three jurisdictions with calibrated legal evaluation.** The same scenario rendered under the UK Worker Protection Act 2023 + Equality Act 2010, US EEOC / Title VII / Faragher-Ellerth, or EU Directive 2006/54. The legal framework changes what counts as a high-scoring response on the Policy axis.

• **Upload your own grievance procedure.** Optionally paste your company's policy and the AI calibrates its Policy scoring against your specific procedure rather than generic best practice. This is the feature most requested by mid-market HR teams piloting the product.

• **Final synthesis into one of six HR archetypes** — Advocate, Proceduralist, Empath, Deflector, Enforcer, Balancer — generated from the score profile across all five exchanges. The archetype is the load-bearing teaching move: a learner can score 70% overall and still discover they were operating like a Deflector.

• **Productised, not a demo.** Annual site licence from €690 (up to 50 employees) to €3,990 (up to 1,000 employees), plus a custom enterprise tier. SCORM 1.2 ready, WCAG 2.0 AA accessible, with a free playable demo at /demos/hr-chat-sim/.
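A sketch of how an archetype might be derived from the score profile rather than the overall total. The real mapping isn't published, so every threshold and rule below is invented for illustration; the point is that the *shape* of the per-axis averages, not the sum, names the pattern:

```javascript
// Hypothetical archetype mapping: average each rubric axis across the five
// exchanges, then match the profile shape. Thresholds are illustrative only.
function archetypeFromProfile(turns) {
  const axes = ["empathy", "confidentiality", "transparency", "policyAlignment"];
  const avg = Object.fromEntries(
    axes.map((a) => [a, turns.reduce((sum, t) => sum + t[a], 0) / turns.length])
  );
  if (axes.every((a) => avg[a] >= 6)) return "Balancer";            // strong across the board
  if (avg.policyAlignment >= 7 && avg.empathy < 4) return "Proceduralist"; // process without warmth
  if (avg.empathy >= 7 && avg.policyAlignment < 4) return "Empath"; // warmth without process
  if (avg.transparency < 4) return "Deflector";                     // avoids naming what happens next
  if (avg.policyAlignment >= 7) return "Enforcer";                  // leads with rules
  return "Advocate";                                                // default bucket in this sketch
}
```

This is why a 70% overall score can still resolve to Deflector: a low Transparency average dominates the classification regardless of the total.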

What this means for your organization

Most harassment training tests whether a learner can recognise the right answer. The actual job — taking a first disclosure over a chat window with someone who is nervous, hostile, or watching you carefully — has no multiple choice. This trains the cognitive skill that decides whether a complainant comes back, escalates externally, or quietly disengages. An LLM evaluates every response against statute, scores it on a four-axis rubric, and reacts in character.

See exactly where your training is leaking ROI.

A 5-minute diagnostic that scores your training across 6 dimensions — then gives you a personalised improvement plan. No email required.
