AI-assisted support triage
Routing and drafts with humans in the loop
Client
Help desk for a mainstream SaaS product, ~4k–6k tickets/month, with a team split across Australian and Philippine shifts.
How we engaged
We scoped with the head of support, built integrations against their existing ticket API, and trained team leads on the quality rubric.
Stack & tooling
- Existing ticketing platform (API webhooks)
- Small service on AWS for orchestration (secrets in Parameter Store)
- Model API with structured JSON outputs
- Internal wiki for prompt versions and failure log review
Ticket volume had grown faster than headcount. We introduced structured classification and internal draft notes first—no customer-facing auto-send until quality held steady on a fixed test set.
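A minimal sketch of what that classification contract can look like, assuming a hypothetical call_model() wrapper around the vendor's API; the categories, fields, and prompt are illustrative, not the client's real taxonomy:

```python
import json

# Illustrative taxonomy; "abstain" is a first-class option so the model
# is never forced to guess a category.
CATEGORIES = ["billing", "bug_report", "how_to", "account_access", "abstain"]

PROMPT = """Classify the support ticket below.
Reply with JSON only, shaped as:
{{"category": <one of {cats}>, "priority": <"p1"|"p2"|"p3">, "confidence": <0.0-1.0>}}

Ticket:
{body}"""

def call_model(prompt: str) -> str:
    # Stand-in for the vendor API call; a real version would hit the
    # model endpoint with its JSON-output mode enabled.
    raise NotImplementedError

def classify(redacted_body: str) -> dict:
    raw = call_model(PROMPT.format(cats=CATEGORIES, body=redacted_body))
    # Malformed JSON raises here; the caller treats that as an abstain.
    return json.loads(raw)
```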
Starting point
Tier-1 agents spent the first minutes of every ticket guessing the category and which macro applied. Leads rewrote the same internal summary for every engineering handoff. Leadership was being pitched ‘AI chatbots’ by vendors; support wanted relief without putting compliance wording on regulated topics at risk.
Challenge
Ticket quality from customers varied widely. Some topics required legal-approved wording. Cost per ticket had to stay predictable at current volume.
Approach
Phase 1: classification and priority suggestion only, logged alongside the human choice to measure agreement. Phase 2: internal draft notes for agents, never sent externally. Weekly failure review; frozen prompt versions tied to release tags. Kill switch whenever confidence fell below a threshold or a keyword on the hit list appeared.
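The kill switch was a plain gate in front of every suggestion, nothing clever. A minimal sketch, assuming the suggestion dict shape from the classification sketch above; the threshold and keyword list are placeholders, not the client's real values:

```python
# Illustrative values; the real threshold and hit list were set with
# support leads and legal, and lived in config, not code.
CONFIDENCE_FLOOR = 0.8
KEYWORD_HIT_LIST = ("refund", "chargeback", "breach", "regulator")

def should_surface(suggestion: dict, ticket_body: str) -> bool:
    """Return True only when the suggestion is safe to show to an agent."""
    if suggestion.get("category") == "abstain":
        return False
    if float(suggestion.get("confidence", 0.0)) < CONFIDENCE_FLOOR:
        return False
    lowered = ticket_body.lower()
    if any(term in lowered for term in KEYWORD_HIT_LIST):
        return False  # regulated topic: humans handle it end to end
    return True
```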
Outcomes
- ✓ Meaningful reduction in time-to-first-action for tier-1 on the included categories
- ✓ A clear escalation path when the model abstained, so agents weren’t guessing
- ✓ A documented cost envelope and p95 latency, so finance and engineering could plan capacity
Constraints & non-negotiables
Limits shape what ships now vs later — these were ours on this job.
- No training on full ticket bodies containing PII; text passed through a redaction pipeline agreed with legal before the model saw it (sketched just below)
- Customer-facing auto-replies explicitly out of scope for the pilot
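For a sense of shape, a minimal regex-based redaction sketch; the actual pipeline was agreed with legal and covered more entity types and edge cases than this:

```python
import re

# Illustrative rules only: real PII coverage needs far more than three regexes.
REDACTION_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b\d{13,19}\b"), "[CARD?]"),      # long digit runs, masked conservatively
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def redact(ticket_body: str) -> str:
    # Apply each rule in order; later rules see earlier substitutions.
    for pattern, placeholder in REDACTION_RULES:
        ticket_body = pattern.sub(placeholder, ticket_body)
    return ticket_body
```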
Phases
Order and emphasis change by client — this is how this one ran.
1. Weeks 1–2 · Policy & data boundaries: what the model may see, redaction rules, the category list, abstain triggers.
2. Weeks 3–5 · Classification pilot: shadow-mode logging, weekly error review, prompt tuning against golden tickets (see the sketch after this list).
3. Weeks 6–7 · Internal drafts: agent-only suggestions, a quality rubric, a feedback loop with team leads.
4. Weeks 8–9 · Hardening: rate limits, monitoring, a runbook for a vendor outage, sign-off for sustained use.
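Shadow mode in phase 2 came down to writing the model's suggestion next to the human's actual choice and summarising agreement weekly. A minimal sketch, with illustrative field names and storage:

```python
import json
import time

LOG_PATH = "shadow_log.jsonl"  # illustrative; any append-only store works

def log_shadow(ticket_id: str, model_category: str, human_category: str) -> None:
    # The agent never sees model_category in this phase; it exists only for review.
    record = {"ts": time.time(), "ticket": ticket_id,
              "model": model_category, "human": human_category}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

def agreement_rate() -> float:
    # Weekly review metric: share of tickets where model and human agreed.
    with open(LOG_PATH) as f:
        rows = [json.loads(line) for line in f]
    if not rows:
        return 0.0
    return sum(r["model"] == r["human"] for r in rows) / len(rows)
```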
What actually worked
Refusing to auto-send to customers in v1 bought trust from the team. Structured JSON outputs made downstream routing reliable compared to free-text ‘AI said so’.
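Concretely, ‘reliable’ meant a strict contract check before anything touched routing; a failed check fell back to the normal human queue. A minimal sketch, assuming the suggestion shape from the classification sketch above:

```python
ALLOWED_PRIORITIES = {"p1", "p2", "p3"}

def is_valid(suggestion: dict, categories: list) -> bool:
    """Reject anything that isn't exactly the agreed shape."""
    conf = suggestion.get("confidence")
    return (
        suggestion.get("category") in categories
        and suggestion.get("priority") in ALLOWED_PRIORITIES
        and isinstance(conf, (int, float))
        and 0.0 <= conf <= 1.0
    )
```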
Real delivery patterns; names and details blended for confidentiality. Happy to walk through a comparable scope.