Your AI agent looked great at launch. But quality degrades silently — hallucination rates creep up, tone shifts, accuracy drops. Agent SPC uses statistical process control to detect drift 4–8 days before it becomes a user problem.
SPC has been used in manufacturing for a century to catch process drift before defects reach customers. We bring the same rigour to AI agent outputs.
Add our lightweight SDK (Python, Node, REST) or pipe your existing evaluation scores. Works with any LLM framework — LangChain, AutoGen, custom chains.
Agent SPC samples your agent's outputs and computes your control limits — upper, center, and lower — across accuracy, tone, hallucination rate, and more.
Receive Slack/email alerts the moment Western Electric or Nelson rules trigger — run violations, trends, or single points beyond 3σ. Before users notice anything.
Each metric gets its own X-bar chart, with your agent's historical baseline as the center line and 3σ control limits computed from your own production data.
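The limits described above can be sketched in a few lines of Python. This is a simplified illustration, not Agent SPC's implementation: real X-bar charts usually estimate sigma from subgroup ranges or standard deviations with bias-correction constants, while here we take the plain standard deviation of the daily means, and the baseline numbers are made up.

```python
import statistics

def xbar_limits(subgroup_means):
    """Center line and 3-sigma control limits from historical subgroup means.

    Simplified sketch: sigma is estimated as the standard deviation of the
    subgroup means themselves, without bias-correction constants.
    """
    center = statistics.mean(subgroup_means)
    sigma = statistics.stdev(subgroup_means)
    return center - 3 * sigma, center, center + 3 * sigma

# Daily mean accuracy scores from a (made-up) two-week baseline
baseline = [0.91, 0.93, 0.92, 0.90, 0.92, 0.94, 0.91,
            0.93, 0.92, 0.91, 0.90, 0.93, 0.92, 0.91]
lcl, cl, ucl = xbar_limits(baseline)
```

New production samples are then plotted against these fixed limits; the baseline itself is only recomputed deliberately, so that drift shows up as a violation rather than silently widening the limits.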
FACTUALITY
Track the proportion of responses containing unverifiable or fabricated claims. Detect creep from 0.8% to 2.1% before it becomes a support incident.

STYLE
Measure drift from your target tone profile (professional, empathetic, concise). Catch when your support agent starts sounding curt or overly informal.

CORRECTNESS
Score responses against your golden test set. SPC control charts make it easy to see when accuracy is trending — even before a single point leaves limits.

VERBOSITY
Detect when verbosity creeps up (or down). Overly terse responses often correlate with hallucination spikes; overly verbose ones with context window pressure.

PERFORMANCE
Response time is a quality signal. Latency creep can indicate upstream model load, token bloat, or retrieval index degradation — SPC catches the trend early.

GROUNDING
For research and coding agents — track what fraction of claims are grounded in retrieved context. A dropping grounding rate is an early hallucination warning.

Statistical Process Control was invented by Walter Shewhart at Bell Labs in 1924. For 100 years, it's been the standard way to detect process drift in manufacturing, aviation, and pharmaceuticals.
The insight: you don't need to see a defect to know the process is drifting. Statistical signals — runs, trends, and points near control limits — appear days or weeks before actual failures. The same logic applies perfectly to LLM outputs.
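A proportion metric like hallucination rate maps to Shewhart's classic p-chart: with baseline proportion p and n responses sampled per window, the 3-sigma limits are p ± 3·sqrt(p(1−p)/n). A minimal sketch, where the 0.8% baseline and the sample size of 500 are illustrative numbers, not defaults of the product:

```python
import math

def p_chart_limits(p_bar, n):
    """3-sigma control limits for a proportion sampled n at a time.

    The lower limit is clamped at zero: a proportion cannot go negative.
    """
    margin = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - margin), p_bar + margin

# Baseline hallucination rate of 0.8%, 500 responses per daily sample
lcl, ucl = p_chart_limits(0.008, 500)
# A drift to 2.1% sits above the upper limit (~2.0%), so it triggers
assert 0.021 > ucl
```

Note that rarer events need larger samples: at a 0.8% baseline, a small daily sample would make the limits so wide that real drift hides inside them.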
Read: SPC for AI — the full explainer

2 out of 3 consecutive points in Zone A (beyond 2σ from center) — signals a shift before any breach.
6 or more consecutive points steadily increasing or decreasing — detects gradual accuracy degradation weeks early.
Any single point beyond 3σ limits — the classic signal for an acute quality event requiring immediate investigation.
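The three rules above can be sketched as pure functions over a series of points plus the chart's center line and sigma. This is a simplified illustration of the standard Western Electric/Nelson definitions, not Agent SPC's implementation:

```python
def beyond_3_sigma(points, center, sigma):
    """Western Electric rule 1: any single point beyond 3 sigma."""
    return any(abs(x - center) > 3 * sigma for x in points)

def two_of_three_in_zone_a(points, center, sigma):
    """Western Electric rule 2: two out of three consecutive points
    beyond 2 sigma, on the same side of the center line."""
    for i in range(len(points) - 2):
        window = points[i:i + 3]
        for side in (1, -1):  # check above and below the center line
            if sum(side * (x - center) > 2 * sigma for x in window) >= 2:
                return True
    return False

def six_point_trend(points, k=6):
    """Nelson trend rule: k or more consecutive points steadily
    increasing or decreasing (k points means k - 1 successive moves)."""
    for i in range(len(points) - k + 1):
        diffs = [b - a for a, b in zip(points[i:i + k], points[i + 1:i + k])]
        if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
            return True
    return False
```

A monitor would typically run these checks over a sliding window as each new sample arrives, firing an alert on the first rule that trips.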
At-a-glance status across all monitored metrics. Green = in control. Amber = run rule active. Red = control limit breached.
No credit card required to start. Connect one agent and see your quality baseline immediately.
A comprehensive guide to understanding why your AI agent's quality degrades over time — and what statistical signals appear before users ever notice.
The "honeymoon period" is real. Here's why AI agent quality peaks at launch and what causes the inevitable — but preventable — decline.
How Shewhart's 1924 control charts apply perfectly to LLM output quality monitoring — a technical deep dive.
Connect your agent today. See your first control chart in under 10 minutes. No credit card required.
Connect your agent. See your quality baseline today.