Your AI agent looked great at launch. But quality degrades silently — hallucination rates creep up, tone shifts, accuracy drops. Agent SPC uses statistical process control to detect drift 4–8 days before it becomes a user problem.
SPC has been used in manufacturing for a century to catch process drift before defects reach customers. We bring the same rigour to AI agent outputs.
Add our lightweight SDK (Python, Node, REST) or pipe your existing evaluation scores. Works with any LLM framework — LangChain, AutoGen, custom chains.
Agent SPC samples your agent's outputs and computes your control limits — upper, center, and lower — across accuracy, tone, hallucination rate, and more.
Receive Slack/email alerts the moment Western Electric or Nelson rules trigger — run violations, trends, or single points beyond 3σ. Before users notice anything.
Each metric gets its own X-bar chart, with your agent's historical baseline as the center line and 3σ control limits computed from your own production data.
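The limits described above can be sketched in a few lines of Python. This is a simplified illustration, not Agent SPC's implementation: real X-bar charts usually estimate sigma from subgroup ranges or standard deviations with bias-correction constants, while here we take the plain standard deviation of the daily means, and the baseline numbers are made up.

```python
import statistics

def xbar_limits(subgroup_means):
    """Center line and 3-sigma control limits from historical subgroup means.

    Simplified sketch: sigma is estimated as the standard deviation of the
    subgroup means themselves, without bias-correction constants.
    """
    center = statistics.mean(subgroup_means)
    sigma = statistics.stdev(subgroup_means)
    return center - 3 * sigma, center, center + 3 * sigma

# Daily mean accuracy scores from a (made-up) two-week baseline
baseline = [0.91, 0.93, 0.92, 0.90, 0.92, 0.94, 0.91,
            0.93, 0.92, 0.91, 0.90, 0.93, 0.92, 0.91]
lcl, cl, ucl = xbar_limits(baseline)
```

New production samples are then plotted against these fixed limits; the baseline itself is only recomputed deliberately, so that drift shows up as a violation rather than silently widening the limits.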
FACTUALITY
Track the proportion of responses containing unverifiable or fabricated claims. Detect creep from 0.8% to 2.1% before it becomes a support incident.

STYLE
Measure drift from your target tone profile (professional, empathetic, concise). Catch when your support agent starts sounding curt or overly informal.

CORRECTNESS
Score responses against your golden test set. SPC control charts make it easy to see when accuracy is trending — even before a single point leaves limits.

VERBOSITY
Detect when verbosity creeps up (or down). Overly terse responses often correlate with hallucination spikes; overly verbose ones with context window pressure.

PERFORMANCE
Response time is a quality signal. Latency creep can indicate upstream model load, token bloat, or retrieval index degradation — SPC catches the trend early.

GROUNDING
For research and coding agents — track what fraction of claims are grounded in retrieved context. A dropping grounding rate is an early hallucination warning.

Statistical Process Control was invented by Walter Shewhart at Bell Labs in 1924. For 100 years, it's been the standard way to detect process drift in manufacturing, aviation, and pharmaceuticals.
The insight: you don't need to see a defect to know the process is drifting. Statistical signals — runs, trends, and points near control limits — appear days or weeks before actual failures. The same logic applies perfectly to LLM outputs.
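A proportion metric like hallucination rate maps to Shewhart's classic p-chart: with baseline proportion p and n responses sampled per window, the 3-sigma limits are p ± 3·sqrt(p(1−p)/n). A minimal sketch, where the 0.8% baseline and the sample size of 500 are illustrative numbers, not defaults of the product:

```python
import math

def p_chart_limits(p_bar, n):
    """3-sigma control limits for a proportion sampled n at a time.

    The lower limit is clamped at zero: a proportion cannot go negative.
    """
    margin = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - margin), p_bar + margin

# Baseline hallucination rate of 0.8%, 500 responses per daily sample
lcl, ucl = p_chart_limits(0.008, 500)
# A drift to 2.1% sits above the upper limit (~2.0%), so it triggers
assert 0.021 > ucl
```

Note that rarer events need larger samples: at a 0.8% baseline, a small daily sample would make the limits so wide that real drift hides inside them.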
Read: SPC for AI — the full explainer

2 out of 3 consecutive points in Zone A (beyond 2σ from center) — signals a shift before any breach.
6 or more consecutive points steadily increasing or decreasing — detects gradual accuracy degradation weeks early.
Any single point beyond 3σ limits — the classic signal for an acute quality event requiring immediate investigation.
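The three rules above can be sketched as pure functions over a series of points plus the chart's center line and sigma. This is a simplified illustration of the standard Western Electric/Nelson definitions, not Agent SPC's implementation:

```python
def beyond_3_sigma(points, center, sigma):
    """Western Electric rule 1: any single point beyond 3 sigma."""
    return any(abs(x - center) > 3 * sigma for x in points)

def two_of_three_in_zone_a(points, center, sigma):
    """Western Electric rule 2: two out of three consecutive points
    beyond 2 sigma, on the same side of the center line."""
    for i in range(len(points) - 2):
        window = points[i:i + 3]
        for side in (1, -1):  # check above and below the center line
            if sum(side * (x - center) > 2 * sigma for x in window) >= 2:
                return True
    return False

def six_point_trend(points, k=6):
    """Nelson trend rule: k or more consecutive points steadily
    increasing or decreasing (k points means k - 1 successive moves)."""
    for i in range(len(points) - k + 1):
        diffs = [b - a for a, b in zip(points[i:i + k], points[i + 1:i + k])]
        if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
            return True
    return False
```

A monitor would typically run these checks over a sliding window as each new sample arrives, firing an alert on the first rule that trips.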
At-a-glance status across all monitored metrics. Green = in control. Amber = run rule active. Red = control limit breached.
No credit card required to start. Connect one agent and see your quality baseline immediately.
A comprehensive guide to understanding why your AI agent's quality degrades over time — and what statistical signals appear before users ever notice.
The "honeymoon period" is real. Here's why AI agent quality peaks at launch and what causes the inevitable — but preventable — decline.
How Shewhart's 1924 control charts apply perfectly to LLM output quality monitoring — a technical deep dive.
Connect your agent today. See your first control chart in under 10 minutes. No credit card required.
Connect your agent. See your quality baseline today.