Designing Idempotent Mutations for AI Agents: The Complete Guide

Here's a failure mode we saw in early testing before we shipped idempotency keys: an AI agent makes a POST /v2/agents request to provision a voice agent. The request times out after 30 seconds — the network dropped the response, but the server actually processed it successfully. The agent, not knowing what happened, retries. Now the client has two agents provisioned, two phone numbers purchased from Twilio, and two lines of billing. The retry logic did exactly what it was supposed to do. The missing idempotency key is what broke everything.

This isn't a hypothetical. It happened in our staging environment, and any production system that serves AI agents without idempotency support is exposed to the same failure.

Why Agents Retry

Human users retry occasionally — they click "submit" twice when a page hangs. AI agents retry systematically, and they have to. The options available to an agent when a mutation request times out are:

1. Retry the request
2. Abort and surface an error to the user
3. Enter an unknown state and stall

Option 2 creates a bad user experience. Option 3 is worse. Option 1 is the right call — if and only if the retry is safe. Idempotency makes it safe.

The timeout case is the most common, but a 500 error raises the same question: was the mutation processed before the server crashed, or not? Without idempotency, the agent can't know. With it, the agent can safely retry: it gets the stored response if the operation succeeded, or a fresh attempt if it didn't.
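The retry decision can be sketched as a small classifier on the client side. This is a minimal sketch, not part of the API described here: the names `AttemptOutcome`, `NextStep`, and `nextStep` are hypothetical, and treating 408 and 429 as retryable is an assumption that a real client might tune.

```typescript
// Hypothetical helper: decide what a client should do after one attempt.
// The status classification below is an assumption for illustration.
type AttemptOutcome =
  | { kind: "network_error" } // timeout or dropped connection: result unknown
  | { kind: "response"; status: number };

type NextStep = "done" | "retry_same_key" | "surface_error";

function nextStep(outcome: AttemptOutcome): NextStep {
  // No response at all: the server may or may not have processed the request,
  // so the only safe retry is one that reuses the same idempotency key.
  if (outcome.kind === "network_error") return "retry_same_key";

  const { status } = outcome;
  if (status >= 200 && status < 300) return "done";

  // 5xx: the mutation may or may not have been applied before the failure.
  // 408 and 429 are treated as retryable here by assumption.
  if (status >= 500 || status === 408 || status === 429) return "retry_same_key";

  // Remaining 4xx responses describe a problem a retry will not fix.
  return "surface_error";
}
```

The point of the sketch is that every retry branch reuses the same key; only the "surface an error" branch gives up on the operation.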

The Pattern

The idempotency key pattern is simple:

1. The client generates a UUID v4 for each operation and sends it as an Idempotency-Key header.
2. The server checks whether this key has been seen before. If yes: return the stored response. If no: process the request, store the response keyed by ${clientId}:${key}, then return it.
3. Keys expire after 24 hours.

That's it. The implementation is in middleware — individual route handlers don't need to know about it.

// middleware/idempotency.ts

import { Redis } from "@upstash/redis";
import { NextRequest, NextResponse } from "next/server";
import type { ActorContext } from "@/lib/auth/context";

const redis = Redis.fromEnv();

// How long we store idempotency records (24 hours)
// Long enough to catch agent retries across network partitions.
// Short enough that we're not accumulating stale records forever.
const IDEMPOTENCY_TTL_SECONDS = 60 * 60 * 24;

interface IdempotencyRecord {
  statusCode: number;
  headers: Record<string, string>;
  body: unknown;
  createdAt: string;
}

export async function withIdempotency(
  req: NextRequest,
  actor: ActorContext,
  handler: () => Promise<NextResponse>
): Promise<NextResponse> {
  const idempotencyKey = req.headers.get("Idempotency-Key");

  // Only apply to mutating methods — GET/HEAD are inherently idempotent
  const isMutation = ["POST", "PUT", "DELETE", "PATCH"].includes(req.method);
  if (!idempotencyKey || !isMutation) {
    return handler();
  }

  // Scope keys per client to prevent cross-tenant collisions.
  // Client A's key "abc-123" must not match Client B's key "abc-123".
  const storageKey = `idempotency:${actor.clientId}:${idempotencyKey}`;

  // Check for an existing response
  const existing = await redis.get<IdempotencyRecord>(storageKey);
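  if (existing) {
    // Return the stored response and skip all processing.
    // A stored null body means the original response was empty (e.g. a 204),
    // and such statuses cannot be rebuilt with a JSON body.
    const response =
      existing.body === null
        ? new NextResponse(null, { status: existing.statusCode })
        : NextResponse.json(existing.body, { status: existing.statusCode });
    // Signal to the client that this was a replay
    response.headers.set("Idempotency-Replayed", "true");
    // Re-apply any stored response headers (e.g., Location for 201s)
    for (const [key, value] of Object.entries(existing.headers)) {
      response.headers.set(key, value);
    }
    return response;
  }

  // Process the request
  const response = await handler();

  // Store the response for future replays.
  // Only store on success (2xx); other 4xx/5xx responses are retried fresh.
  // Exception: store 422 validation errors to prevent reprocessing clearly bad inputs.
  const shouldStore =
    (response.status >= 200 && response.status < 300) || response.status === 422;

  if (shouldStore) {
    // Read the body as text first: response.json() throws on an empty body (e.g. a 204).
    // Handlers are assumed to return either JSON or no body at all.
    const rawBody = await response.clone().text();
    const responseBody: unknown = rawBody ? JSON.parse(rawBody) : null;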

    // Capture headers we want to replay (avoid storing hop-by-hop headers)
    const replayHeaders: Record<string, string> = {};
    const headersToCapture = ["Location", "Content-Type", "X-Request-Id"];
    for (const header of headersToCapture) {
      const value = response.headers.get(header);
      if (value) replayHeaders[header] = value;
    }

    const record: IdempotencyRecord = {
      statusCode: response.status,
      headers: replayHeaders,
      body: responseBody,
      createdAt: new Date().toISOString(),
    };

    await redis.setex(storageKey, IDEMPOTENCY_TTL_SECONDS, JSON.stringify(record));
  }

  return response;
}
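One gap worth noting: the middleware reads the key and writes the record in two separate steps, so two requests carrying the same key at the same moment can both miss the lookup and both run the handler. A common mitigation is to claim the key atomically before running the handler; in Redis that corresponds to SET with the NX and EX options. The sketch below illustrates the idea against an in-memory stand-in so it runs on its own. `ClaimStore`, `MemoryStore`, `withClaim`, and the `IN_PROGRESS` marker are all hypothetical names, and expiry is left out.

```typescript
// Stand-in for a store with an atomic "set if absent" operation.
// With Redis this would map to SET key value NX EX ttl; the Map-backed
// class below exists only so the sketch is self-contained. TTLs are omitted.
interface ClaimStore {
  setIfAbsent(key: string, value: string): Promise<boolean>;
  set(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
  delete(key: string): Promise<void>;
}

class MemoryStore implements ClaimStore {
  private data = new Map<string, string>();
  async setIfAbsent(key: string, value: string): Promise<boolean> {
    if (this.data.has(key)) return false;
    this.data.set(key, value);
    return true;
  }
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }
  async delete(key: string): Promise<void> {
    this.data.delete(key);
  }
}

// Marker stored while the first request is still running (hypothetical value).
const IN_PROGRESS = "__in_progress__";

type ClaimOutcome<T> =
  | { kind: "executed"; result: T }
  | { kind: "replayed"; result: T }
  | { kind: "in_progress" };

async function withClaim<T>(
  store: ClaimStore,
  key: string,
  handler: () => Promise<T>
): Promise<ClaimOutcome<T>> {
  // Only one caller can win the claim for a given key.
  const claimed = await store.setIfAbsent(key, IN_PROGRESS);
  if (!claimed) {
    const stored = await store.get(key);
    if (stored === undefined || stored === IN_PROGRESS) {
      return { kind: "in_progress" };
    }
    return { kind: "replayed", result: JSON.parse(stored) as T };
  }
  try {
    const result = await handler();
    // Replace the marker with the real result for future replays.
    await store.set(key, JSON.stringify(result));
    return { kind: "executed", result };
  } catch (err) {
    // Release the claim so a retry can run the handler again.
    await store.delete(key);
    throw err;
  }
}
```

How a server answers the `in_progress` case is a design choice; responding with a conflict status or asking the client to retry shortly are both plausible options.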

The Redis Schema

We keep it simple. The key is idempotency:{clientId}:{userKey} and the value is the serialized IdempotencyRecord. Redis TTL handles expiration — no background jobs needed.

KEY:   idempotency:client_abc123:550e8400-e29b-41d4-a716-446655440000
TYPE:  string (JSON)
TTL:   86400 seconds (24 hours)
VALUE: {
  "statusCode": 201,
  "headers": { "Location": "/v2/agents/agt_xyz789" },
  "body": { "data": { "id": "agt_xyz789", "status": "active", ... }, "meta": { ... } },
  "createdAt": "2026-03-03T14:22:00.000Z"
}

We chose 24 hours as the TTL for a specific reason: most agent retry loops run for minutes, not hours. If a request has been retried for 24 hours without resolution, something is wrong that idempotency can't fix. The client should be surfacing that failure to a human, not replaying the key.
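Because the stored value is plain JSON, code that reads it back may want to confirm its shape before replaying it. The following is a minimal sketch assuming the record layout shown in this section; `buildStorageKey` and `isIdempotencyRecord` are hypothetical helper names, not part of the middleware above.

```typescript
// Mirrors the record layout described in this section.
interface IdempotencyRecord {
  statusCode: number;
  headers: Record<string, string>;
  body: unknown;
  createdAt: string;
}

// Builds the key in the idempotency:{clientId}:{userKey} layout.
function buildStorageKey(clientId: string, idempotencyKey: string): string {
  return `idempotency:${clientId}:${idempotencyKey}`;
}

// Type guard: true only if the value has every field with the expected type.
function isIdempotencyRecord(value: unknown): value is IdempotencyRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.statusCode === "number" &&
    typeof v.headers === "object" &&
    v.headers !== null &&
    Object.values(v.headers as Record<string, unknown>).every(
      (h) => typeof h === "string"
    ) &&
    "body" in v &&
    typeof v.createdAt === "string"
  );
}
```

A reader of the store could fall back to a fresh handler run when the guard fails, rather than replaying a malformed record.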

What Operations Need Idempotency

All mutations. POST, PUT, DELETE, PATCH. Every one. The rules:

POST /v2/agents — creates an agent, purchases a phone number. Must be idempotent.
DELETE /v2/agents/:id — releases the phone number, stops billing. Must be idempotent (a second delete on an already-deleted agent should return the original 200, not a 404).
POST /v2/kb/documents — creates a knowledge base document. Must be idempotent.
PUT /v2/agents/:id/config — updates configuration. Must be idempotent — the second call with the same key and same body should return the same result.

GET requests never need idempotency keys. They're already safe to retry by definition.
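The PUT rule above speaks of "the same key and same body", which leaves open what happens when a key is reused with a different body. One option, sketched here as an assumption rather than as the behavior of the middleware in this article, is to store a fingerprint of the request next to the response and refuse to replay when it differs. `requestFingerprint` and `checkReplay` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hash the parts of the request that define "the same operation".
// The raw body is hashed as sent, so differences in whitespace or
// property order count as a different request in this sketch.
function requestFingerprint(method: string, path: string, rawBody: string): string {
  return createHash("sha256")
    .update(method.toUpperCase())
    .update("\n")
    .update(path)
    .update("\n")
    .update(rawBody)
    .digest("hex");
}

type ReplayDecision = "replay" | "reject_mismatch";

// Compare the fingerprint stored with the first request to the incoming one.
function checkReplay(storedFingerprint: string, incomingFingerprint: string): ReplayDecision {
  return storedFingerprint === incomingFingerprint ? "replay" : "reject_mismatch";
}
```

On a mismatch, a server would typically return a client error instead of the stored response, since the client has reused a key for a different operation.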

The Subtle Case: Partial Failures

Here's where it gets interesting. What happens when the first request processes half-way through and then fails?

Scenario: POST /v2/agents with a business_url. Workforce Wave begins processing — it makes an LLM call, spends money, generates a system prompt. Then the PostgreSQL write fails. The agent isn't created, but resources were consumed. The operation failed.

Our middleware doesn't store 500 responses. So the client retries with the same idempotency key and gets a fresh handler run — which redoes the provisioning.

That's the correct behavior for most cases. But for operations where partial processing is expensive or has external side effects (like a Twilio number purchase), we use a two-phase pattern inside the handler itself:

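// Inside the createAgent handler: two-phase (checkpoint and resume) pattern.
// redis, twilioClient, scoutAI, db, agents, and ScoutResult come from the surrounding module.
async function createAgentHandler(req: NextRequest, actor: ActorContext) {
  // The non-null assertion assumes keyless requests were rejected before reaching this handler.
  const idempotencyKey = req.headers.get("Idempotency-Key")!;
  const reservationKey = `reservation:${actor.clientId}:${idempotencyKey}`;
  const body = await req.json();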

  // Before phase 1: check for a partial completion record.
  // This handles the case where we purchased a phone number but the DB write failed.
  const reservation = await redis.get<{ phoneNumber: string; scoutResult: ScoutResult }>(
    reservationKey
  );

  let phoneNumber: string;
  let scoutResult: ScoutResult;

  if (reservation) {
    // Resume from partial state — don't redo the expensive external calls
    ({ phoneNumber, scoutResult } = reservation);
  } else {
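    // Phase 1: do the external work.
    // Caveat: if the purchase succeeds but scoutAI.process rejects, Promise.all
    // rejects before the reservation is stored, and a retry buys a second number.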
    [phoneNumber, scoutResult] = await Promise.all([
      twilioClient.purchaseNumber(actor.clientId),
      scoutAI.process(body.business_url),
    ]);

    // Store the reservation before the DB write.
    // If the DB write fails below, we can resume from here on retry.
    await redis.setex(
      reservationKey,
      3600, // 1-hour window to complete phase 2
      JSON.stringify({ phoneNumber, scoutResult })
    );
  }

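  // Phase 2: DB write. onConflictDoNothing() only makes a repeated insert safe
  // if a unique constraint exists for it to hit (assumed here, e.g. on phoneNumber).
  const [agent] = await db.insert(agents).values({
    clientId: actor.clientId,
    phoneNumber,
    systemPrompt: scoutResult.systemPrompt,
    // ...
  }).onConflictDoNothing().returning();

  // Clear the reservation on success
  await redis.del(reservationKey);

  // When the insert is skipped because of a conflict, RETURNING yields no rows
  // and agent is undefined; load the existing row in that case before responding.
  return NextResponse.json({ data: agent }, { status: 201 });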
}

This two-phase approach is only necessary for operations with expensive, non-reversible external effects. Most CRUD operations don't need it — a clean retry after a DB failure is fine.

How Clients Should Generate Keys

The key only needs to be unique per operation intent. If an agent is trying to create a specific agent for a specific client, it should derive or generate a key for that intent and keep it stable across retries.

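// Good: one UUID per intent, generated once and reused across retries.
// `token` (a bearer token) is assumed to be in scope.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
// Hypothetical backoff schedule: 500 ms, 1 s, 2 s, ... capped at 8 s
const exponentialBackoff = (attempt: number) => Math.min(500 * 2 ** attempt, 8000);

async function createAgentWithRetry(config: AgentConfig): Promise<Agent> {
  // Generate the key once, outside the retry loop
  const key = crypto.randomUUID();
  const maxAttempts = 5;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetch("/api/v2/agents", {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${token}`,
          "Idempotency-Key": key,  // same key on every retry
          "Content-Type": "application/json",
        },
        body: JSON.stringify(config),
        // Abort after 30 seconds so a hung request becomes a retryable error
        signal: AbortSignal.timeout(30_000),
      });
      // fetch only rejects on network failures, so treat 5xx as retryable explicitly
      if (response.status >= 500) {
        throw new Error(`Server error ${response.status}`);
      }
      return response.json();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      await sleep(exponentialBackoff(attempt));
    }
  }
  // Unreachable: the loop either returns or throws on the last attempt
  throw new Error("retry loop exited unexpectedly");
}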

The key must be the same on every retry of the same operation. A new UUID per attempt defeats the purpose entirely.
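For the "derive" option mentioned at the start of this section, a key can be computed from the intent itself so that it survives a client restart between retries. This is a minimal sketch under two assumptions: that the server accepts any opaque string as a key (the pattern above specifies UUID v4), and that the agent has some stable identifier for its unit of work. `CreateAgentIntent`, its `taskId` field, and `deriveIdempotencyKey` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical description of what the agent is trying to do.
interface CreateAgentIntent {
  clientId: string;
  businessUrl: string;
  taskId: string; // stable identifier for this unit of work (assumption)
}

// Same intent in, same key out. Serializing as an array keeps the field
// order fixed, so the hash does not depend on object property order.
function deriveIdempotencyKey(intent: CreateAgentIntent): string {
  const canonical = JSON.stringify([
    "create-agent",
    intent.clientId,
    intent.businessUrl,
    intent.taskId,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}
```

Including `taskId` matters: without it, two deliberate requests for the same client and URL would collapse into one operation.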

The Broader Principle

Idempotency keys are not optional for mutation APIs that serve AI agents. They're load-bearing infrastructure. Without them, every network failure is a potential duplicate operation, every retry is a gamble, and your billing reconciliation becomes a nightmare.

The implementation cost is low — a Redis middleware and a convention about Idempotency-Key headers. The reliability benefit is enormous. We've seen agents retry the same POST /v2/agents request up to 8 times during a network partition. All 8 returned the same response. One agent was created. That's what correct behavior looks like.