Rate Limiting and Idempotency: What Your Bot Needs to Know
When a human uses an API, they notice the error message and retry manually. When a bot uses an API, it retries automatically — and if it retries wrong, it either hammers the rate limiter or creates duplicate records that are hard to unwind.
Two patterns prevent both problems: rate limit handling and idempotency. This post explains how WFW implements each, why they matter specifically for AI consumers, and gives you a TypeScript client that handles both correctly.
Rate Limits
Every WFW API token has rate limits. The current limits are communicated on every response via three headers:
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1724265600
Limit is the number of requests allowed in the current window. Remaining is how many you have left. Reset is the Unix timestamp when the window resets and Remaining goes back to Limit.
When you exceed the limit, you get a 429 Too Many Requests with a JSON body that includes retry_after_seconds:
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded for scope agents:write",
    "retry_after_seconds": 38
  }
}
Scope-specific buckets matter. The limits are not uniform across all operations. agents:write (creating and configuring agents) has a tighter limit than agents:read or calls:read. The read operations that AI orchestrators call frequently — polling operation status, fetching transcripts — have generous limits. The write operations that have real downstream effects — provisioning new agents, initiating calls — are intentionally constrained.
A bot that polls GET /v2/operations/{id} aggressively will not hit its write budget. A bot that provisions agents in a tight loop will. Know which bucket your hot paths hit.
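One way to keep track of which bucket your hot paths are draining is a small client-side tracker, updated from the X-RateLimit-Remaining header after each response. This is a hypothetical sketch — ScopeBudget is not part of any WFW SDK, and you'd map each endpoint to its scope yourself:

```typescript
// Hypothetical client-side tracker: one budget entry per scope, updated
// from X-RateLimit-Remaining after each response. A batch job can check
// its write budget before firing the next request.
class ScopeBudget {
  private remaining = new Map<string, number>();

  // Record the latest Remaining value for the scope this request used
  update(scope: string, remainingHeader: number): void {
    this.remaining.set(scope, remainingHeader);
  }

  // True if budget is left (or the scope hasn't been seen yet)
  canSend(scope: string): boolean {
    return (this.remaining.get(scope) ?? Infinity) > 0;
  }
}
```

Because reads and writes sit in separate buckets, an exhausted agents:write budget never blocks the status polling that orchestrators do constantly.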
Idempotency
Idempotency solves a different problem: what happens when the network fails between your request and the server's response?
Without idempotency, the flow looks like this:
- Bot sends POST /v2/calls to initiate a call
- WFW receives it, starts provisioning the call, returns 202 Accepted
- Network drops before the response reaches the bot
- Bot sees a timeout error and retries
- WFW receives a second request and starts a second call
- Two calls go out to the patient
With idempotency:
- Bot sends POST /v2/calls with Idempotency-Key: call-patient-8472-appt-20260820
- WFW receives it, starts provisioning the call, stores the key → result mapping
- Network drops before the response reaches the bot
- Bot retries with the same Idempotency-Key
- WFW recognizes the key, returns the result of the first request
- One call goes out to the patient
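The key in the first step can be derived deterministically from the identifiers that define the operation. A minimal sketch — callIdempotencyKey is a hypothetical helper, not a WFW SDK function:

```typescript
// Derives a stable idempotency key from the identifiers that define the
// logical operation. The same patient + appointment always yields the
// same key, so a retry (or a restarted process) hits WFW's dedupe path.
function callIdempotencyKey(patientId: string, appointmentId: string): string {
  return `call-patient-${patientId}-appt-${appointmentId}`;
}

// callIdempotencyKey('8472', '20260820') → 'call-patient-8472-appt-20260820'
```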
How to generate idempotency keys. The key should encode the logical operation, not the retry attempt. Use a UUID or a deterministic string derived from what you're doing:
// Good — encodes the logical operation
const key = `call-patient-${patientId}-appt-${appointmentId}`;
// Good — UUID generated once per logical operation, stored, reused on retry
const key = `op-${uuidv4()}`; // generate once, persist to DB, reuse on retry
// Bad — generates a new UUID on every call including retries
const key = `op-${uuidv4()}`; // generated inside the retry loop
The idempotency window is 24 hours. A key used today can't collide with the same key used tomorrow. If you legitimately want to initiate a second call to the same patient for a different appointment, use a different key (include the appointment ID).
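If you prefer random UUID keys, the "generate once, persist, reuse" pattern looks roughly like this. The Map is a stand-in for whatever durable store your bot uses — in production it must survive restarts, or the pattern buys you nothing:

```typescript
import { randomUUID } from 'node:crypto';

// Stand-in for a durable table keyed by logical operation ID.
const keyStore = new Map<string, string>();

// Returns the same key for the same logical operation, generating and
// persisting a UUID only the first time the operation is seen.
function getOrCreateKey(operationId: string): string {
  let key = keyStore.get(operationId);
  if (!key) {
    key = `op-${randomUUID()}`;
    keyStore.set(operationId, key);
  }
  return key;
}
```

The crucial property: calling getOrCreateKey inside a retry loop is safe, because only the first call generates anything.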
The Retry Strategy
Exponential backoff with jitter is the right retry strategy for both rate limit errors (429) and transient server errors (500, 502, 503).
The logic:
- On a 429: wait exactly retry_after_seconds before retrying. Don't guess — use the value the server gives you.
- On a 5xx: start at 1 second, double each retry, add random jitter to avoid thundering herd
- Max 5 retries before giving up and logging the failure
- Never retry 4xx errors except 429 — a 400 Bad Request won't succeed on retry
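The whole schedule reduces to one pure function. A sketch — backoffMs is illustrative, not part of any SDK:

```typescript
// Computes the wait before the next retry. A 429 supplies the exact
// wait via retry_after_seconds; a 5xx falls back to exponential
// backoff (1s, 2s, 4s, ...) with up to 500ms of random jitter.
function backoffMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined) {
    return retryAfterSeconds * 1000; // use the server's value verbatim
  }
  const base = Math.pow(2, attempt) * 1000;
  const jitter = Math.random() * 500;
  return base + jitter;
}
```

Keeping it pure makes the schedule trivially unit-testable, which matters more than usual here: a retry bug only surfaces under production failures.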
A Production-Ready TypeScript Client
Here's a bot API client that handles rate limits, idempotency, and retries correctly:
interface WfwClientOptions {
  token: string;
  baseUrl?: string;
  maxRetries?: number;
}

interface RequestOptions {
  idempotencyKey?: string; // required for POST/PATCH operations
}

class WfwApiClient {
  private token: string;
  private baseUrl: string;
  private maxRetries: number;

  constructor({ token, baseUrl = 'https://api.workforcewave.com', maxRetries = 5 }: WfwClientOptions) {
    this.token = token;
    this.baseUrl = baseUrl;
    this.maxRetries = maxRetries;
  }

  async request<T>(
    method: string,
    path: string,
    body?: unknown,
    opts: RequestOptions = {}
  ): Promise<T> {
    const headers: Record<string, string> = {
      'Authorization': `Bearer ${this.token}`,
      'Content-Type': 'application/json',
    };

    // Attach the idempotency key on mutating operations when one is provided
    if (['POST', 'PATCH', 'PUT'].includes(method) && opts.idempotencyKey) {
      headers['Idempotency-Key'] = opts.idempotencyKey;
    }

    let lastError: Error | null = null;

    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      let res: Response;
      try {
        res = await fetch(`${this.baseUrl}${path}`, {
          method,
          headers,
          body: body ? JSON.stringify(body) : undefined,
        });
      } catch (err) {
        // Network failure before a response arrived — the exact case
        // idempotency keys exist for. Back off and retry with the same key.
        lastError = err instanceof Error ? err : new Error(String(err));
        await this.sleep(Math.pow(2, attempt) * 1000 + Math.random() * 500);
        continue;
      }

      // Rate limited — respect the server's retry_after_seconds
      if (res.status === 429) {
        const data = await res.json();
        const waitMs = (data.error?.retry_after_seconds ?? 60) * 1000;
        console.warn(`Rate limited. Waiting ${waitMs / 1000}s before retry ${attempt + 1}`);
        await this.sleep(waitMs);
        continue;
      }

      // Server error — exponential backoff with jitter
      if (res.status >= 500 && attempt < this.maxRetries) {
        const baseWait = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, 8s, 16s
        const jitter = Math.random() * 500;
        console.warn(`Server error ${res.status}. Retrying in ${(baseWait + jitter) / 1000}s`);
        await this.sleep(baseWait + jitter);
        continue;
      }

      // Client error — don't retry (except 429, handled above)
      if (!res.ok) {
        const data = await res.json();
        throw new Error(`WFW API error ${res.status}: ${data.error?.message ?? res.statusText}`);
      }

      return res.json() as Promise<T>;
    }

    throw lastError ?? new Error(`Request failed after ${this.maxRetries} retries`);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  // Convenience methods with idempotency key generation
  async initiateCall(patientId: string, appointmentId: string, agentId: string, phoneNumber: string) {
    // Key encodes the logical operation — including the appointment ID means
    // a different appointment for the same patient gets a different key
    const idempotencyKey = `call-patient-${patientId}-appt-${appointmentId}`;
    return this.request('POST', '/v2/calls', { agent_id: agentId, to: phoneNumber }, { idempotencyKey });
  }

  async provisionAgent(customerId: string, businessUrl: string) {
    const idempotencyKey = `provision-customer-${customerId}`;
    return this.request('POST', '/v2/agents', { business_url: businessUrl }, { idempotencyKey });
  }
}
The two things to notice in this client:
First, rate limit retry uses retry_after_seconds from the response body, not a hardcoded value. WFW's reset windows vary by scope, and guessing wrong means either waiting longer than necessary or retrying too soon and getting another 429.
Second, idempotency keys are generated from semantic identifiers (customerId, patientId) rather than random UUIDs created at request time. This means the key survives a process restart — if your bot crashes mid-batch and restarts, it generates the same keys for the same operations and WFW deduplicates them automatically.
Reading the Rate Limit Headers Proactively
You don't have to wait for a 429 to back off. A well-behaved client reads X-RateLimit-Remaining on every response and slows down when it gets low:
// After a successful response:
const remaining = parseInt(res.headers.get('X-RateLimit-Remaining') ?? '999', 10);
const resetAt = parseInt(res.headers.get('X-RateLimit-Reset') ?? '0', 10);

if (remaining < 10) {
  const waitMs = Math.max(0, resetAt * 1000 - Date.now());
  console.log(`Rate limit low (${remaining} remaining). Pausing ${waitMs / 1000}s until reset.`);
  await sleep(waitMs);
}
This is especially useful in batch operations — provisioning 50 agents in a loop will drain the agents:write bucket partway through. Reading the headers lets you pause naturally rather than hitting the 429 wall.
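The pause decision itself is pure, which makes it easy to test before your batch job meets a real rate limiter. A sketch — pauseBeforeNext is a hypothetical helper, and the threshold of 10 is an arbitrary choice you'd tune:

```typescript
// Given the two header values and the current time, returns how many
// milliseconds to pause before the next request (0 = don't pause).
function pauseBeforeNext(remaining: number, resetAtUnix: number, nowMs: number): number {
  if (remaining >= 10) return 0;                  // plenty of budget left
  return Math.max(0, resetAtUnix * 1000 - nowMs); // wait out the window
}
```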
Next in this series: Provisioning 100 Voice Agents in 10 Minutes: A Batch Guide — the complete operational guide for SaaS companies provisioning agents at scale.