Rate Limiting and Idempotency: What Your Bot Needs to Know
When a human uses an API, they notice the error message and retry manually. When a bot uses an API, it retries automatically — and if it retries wrong, it either hammers the rate limiter or creates duplicate records that are hard to unwind.
Two patterns prevent both problems: rate limit handling and idempotency. This post explains how WFW implements each, why they matter specifically for AI consumers, and gives you a TypeScript client that handles both correctly.
Rate Limits
Every WFW API token has rate limits. The current limits are communicated on every response via three headers:
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1724265600
Limit is the number of requests allowed in the current window. Remaining is how many you have left. Reset is the Unix timestamp when the window resets and Remaining goes back to Limit.
When you exceed the limit, you get a 429 Too Many Requests with a JSON body that includes retry_after_seconds:
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded for scope agents:write",
    "retry_after_seconds": 38
  }
}
Scope-specific buckets matter. The limits are not uniform across all operations. agents:write (creating and configuring agents) has a tighter limit than agents:read or calls:read. The read operations that AI orchestrators call frequently — polling operation status, fetching transcripts — have generous limits. The write operations that have real downstream effects — provisioning new agents, initiating calls — are intentionally constrained.
A bot that polls GET /v2/operations/{id} aggressively will not hit its write budget. A bot that provisions agents in a tight loop will. Know which bucket your hot paths hit.
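One way to keep track of which bucket your hot paths are draining is a small client-side tracker, updated from the X-RateLimit-Remaining header after each response. This is a hypothetical sketch — ScopeBudget is not part of any WFW SDK, and you'd map each endpoint to its scope yourself:

```typescript
// Hypothetical client-side tracker: one budget entry per scope, updated
// from X-RateLimit-Remaining after each response. A batch job can check
// its write budget before firing the next request.
class ScopeBudget {
  private remaining = new Map<string, number>();

  // Record the latest Remaining value for the scope this request used
  update(scope: string, remainingHeader: number): void {
    this.remaining.set(scope, remainingHeader);
  }

  // True if budget is left (or the scope hasn't been seen yet)
  canSend(scope: string): boolean {
    return (this.remaining.get(scope) ?? Infinity) > 0;
  }
}
```

Because reads and writes sit in separate buckets, an exhausted agents:write budget never blocks the status polling that orchestrators do constantly.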
Idempotency
Idempotency solves a different problem: what happens when the network fails between your request and the server's response?
Without idempotency, the flow looks like this:
- Bot sends POST /v2/calls to initiate a call
- WFW receives it, starts provisioning the call, returns 202 Accepted
- Network drops before the response reaches the bot
- Bot sees a timeout error and retries
- WFW receives a second request and starts a second call
- Two calls go out to the patient
With idempotency:
- Bot sends POST /v2/calls with Idempotency-Key: call-patient-8472-appt-20260820
- WFW receives it, starts provisioning the call, stores the key → result mapping
- Network drops before the response reaches the bot
- Bot retries with the same Idempotency-Key
- WFW recognizes the key, returns the result of the first request
- One call goes out to the patient
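The key in the first step can be derived deterministically from the identifiers that define the operation. A minimal sketch — callIdempotencyKey is a hypothetical helper, not a WFW SDK function:

```typescript
// Derives a stable idempotency key from the identifiers that define the
// logical operation. The same patient + appointment always yields the
// same key, so a retry (or a restarted process) hits WFW's dedupe path.
function callIdempotencyKey(patientId: string, appointmentId: string): string {
  return `call-patient-${patientId}-appt-${appointmentId}`;
}

// callIdempotencyKey('8472', '20260820') → 'call-patient-8472-appt-20260820'
```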
How to generate idempotency keys. The key should encode the logical operation, not the retry attempt. Use a UUID or a deterministic string derived from what you're doing:
// Good — encodes the logical operation
const key = `call-patient-${patientId}-appt-${appointmentId}`;
// Good — UUID generated once per logical operation, stored, reused on retry
const key = `op-${uuidv4()}`; // generate once, persist to DB, reuse on retry
// Bad — generates a new UUID on every call including retries
const key = `op-${uuidv4()}`; // generated inside the retry loop
The idempotency window is 24 hours. A key used today can't collide with the same key used tomorrow. If you legitimately want to initiate a second call to the same patient for a different appointment, use a different key (include the appointment ID).
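If you prefer random UUID keys, the "generate once, persist, reuse" pattern looks roughly like this. The Map is a stand-in for whatever durable store your bot uses — in production it must survive restarts, or the pattern buys you nothing:

```typescript
import { randomUUID } from 'node:crypto';

// Stand-in for a durable table keyed by logical operation ID.
const keyStore = new Map<string, string>();

// Returns the same key for the same logical operation, generating and
// persisting a UUID only the first time the operation is seen.
function getOrCreateKey(operationId: string): string {
  let key = keyStore.get(operationId);
  if (!key) {
    key = `op-${randomUUID()}`;
    keyStore.set(operationId, key);
  }
  return key;
}
```

The crucial property: calling getOrCreateKey inside a retry loop is safe, because only the first call generates anything.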
The Retry Strategy
Exponential backoff with jitter is the right retry strategy for both rate limit errors (429) and transient server errors (500, 502, 503).
The logic:
- On a 429: wait exactly retry_after_seconds before retrying. Don't guess — use the value the server gives you.
- On a 5xx: start at 1 second, double each retry, add random jitter to avoid thundering herd
- Max 5 retries before giving up and logging the failure
- Never retry 4xx errors except 429 — a 400 Bad Request won't succeed on retry
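The whole schedule reduces to one pure function. A sketch — backoffMs is illustrative, not part of any SDK:

```typescript
// Computes the wait before the next retry. A 429 supplies the exact
// wait via retry_after_seconds; a 5xx falls back to exponential
// backoff (1s, 2s, 4s, ...) with up to 500ms of random jitter.
function backoffMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined) {
    return retryAfterSeconds * 1000; // use the server's value verbatim
  }
  const base = Math.pow(2, attempt) * 1000;
  const jitter = Math.random() * 500;
  return base + jitter;
}
```

Keeping it pure makes the schedule trivially unit-testable, which matters more than usual here: a retry bug only surfaces under production failures.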
A Production-Ready TypeScript Client
Here's a bot API client that handles rate limits, idempotency, and retries correctly:
interface WfwClientOptions {
  token: string;
  baseUrl?: string;
  maxRetries?: number;
}

interface RequestOptions {
  idempotencyKey?: string; // required for POST/PATCH operations
}

class WfwApiClient {
  private token: string;
  private baseUrl: string;
  private maxRetries: number;

  constructor({ token, baseUrl = 'https://api.workforcewave.com', maxRetries = 5 }: WfwClientOptions) {
    this.token = token;
    this.baseUrl = baseUrl;
    this.maxRetries = maxRetries;
  }

  async request<T>(
    method: string,
    path: string,
    body?: unknown,
    opts: RequestOptions = {}
  ): Promise<T> {
    const headers: Record<string, string> = {
      'Authorization': `Bearer ${this.token}`,
      'Content-Type': 'application/json',
    };

    // Attach the idempotency key on mutating operations when one is provided
    if (['POST', 'PATCH', 'PUT'].includes(method) && opts.idempotencyKey) {
      headers['Idempotency-Key'] = opts.idempotencyKey;
    }

    let lastError: Error | null = null;

    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      let res: Response;
      try {
        res = await fetch(`${this.baseUrl}${path}`, {
          method,
          headers,
          body: body ? JSON.stringify(body) : undefined,
        });
      } catch (err) {
        // Network failure before a response arrived — the exact case
        // idempotency keys exist for. Back off and retry with the same key.
        lastError = err instanceof Error ? err : new Error(String(err));
        await this.sleep(Math.pow(2, attempt) * 1000 + Math.random() * 500);
        continue;
      }

      // Rate limited — respect the server's retry_after_seconds
      if (res.status === 429) {
        const data = await res.json();
        const waitMs = (data.error?.retry_after_seconds ?? 60) * 1000;
        console.warn(`Rate limited. Waiting ${waitMs / 1000}s before retry ${attempt + 1}`);
        await this.sleep(waitMs);
        continue;
      }

      // Server error — exponential backoff with jitter
      if (res.status >= 500 && attempt < this.maxRetries) {
        const baseWait = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, 8s, 16s
        const jitter = Math.random() * 500;
        console.warn(`Server error ${res.status}. Retrying in ${(baseWait + jitter) / 1000}s`);
        await this.sleep(baseWait + jitter);
        continue;
      }

      // Client error — don't retry (except 429, handled above)
      if (!res.ok) {
        const data = await res.json();
        throw new Error(`WFW API error ${res.status}: ${data.error?.message ?? res.statusText}`);
      }

      return res.json() as Promise<T>;
    }

    throw lastError ?? new Error(`Request failed after ${this.maxRetries} retries`);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  // Convenience methods with idempotency key generation
  async initiateCall(patientId: string, appointmentId: string, agentId: string, phoneNumber: string) {
    // Key encodes the logical operation — including the appointment ID means
    // a different appointment for the same patient gets a different key
    const idempotencyKey = `call-patient-${patientId}-appt-${appointmentId}`;
    return this.request('POST', '/v2/calls', { agent_id: agentId, to: phoneNumber }, { idempotencyKey });
  }

  async provisionAgent(customerId: string, businessUrl: string) {
    const idempotencyKey = `provision-customer-${customerId}`;
    return this.request('POST', '/v2/agents', { business_url: businessUrl }, { idempotencyKey });
  }
}
The two things to notice in this client:
First, rate limit retry uses retry_after_seconds from the response body, not a hardcoded value. WFW's reset windows vary by scope, and guessing wrong means either waiting longer than necessary or retrying too soon and getting another 429.
Second, idempotency keys are generated from semantic identifiers (customerId, patientId) rather than random UUIDs created at request time. This means the key survives a process restart — if your bot crashes mid-batch and restarts, it generates the same keys for the same operations and WFW deduplicates them automatically.
Reading the Rate Limit Headers Proactively
You don't have to wait for a 429 to back off. A well-behaved client reads X-RateLimit-Remaining on every response and slows down when it gets low:
// After a successful response:
const remaining = parseInt(res.headers.get('X-RateLimit-Remaining') ?? '999', 10);
const resetAt = parseInt(res.headers.get('X-RateLimit-Reset') ?? '0', 10);

if (remaining < 10) {
  const waitMs = Math.max(0, resetAt * 1000 - Date.now());
  console.log(`Rate limit low (${remaining} remaining). Pausing ${waitMs / 1000}s until reset.`);
  await sleep(waitMs);
}
This is especially useful in batch operations — provisioning 50 agents in a loop will drain the agents:write bucket partway through. Reading the headers lets you pause naturally rather than hitting the 429 wall.
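The pause decision itself is pure, which makes it easy to test before your batch job meets a real rate limiter. A sketch — pauseBeforeNext is a hypothetical helper, and the threshold of 10 is an arbitrary choice you'd tune:

```typescript
// Given the two header values and the current time, returns how many
// milliseconds to pause before the next request (0 = don't pause).
function pauseBeforeNext(remaining: number, resetAtUnix: number, nowMs: number): number {
  if (remaining >= 10) return 0;                  // plenty of budget left
  return Math.max(0, resetAtUnix * 1000 - nowMs); // wait out the window
}
```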
Next in this series: Provisioning 100 Voice Agents in 10 Minutes: A Batch Guide — the complete operational guide for SaaS companies provisioning agents at scale.