
How We Auto-Generate Our MCP Server from an OpenAPI Spec

Workforce Wave

February 10, 2026 · 9 min read
#architecture #developer-experience #mcp #openapi

When we shipped the WFW MCP server at /api/v2/mcp, we made a decision that seemed obvious in retrospect but took us a few false starts to commit to: the OpenAPI spec is the single source of truth for everything. REST routes, MCP tool schemas, TypeScript types, validation — all generated from one file.

The alternative — maintaining a hand-crafted MCP server alongside the REST API — looked sustainable for maybe three months. After that, the tool descriptions would drift from the actual API behavior, the parameter schemas would fall out of sync, and the when_to_use guidance we write for AI agents would become outdated fiction.

This post is about the pattern we landed on, the x-bot-guidance extension we invented, and the tradeoff between auto-generation speed and nuance.

The Problem: Two Surfaces, One Codebase

The WFW v2 API has around 20 meaningful endpoints. Each one exists in two contexts:

  1. As a REST endpoint — documented for humans in our API reference, consumed by developers building integrations.
  2. As an MCP tool — consumed by AI agents (Claude, GPT-4, etc.) that call it through the Model Context Protocol.

The human documentation and the MCP tool description are not the same thing. Human docs explain what a parameter does. MCP tool descriptions need to explain when to call this tool at all, what success looks like, and what to do if it fails. These are different audiences with different needs.

But the underlying data — the operation ID, the parameter names, the request/response schemas — is identical. If we maintain that data twice, we maintain it incorrectly.

The Approach: OpenAPI as Source of Truth

Our build pipeline runs a code generation step that reads openapi.yaml and emits:

  • Next.js route handler stubs (request/response types already imported)
  • Zod validation schemas for every request body and query parameter
  • TypeScript types for every request/response shape
  • MCP tool schema objects for every x-expose-as-mcp: true operation
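To make this concrete, here's roughly what the generated Zod schema for the createAgent request body looks like. This is a sketch, not the literal generated file: the module path and export names are invented, but the fields match the create_agent input schema shown later in this post.

// generated/schemas.ts (illustrative path)
import { z } from "zod";

export const createAgentBody = z.object({
  business_url: z.string().url().optional(),
  system_prompt: z.string().optional(),
  voice_id: z.string().optional(),
  name: z.string().optional(),
});

export type CreateAgentBody = z.infer<typeof createAgentBody>;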

The MCP generation step is the interesting one. Here's what an OpenAPI operation looks like in our spec, and what it becomes:

// Input: an OpenAPI operation object (parsed from YAML)
interface OpenAPIOperation {
  operationId: string;           // e.g. "createAgent"
  summary: string;               // e.g. "Provision a new voice agent"
  description?: string;          // human-readable long description
  parameters?: OpenAPIParameter[];
  requestBody?: OpenAPIRequestBody;
  responses: Record<string, OpenAPIResponse>;
  // our custom extension — see below
  "x-bot-guidance"?: BotGuidance;
}
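
// Shape of the x-bot-guidance extension. A sketch mirroring the three
// keys used in our YAML below; the real type may carry more fields.
interface BotGuidance {
  when_to_use?: string;
  success_looks_like?: string;
  if_error?: string;
}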

// Output: an MCP tool schema
interface MCPToolSchema {
  name: string;                  // derived from operationId
  description: string;           // composed from multiple sources
  inputSchema: JSONSchema;       // derived from parameters + requestBody
}

// The transformation function
function operationToMCPTool(op: OpenAPIOperation): MCPToolSchema {
  const guidance = op["x-bot-guidance"];

  // Build description by layering human summary + bot-specific guidance
  const descriptionParts: string[] = [op.summary];

  if (guidance?.when_to_use) {
    descriptionParts.push(`When to use: ${guidance.when_to_use}`);
  }
  if (guidance?.success_looks_like) {
    descriptionParts.push(`Success: ${guidance.success_looks_like}`);
  }
  if (guidance?.if_error) {
    descriptionParts.push(`If error: ${guidance.if_error}`);
  }

  return {
    name: camelToSnake(op.operationId),  // "createAgent" → "create_agent"
    description: descriptionParts.join("\n\n"),  // blank line between sections
    inputSchema: buildInputSchema(op.parameters, op.requestBody),
  };
}

The inputSchema builder merges path parameters, query parameters, and request body into a single flat JSON Schema object — which is what MCP expects. That merge is where most of the complexity lives, but the logic is mechanical.
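A simplified version of that merge, with the camelToSnake helper included for completeness. Treat this as a sketch of the idea rather than our exact implementation; it assumes the standard OpenAPI parameter shape (name, schema, required, description) and a JSON request body.

function buildInputSchema(
  params: OpenAPIParameter[] = [],
  body?: OpenAPIRequestBody
): JSONSchema {
  const properties: Record<string, JSONSchema> = {};
  const required: string[] = [];

  // Path and query parameters become top-level properties.
  for (const p of params) {
    properties[p.name] = { ...p.schema, description: p.description };
    if (p.required) required.push(p.name);
  }

  // JSON request body properties are flattened into the same object.
  const bodySchema = body?.content?.["application/json"]?.schema;
  if (bodySchema?.properties) {
    Object.assign(properties, bodySchema.properties);
    required.push(...(bodySchema.required ?? []));
  }

  return { type: "object", properties, required };
}

// "createAgent" → "create_agent"
function camelToSnake(s: string): string {
  return s.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}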

The x-bot-guidance Extension

OpenAPI's extension mechanism lets you add x-* fields to any object in the spec. We use x-bot-guidance to attach AI-agent-specific metadata to operations without polluting the human documentation:

# In openapi.yaml
/v2/agents:
  post:
    operationId: createAgent
    summary: Provision a new voice agent
    description: |
      Creates a voice agent with the given configuration. Supports
      Workforce Wave auto-generation from a business URL.
    x-expose-as-mcp: true
    x-bot-guidance:
      when_to_use: |
        Call this when a client requests a new voice agent. If they provide
        a business_url, pass it through — Workforce Wave will auto-generate the
        system prompt, knowledge base, and entity data. If not, require
        system_prompt and voice_id at minimum.
      success_looks_like: |
        Returns agent object with status "active" or an operation handle
        with status "pending" if Workforce Wave provisioning is in progress.
        Poll GET /v2/operations/{operationId} or wait for agent.provisioned
        webhook before treating provisioning as complete.
      if_error: |
        429 → back off using retry_after_seconds in error response.
        422 with code "voice_not_found" → list available voices with
        GET /v2/voices and retry with a valid voice_id.
        500 → retryable: true means transient failure, retry after 10s.

The x-bot-guidance fields don't appear in the rendered human API docs — our doc generator ignores x-* extensions unless they're explicitly allow-listed. But they get included in the MCP tool description, which means the agent gets better guidance than the human developer does.

That's intentional. AI agents need more explicit behavioral direction than human developers. A human developer reads a 422 error, checks the error body, understands voice_not_found, and knows to look up valid voice IDs. An agent needs to be told the exact recovery path upfront — otherwise it'll infer something plausible but probably wrong.
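To see the difference, here's what a client that follows the if_error guidance mechanically might look like. Everything below is a hypothetical sketch except the recovery rules themselves: the base URL and response shapes are placeholders, there's no retry cap, and none of it is WFW SDK code.

const BASE = "https://api.example.com"; // placeholder base URL
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function createAgentWithRecovery(
  body: Record<string, unknown>
): Promise<unknown> {
  const res = await fetch(`${BASE}/v2/agents`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (res.ok) return res.json();
  const err = await res.json();

  // 429 → back off using retry_after_seconds in the error response.
  if (res.status === 429) {
    await sleep(err.retry_after_seconds * 1000);
    return createAgentWithRecovery(body);
  }
  // 422 voice_not_found → list available voices, retry with a valid voice_id.
  if (res.status === 422 && err.code === "voice_not_found") {
    const voices = await fetch(`${BASE}/v2/voices`).then((r) => r.json());
    // Assumes /v2/voices returns an array of { id, ... } (hypothetical shape).
    return createAgentWithRecovery({ ...body, voice_id: voices[0].id });
  }
  // 500 with retryable: true → transient failure, retry after 10s.
  if (res.status === 500 && err.retryable) {
    await sleep(10_000);
    return createAgentWithRecovery(body);
  }
  throw new Error(`create_agent failed with status ${res.status}`);
}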

What a Full MCP Tool Descriptor Looks Like

Here's the generated output for create_agent after the transformation:

{
  "name": "create_agent",
  "description": "Provision a new voice agent\n\nWhen to use: Call this when a client requests a new voice agent. If they provide a business_url, pass it through — Workforce Wave will auto-generate the system prompt, knowledge base, and entity data. If not, require system_prompt and voice_id at minimum.\n\nSuccess: Returns agent object with status \"active\" or an operation handle with status \"pending\" if Workforce Wave provisioning is in progress. Poll GET /v2/operations/{operationId} or wait for agent.provisioned webhook before treating provisioning as complete.\n\nIf error: 429 → back off using retry_after_seconds in error response. 422 with code \"voice_not_found\" → list available voices with GET /v2/voices and retry with a valid voice_id. 500 → retryable: true means transient failure, retry after 10s.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "business_url": {
        "type": "string",
        "format": "uri",
        "description": "Public URL of the business. Triggers Workforce Wave auto-generation."
      },
      "system_prompt": {
        "type": "string",
        "description": "System prompt for the agent. Required if business_url is not provided."
      },
      "voice_id": {
        "type": "string",
        "description": "ElevenLabs voice ID. Required if business_url is not provided."
      },
      "name": {
        "type": "string",
        "description": "Human-readable name for this agent."
      }
    },
    "required": []
  }
}

The description is longer than most developers would write by hand. That's fine. Agents process descriptions differently than humans — they're not skimming, they're parsing for behavioral cues. More explicit guidance means fewer wrong calls.

The Auto-Gen / Manual Override Balance

Here's the honest part: auto-generation doesn't work equally well for all endpoints.

For simple CRUD operations — list agents, get agent, update agent, delete agent — the generated tool descriptors are good enough to ship without touching them. The operation is obvious from the schema, and the x-bot-guidance annotation is short.

For complex multi-step operations — Workforce Wave provisioning, batch operations, webhook registration — the auto-generated output is a starting point, not a finished product. We maintain five tool descriptors in an mcp-overrides/ directory that completely replace the generated versions:

mcp-overrides/
  create_agent.json         # Workforce Wave provisioning — complex state machine
  batch_create_agents.json  # Async bulk operation with progress events
  update_kb_document.json   # Conflict resolution and version pinning
  register_webhook.json     # HMAC signing, event filter syntax
  run_compliance_check.json # Multi-rule evaluation, remediation paths

The build step checks this directory after generation and merges — manual files win over generated files for matching tool names. This gives us the best of both worlds: 15 tools that are always up-to-date because they're generated from the spec, and 5 tools that are carefully hand-crafted for the operations where nuance matters.
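The merge itself is a few lines. A sketch, assuming each override file is a complete MCPToolSchema JSON object (the applyOverrides name is invented for this post):

import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// After generation, any hand-written descriptor in mcp-overrides/
// replaces the generated tool with the same name.
function applyOverrides(
  generated: MCPToolSchema[],
  overridesDir = "mcp-overrides"
): MCPToolSchema[] {
  const overrides = new Map<string, MCPToolSchema>();
  for (const file of readdirSync(overridesDir)) {
    if (!file.endsWith(".json")) continue;
    const tool: MCPToolSchema = JSON.parse(
      readFileSync(join(overridesDir, file), "utf8")
    );
    overrides.set(tool.name, tool); // manual files win
  }
  return generated.map((tool) => overrides.get(tool.name) ?? tool);
}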

This requires discipline: when you change one of those 5 endpoints significantly, you must update the override. We put a CI check on it — if the OpenAPI schema for an overridden operation changes, the build requires a manual sign-off that the override is still accurate.
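One way to wire up that check, sketched under the assumption that each sign-off records a fingerprint of the operation's schema-relevant fields (the function names here are illustrative, not our actual build script):

import { createHash } from "node:crypto";

// Stable fingerprint of the fields that affect the tool's input/output
// contract. Assumes the spec parser yields deterministic key order.
function operationFingerprint(op: OpenAPIOperation): string {
  const relevant = {
    parameters: op.parameters,
    requestBody: op.requestBody,
    responses: op.responses,
  };
  return createHash("sha256").update(JSON.stringify(relevant)).digest("hex");
}

// Fails the build if an overridden operation's schema changed since the
// fingerprint recorded at the last manual sign-off.
function checkOverride(op: OpenAPIOperation, signedOffFingerprint: string) {
  if (operationFingerprint(op) !== signedOffFingerprint) {
    throw new Error(
      `${op.operationId}: schema changed since last sign-off; ` +
        `review the override in mcp-overrides/ and re-approve.`
    );
  }
}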

Why MCP Tool Descriptions Are Your Real API Docs

The insight that drove all of this: for AI-native applications, the MCP tool description is the primary documentation surface. Human API docs exist so developers can build integrations. But once those integrations exist and they're bot-driven, it's the MCP tool description that gets read on every single call.

If your MCP tool description says "Create an agent" and nothing else, you're hoping the agent figures out the right inputs from the schema alone. For simple operations it might. For operations with complex conditional requirements (like the business_url OR {system_prompt + voice_id} requirement), it won't.

Think of when_to_use, success_looks_like, and if_error as answers to the three questions a well-designed MCP tool always addresses. What are the preconditions for calling me? What does my output look like when I succeed? What specific recovery steps should you take when I fail?

Your REST API documentation was written to answer "what does this endpoint do?" Your MCP tool descriptions need to answer "what should an agent do given this endpoint exists?" The two questions require different answers.

Auto-generation from OpenAPI gets you there faster. The x-bot-guidance extension gets you the rest of the way. And for your five most complex tools, neither will substitute for sitting down and writing the behavioral guidance by hand.

The full implementation — including the YAML schema for x-bot-guidance and the build script — is available in the WFW developer documentation.
