
The Workforce Wave AI Workflow Pattern: From Business URL to Production Voice Agent in 90 Seconds

Workforce Wave

April 17, 2026 · 11 min read
#ai-agents #architecture #scout #workflow-patterns

The pitch for Workforce Wave is simple: give us a business URL and we'll provision a production-ready voice agent. The implementation is less simple. Under the hood, "90 seconds" covers a web crawl, parallel entity extraction, LLM-based prompt generation, knowledge base document creation, and voice agent configuration — all as a typed pipeline where each step hands off structured data to the next.

This post is about the pipeline architecture: why we built it the way we did, how the steps compose, and what the async operation handle looks like from the client's perspective.

The Pipeline

Workforce Wave runs these steps in order:

business_url
  → [Crawl] scrape multiple pages in parallel
  → [Classify] categorize page content by type (home, services, about, reviews)
  → [Extract] run entity extraction on classified content in parallel
  → [Generate] build system prompt + knowledge base documents from entity_data
  → [Configure] create agent record + attach KB documents + provision phone number
  → { agentId, systemPrompt, firstMessage, kbDocuments[], entityData }

Each step produces a typed output that the next step consumes. If a step fails, we capture what completed and return partial results — the pipeline is designed to degrade gracefully rather than fail completely.

The TypeScript Pipeline Types

// lib/scout/types.ts

/** Raw content from a crawled page */
export interface CrawledPage {
  url: string;
  title: string;
  textContent: string;      // cleaned, de-tagged text
  contentType: PageContentType;
  crawledAt: string;
}

export type PageContentType =
  | "home"
  | "services"
  | "about"
  | "contact"
  | "reviews"
  | "faq"
  | "other";

/**
 * Structured entity data extracted from the business website.
 * This is the core output of the extraction step —
 * everything downstream uses this.
 */
export interface EntityData {
  businessName: string;
  primaryService: string;        // e.g. "dental practice", "HVAC services"
  location: {
    city: string;
    state: string;
    address?: string;
    serviceArea?: string;
  };
  phone?: string;
  hours?: BusinessHours;
  services: ServiceItem[];
  staff?: StaffMember[];
  uniqueValueProps: string[];    // what makes this business distinct
  reviewHighlights?: string[];   // key themes from customer reviews
  policies?: string[];           // cancellation, payment, insurance policies
}

/**
 * The complete Scout pipeline result.
 * All fields are optional — if a step failed, its output may be absent.
 * Consumers should handle partial results.
 */
export interface ScoutPipelineResult {
  operationId: string;
  entityData?: EntityData;
  systemPrompt?: string;
  firstMessage?: string;
  kbDocuments?: KBDocument[];
  agentId?: string;
  completedSteps: ScoutStep[];
  failedStep?: ScoutStep;
  failureReason?: string;
  durationMs: number;
}

export type ScoutStep =
  | "crawl"
  | "classify"
  | "extract"
  | "generate"
  | "configure";
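The doc comment on ScoutPipelineResult says consumers should handle partial results. A hypothetical consumer sketch (with the result type trimmed to the fields it touches) showing what that looks like in practice:

```typescript
// Hypothetical consumer sketch — ScoutPipelineResult is trimmed to the fields used here.
type ScoutStep = "crawl" | "classify" | "extract" | "generate" | "configure";

interface ScoutPipelineResult {
  operationId: string;
  entityData?: { businessName: string };
  agentId?: string;
  completedSteps: ScoutStep[];
  failedStep?: ScoutStep;
  failureReason?: string;
  durationMs: number;
}

export function summarize(result: ScoutPipelineResult): string {
  if (result.agentId) {
    return `Agent ${result.agentId} provisioned`;
  }
  if (result.entityData) {
    // Extraction succeeded but a later step failed — the data is still usable.
    return `Partial: extracted ${result.entityData.businessName}, failed at ${result.failedStep}`;
  }
  return `Failed at ${result.failedStep}: ${result.failureReason}`;
}
```

The middle branch is the interesting one: a generation failure still hands the caller usable entity data instead of an opaque error.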

Why Not a Monolith

The temptation was to write Workforce Wave as a single async function: crawl, extract, generate, configure, done. We prototyped it that way.

The problem appeared immediately in failure handling. If the LLM call in the "generate" step fails, we've already done the expensive crawl and entity extraction. In the monolith, all of that work is lost — the outer try/catch catches the failure and the caller gets nothing.

More importantly, the intermediate outputs are actually valuable in isolation. entityData is useful to callers even if prompt generation fails — they can use it to write a system prompt manually, or trigger a regeneration with different parameters. Losing it on a generation failure wastes both the crawl cost and the extracted data.

The pipeline pattern solves this. Each step:

  1. Receives the output of the previous step
  2. Returns its own typed output (or throws with a failedStep marker)
  3. Has its output preserved before the next step runs

// lib/scout/pipeline.ts

/**
 * Run the Scout pipeline for a given business URL.
 * Each step is independent — partial results are preserved and returned
 * even if a later step fails.
 */
export async function runScoutPipeline(
  businessUrl: string,
  operationId: string
): Promise<ScoutPipelineResult> {
  const startedAt = Date.now();
  const result: ScoutPipelineResult = {
    operationId,
    completedSteps: [],
    durationMs: 0,
  };

  try {
    // Step 1: Crawl — scrape multiple pages in parallel
    const pages = await crawlBusiness(businessUrl);
    result.completedSteps.push("crawl");

    // Step 2: Classify page content types
    const classified = classifyPages(pages);
    result.completedSteps.push("classify");

    // Step 3: Extract entity data — run in parallel per content type
    // Services pages and about pages are extracted concurrently
    const entityData = await extractEntities(classified);
    result.entityData = entityData;  // preserve before next step
    result.completedSteps.push("extract");

    // Step 4: Generate system prompt and KB documents
    // If this fails, entityData is already preserved above
    const generated = await generateAgentContent(entityData);
    result.systemPrompt = generated.systemPrompt;
    result.firstMessage = generated.firstMessage;
    result.kbDocuments = generated.kbDocuments;
    result.completedSteps.push("generate");

    // Step 5: Configure the agent in the database
    const agentId = await configureAgent(operationId, generated, entityData);
    result.agentId = agentId;
    result.completedSteps.push("configure");

  } catch (err) {
    // Capture which step failed — result still contains all completed step outputs
    result.failedStep = getFailedStep(err);
    result.failureReason = err instanceof Error ? err.message : String(err);
  }

  result.durationMs = Date.now() - startedAt;
  return result;
}
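The catch block calls getFailedStep, which the post doesn't show. One plausible implementation (an assumption, not the actual Workforce Wave code) has each step rethrow into a StepError that carries the step name, so failure attribution is a simple instanceof check:

```typescript
// Hypothetical sketch — the pipeline references getFailedStep without showing it.
type ScoutStep = "crawl" | "classify" | "extract" | "generate" | "configure";

// Each step wraps its own failures in StepError so the outer catch can attribute them.
export class StepError extends Error {
  constructor(public step: ScoutStep, message: string, public cause?: unknown) {
    super(message);
    this.name = "StepError";
  }
}

export function getFailedStep(err: unknown): ScoutStep | undefined {
  return err instanceof StepError ? err.step : undefined;
}
```

With this shape, crawlBusiness and friends would catch their internal errors and rethrow as `new StepError("crawl", ...)`, leaving the pipeline's catch block generic.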

The LLM Call Pattern: Structured Output

Workforce Wave's entity extraction and prompt generation both use LLM structured output — we request a JSON Schema response format and get typed data back, not free text to parse.

// lib/scout/extract-entities.ts

import OpenAI from "openai";

const openai = new OpenAI();

/**
 * Extract structured entity data from classified page content.
 * Uses JSON Schema response_format to get typed data directly —
 * avoids regex-based parsing of free-form LLM output.
 */
export async function extractEntities(pages: ClassifiedPage[]): Promise<EntityData> {
  // Combine relevant page content into a single extraction prompt
  const relevantContent = pages
    .filter(p => ["home", "services", "about", "contact"].includes(p.contentType))
    .map(p => `[${p.contentType.toUpperCase()}]\n${p.textContent}`)
    .join("\n\n---\n\n")
    .slice(0, 12000); // Token budget management

  const response = await openai.chat.completions.create({
    model: "gpt-4.1",
    messages: [
      {
        role: "system",
        content: "Extract structured business information from the provided website content. Return only the JSON object matching the schema — no explanation.",
      },
      {
        role: "user",
        content: relevantContent,
      },
    ],
    // Structured output: the response will always match this schema
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "EntityData",
        strict: true,
        schema: ENTITY_DATA_JSON_SCHEMA, // matches the EntityData TypeScript type
      },
    },
    temperature: 0.1, // Low temperature for extraction tasks — we want consistency
  });

  const content = response.choices[0].message.content;
  if (!content) throw new Error("Empty response from entity extraction LLM");

  // JSON.parse is safe here because response_format: json_schema guarantees validity
  return JSON.parse(content) as EntityData;
}

Using response_format: json_schema with strict: true means we never write code to parse or validate the LLM output shape. The model either returns valid JSON matching the schema, or the API call throws. This eliminates a whole class of subtle extraction bugs.
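ENTITY_DATA_JSON_SCHEMA itself isn't shown in this post. A trimmed, hypothetical fragment is below, following the rules OpenAI documents for strict structured outputs: every property listed in required, additionalProperties: false at every level (optional fields become nullable types instead of omitted keys):

```typescript
// Hypothetical fragment — the real schema covers the full EntityData type.
export const ENTITY_DATA_JSON_SCHEMA = {
  type: "object",
  additionalProperties: false,
  required: ["businessName", "primaryService", "location", "services", "uniqueValueProps"],
  properties: {
    businessName: { type: "string" },
    primaryService: { type: "string" },
    location: {
      type: "object",
      additionalProperties: false,
      required: ["city", "state"],
      properties: {
        city: { type: "string" },
        state: { type: "string" },
      },
    },
    // Simplified to strings here; the real type uses ServiceItem objects.
    services: { type: "array", items: { type: "string" } },
    uniqueValueProps: { type: "array", items: { type: "string" } },
  },
} as const;
```

Keeping the schema in a shared constant next to the TypeScript type makes drift between the two easy to spot in review, though nothing enforces it automatically.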

The Three KB Documents

Workforce Wave generates three knowledge base documents for every agent:

Primary KB — The core operational document. Services offered, hours, location, staff bios, pricing tiers (if extractable). This is what the agent references for most customer questions.

FAQ — Generated from two sources: common questions we infer from the services list ("Do you accept insurance?", "How long does a cleaning take?") and questions extracted from review text. Reviews often contain implicit FAQs — "They fixed my HVAC same day" implies "Do you offer same-day service?"

Compliance — Policies, disclaimers, and legal language. Cancellation policies, payment terms, accessibility information. We keep this separate so it's easy to update without touching the operational content, and so the agent can prioritize it appropriately (compliance text should be recited verbatim, not paraphrased).
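The KBDocument type appears in ScoutPipelineResult but isn't defined in the post. A plausible shape, plus a minimal sketch of how the three documents might be assembled from entity data (names and fields here are assumptions):

```typescript
// Hypothetical sketch — KBDocument's real shape is not shown in the post.
type KBDocType = "primary" | "faq" | "compliance";

export interface KBDocument {
  type: KBDocType;
  title: string;
  content: string; // markdown body attached to the agent's knowledge base
}

// Minimal illustration of splitting entity data across the three documents.
export function buildKBDocuments(entity: {
  businessName: string;
  services: string[];
  policies?: string[];
}): KBDocument[] {
  const docs: KBDocument[] = [
    {
      type: "primary",
      title: `${entity.businessName} — Operations`,
      content: `## Services\n${entity.services.map(s => `- ${s}`).join("\n")}`,
    },
    {
      type: "faq",
      // Question stubs inferred from the services list; the real pipeline also mines reviews.
      title: `${entity.businessName} — FAQ`,
      content: entity.services.map(s => `- Do you offer ${s}?`).join("\n"),
    },
  ];
  if (entity.policies?.length) {
    docs.push({
      type: "compliance",
      title: `${entity.businessName} — Policies`,
      content: entity.policies.join("\n\n"),
    });
  }
  return docs;
}
```

The compliance document is conditional here because a site with no extractable policies shouldn't get an empty document the agent might cite.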

Template Variables in System Prompts

The generated system prompt uses template variables that get injected from entityData at agent configuration time:

You are a voice receptionist for {{business_name}}, a {{primary_service}} 
located in {{location.city}}, {{location.state}}. Your role is to answer 
questions, schedule appointments, and provide information about our services.

{{business_name}} specializes in: {{services_list}}

Business hours: {{hours_summary}}

Always be professional, warm, and concise. For complex questions you cannot 
answer, offer to take a message for the team.

The template approach means the same prompt structure works across all business types. The variables get resolved once at agent creation time — the resolved prompt is what gets stored and sent to ElevenLabs. No runtime variable substitution during calls.
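The one-time resolution step can be sketched as a small substitution helper. This is an illustrative implementation, not the Workforce Wave code; it assumes the resolved variables arrive as a flat string map keyed by the placeholder names (including dotted ones like location.city):

```typescript
// Hypothetical sketch — resolves {{variable}} placeholders once at agent creation time.
export function resolveTemplate(
  template: string,
  vars: Record<string, string>
): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, name: string) =>
    // Leave unknown variables visible rather than silently dropping them,
    // so a missing entity field is obvious in the stored prompt.
    name in vars ? vars[name] : match
  );
}

// Usage:
// resolveTemplate(
//   "Hello {{business_name}} in {{location.city}}",
//   { business_name: "Ridgeline Dental", "location.city": "Boise" }
// );
// → "Hello Ridgeline Dental in Boise"
```

Leaving unresolved placeholders intact is a deliberate choice in this sketch: a prompt stored with a visible {{hours_summary}} is a bug you catch in review, whereas a silently blanked value is a bug you catch on a live call.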

The Async Operation Handle

Workforce Wave takes 60-120 seconds to run. That's too long for a synchronous HTTP response — most clients time out at 30 seconds, and holding a connection open for 90 seconds is wasteful.

POST /v2/agents with a business_url returns immediately with 202 Accepted:

{
  "data": {
    "operationId": "op_a1b2c3d4",
    "status": "pending",
    "statusUrl": "/v2/operations/op_a1b2c3d4",
    "estimatedDurationSeconds": 90
  },
  "meta": {
    "request_id": "req_xyz",
    "timestamp_utc": "2026-05-05T14:00:00.000Z"
  }
}

The client has two options to wait for completion: poll the statusUrl, or listen for the agent.provisioned webhook.

# Poll pattern
curl -X POST https://api.workforcewave.com/v2/agents \
  -H "Authorization: Bearer $WFW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"business_url": "https://ridgelinedental.com"}'

# Response: 202 with operationId: "op_a1b2c3d4"

# Poll until complete
curl https://api.workforcewave.com/v2/operations/op_a1b2c3d4 \
  -H "Authorization: Bearer $WFW_API_KEY"

# Response when done:
# {
#   "data": {
#     "operationId": "op_a1b2c3d4",
#     "status": "completed",
#     "result": { "agentId": "agt_xyz789", "entityData": { ... } },
#     "completedAt": "2026-05-05T14:01:23.000Z"
#   }
# }

Workforce Wave's pipeline failures are surfaced as operation failures with failedStep and failureReason in the result, so clients can handle partial success (entity data extracted but prompt generation failed) rather than treating everything as a binary pass/fail.

The 90-second number in the pitch is a p50; p95 is around 3 minutes for a complex multi-service business with many pages to crawl. The async handle design means that variance doesn't matter — clients poll or webhook either way.
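The poll loop from the curl example translates to a few lines of client code. A sketch with the status fetcher and sleep injected so it's testable; the "running" and "failed" status values are assumptions beyond the "pending" and "completed" shown above:

```typescript
// Hypothetical polling helper — status values beyond "pending"/"completed" are assumptions.
interface OperationStatus {
  operationId: string;
  status: "pending" | "running" | "completed" | "failed";
  result?: unknown;
}

export async function pollOperation(
  fetchStatus: () => Promise<OperationStatus>,
  { intervalMs = 2000, maxAttempts = 90 } = {},
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms))
): Promise<OperationStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const op = await fetchStatus();
    // Both terminal states return to the caller, who inspects result.failedStep
    // for partial success rather than treating "failed" as all-or-nothing.
    if (op.status === "completed" || op.status === "failed") return op;
    await sleep(intervalMs);
  }
  throw new Error("Operation did not complete within the polling budget");
}
```

In production the fetchStatus closure would GET the statusUrl from the 202 response with the bearer token; a 2-second interval against a 90-second p50 keeps the request count modest without adding noticeable latency.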
