Interaction Mode 5 — Dual-Mode

One number. Humans and AI both call it.

The same phone number serves human callers with TTS conversation and AI orchestrators with structured JSON. Caller type is detected in under 500ms, with no second infrastructure layer required.

Detection via SIP headers + first-utterance analysis + pre-negotiated tokens.

Human Caller

Hi, how can I help?
I can schedule that for you.
What date works best?
Perfect — see you Thursday.

AI Caller

{
  "status": "ok",
  "slot": "2026-04-18T10:00:00",
  "confirmed": true,
  "agent_id": "agt_7rx9..."
}

<500ms detection — caller type identified before the first response

What Breaks Without It

The naive solution is two of everything.

Serving both humans and machines without dual-mode means maintaining two separate systems — forever.

Problem

Two phone numbers, two agents, double the cost

The naive solution: one number for humans, one for AI callers. Now you manage two phone lines, two agent configs, two billing accounts. When either changes, both need updating.

Problem

Human callers get machine-mode responses

If your agent is in machine mode 100% of the time, a human calling at 3pm for a booking gets terse JSON-style responses with no warmth, no conversation — and hangs up.

Problem

AI callers get human-mode responses

If your agent stays in human mode, a LangChain agent calling for structured appointment data gets verbose conversational text it can't parse. The integration fails silently.

Detection Mechanism

Three layers. Under 500ms total.

Dual-mode detection is not a guess — it's a layered analysis that resolves before the agent delivers its first response.

1

SIP Header Inspection

~0ms

The SIP User-Agent header on the incoming call identifies most machine callers immediately. AI systems calling via API typically declare their user agent. This layer resolves ~80% of cases before the call even connects.

2

First-Utterance Pattern Analysis

~200ms

If SIP inspection is inconclusive, WFW analyzes the first utterance. Machine callers have characteristic speech patterns: precise syntax, immediate task declaration, absence of social greeting. The model classifies in real time.

3

Pre-Negotiated Token Matching

<100ms

API callers can include a pre-negotiated auth token in their first utterance, which skips first-utterance classification. This is the fast path for registered AI systems that need guaranteed <100ms machine-mode activation.
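The three layers above can be sketched as a single decision function. This is an illustrative sketch only, not WFW's implementation: the User-Agent markers, the registered token, the task-verb list, and the names `detect_caller` and `Detection` are all assumptions, and the keyword check is a crude stand-in for the real-time classifier described in layer 2.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration; real markers and tokens would come
# from your own deployment, not from this sketch.
MACHINE_UA_MARKERS = ("langchain", "n8n", "agent-sdk")
REGISTERED_TOKENS = {"tok_demo_123"}
TASK_VERBS = {"book", "confirm", "cancel", "query", "get"}


@dataclass
class Detection:
    caller_type: str  # "human" or "machine"
    layer: str        # which layer produced the decision


def detect_caller(user_agent: Optional[str], first_utterance: str) -> Detection:
    # Layer 1: inspect the SIP User-Agent header (no audio needed).
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in MACHINE_UA_MARKERS):
        return Detection("machine", "sip_header")

    # Fast path: a registered token anywhere in the first utterance.
    if any(word in REGISTERED_TOKENS for word in first_utterance.split()):
        return Detection("machine", "token")

    # Layer 2 stand-in: an immediate task declaration with no greeting.
    words = first_utterance.lower().split()
    first_word = words[0].strip(",.!?") if words else ""
    if first_word in TASK_VERBS:
        return Detection("machine", "first_utterance")

    # Anything unresolved is treated as human in this sketch (an assumption).
    return Detection("human", "first_utterance")
```

The token check runs before the utterance heuristic because the text describes it as a fast path; both operate on the first utterance, so the ordering only affects which layer gets credit for the decision.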

Response Formats

Same agent. Different response. Same call.

One phone number serves both — each caller type gets exactly what they need.

Human Mode

Caller is a person

  • Conversational, natural TTS speech
  • Empathy and warmth in tone
  • Clarifying questions when ambiguous
  • Hold music and human transfer logic
  • DTMF fallback if voice quality drops
  • Full appointment booking conversation

Machine Mode

Caller is an AI system

  • Structured JSON — typed, parseable
  • No conversational filler or pleasantries
  • Machine-readable error codes with retry hints
  • Task result in single response object
  • Sub-500ms end-to-end session completion
  • Webhook delivery of extractions on call.end
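To make the machine-mode bullets concrete, here is a minimal sketch of what a single response object might look like for success and for a retryable error. The success fields mirror the AI Caller sample at the top of this page; the error shape (`error_code`, `retryable`, `retry_after_ms`) and the function names are assumptions, since the exact error schema is not shown here.

```python
import json
from typing import Optional


def machine_success(slot: str, agent_id: str) -> str:
    # Fields follow the AI Caller sample: status, slot, confirmed, agent_id.
    return json.dumps(
        {"status": "ok", "slot": slot, "confirmed": True, "agent_id": agent_id}
    )


def machine_error(code: str, retryable: bool,
                  retry_after_ms: Optional[int] = None) -> str:
    # Hypothetical error shape: a stable code plus a retry hint.
    body = {"status": "error", "error_code": code, "retryable": retryable}
    if retry_after_ms is not None:
        body["retry_after_ms"] = retry_after_ms
    return json.dumps(body)
```

A calling orchestrator could branch on `status` first, then use the retry hint to decide whether to back off or give up, without parsing any free text.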

Real Scenarios

When one number needs to serve both.

These are the architectures where dual-mode is not a nice-to-have — it's the only clean solution.

SaaS Booking Platform

Your platform has human customers calling to change bookings AND n8n workflows calling to confirm appointments. One WFW agent, one phone number. Human gets a warm conversation; the workflow gets JSON. Same agent config — you maintain nothing twice.

Healthcare Practice

Patients call the practice line. The EHR integration (an AI agent) also queries the same line to verify appointment data. Dual-mode means HIPAA-compliant handling for the human call and structured data for the EHR query — from one phone number.

Enterprise AI Pipeline

Your enterprise deployed both a customer-facing IVR replacement AND an internal orchestration layer that queries call data. Single number, single agent. The internal AI gets JSON; the external customer gets conversation. Zero routing configuration.

API Surface

Configure dual-mode via the API.

Dual-mode is enabled per agent via the dual_mode flag in POST /v2/agents. Pre-negotiated tokens for fast-path machine detection are registered via the token management endpoints.

The caller_type field is included in every call.completed webhook payload — so your analytics always know whether a given call came from a human or an AI system.

Full API reference →

Enable Dual-Mode on Agent Creation

POST /v2/agents
{
  "businessUrl": "https://yourpractice.com",
  "verticalType": "dental",
  "phone_number": "+18435551234",
  "dual_mode": {
    "enabled": true,
    "machine_response_format": "json",
    "detection_timeout_ms": 500
  }
}

// call.completed webhook includes:
{
  "call_id": "call_4mn1...",
  "caller_type": "human",  // or "machine"
  "duration_seconds": 142,
  "disposition": "booked"
}
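On the receiving side, a webhook consumer might tally calls by `caller_type` roughly as follows. This is a sketch under assumptions: the function name and the in-memory counter are illustrative, signature verification and HTTP handling are omitted, and it expects plain JSON (the `//` comments in the sample above are annotations, not part of a parseable payload).

```python
import json
from collections import Counter

# In-memory tally for illustration; a real service would persist this.
caller_counts: Counter = Counter()


def handle_call_completed(raw_body: str) -> str:
    """Parse a call.completed payload and record its caller_type."""
    payload = json.loads(raw_body)
    # Fall back to "unknown" if the field is missing (a defensive assumption).
    caller_type = payload.get("caller_type", "unknown")
    caller_counts[caller_type] += 1
    return caller_type


sample = json.dumps({
    "call_id": "call_demo",
    "caller_type": "human",
    "duration_seconds": 142,
    "disposition": "booked",
})
handle_call_completed(sample)
```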

Detection Benchmark

<500ms

Caller type identified before the agent delivers its first response.

In practice, the SIP header inspection layer resolves approximately 80% of calls before the audio stream begins — meaning most machine callers see near-zero latency on mode selection. The 500ms bound is a hard worst-case for the full three-layer analysis including first-utterance classification.

Get Started

Stop maintaining two phone systems.

One WFW agent, one phone number, one API surface. Humans get the conversation they expect. AI callers get the JSON they need. Zero routing configuration required.

Dual-mode enabled via a single flag on any WFW agent.