Phone calling agent · Human-in-the-loop · Anti-prompt injection · Compliance at v1

CaulBot

Any phone call, delegated. Describe the task, approve the plan, let the AI handle it.

Research, then a plan

Two agents handle each call, and they never overlap. The Research Agent runs first. It parses your natural-language request, looks up the right phone number if you didn't provide one, pulls relevant context from the web, and assembles a structured call plan: target, goal, scope, and expected approach. You see all of this before anything is dialed. Approval is enforced at the execution layer, not offered as a UI courtesy. Plans expire after 15 minutes: if you don't confirm within that window, the plan is discarded and no call is placed. The agent acts only on current intent, never on a stale plan.
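
A minimal sketch of how a plan with a 15-minute confirmation window and an execution-layer gate could be modeled. Only the plan fields and the 15-minute expiry come from the text; `CallPlan`, `approve`, `execute`, and `PlanRejected` are hypothetical names, and the real hand-off to the Phone Agent is replaced by a placeholder string.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# From the text: an unconfirmed plan goes stale 15 minutes after it is created.
PLAN_TTL = timedelta(minutes=15)


class PlanRejected(Exception):
    """Raised when a plan may not move forward."""


@dataclass
class CallPlan:
    # Hypothetical shape of the Research Agent's output.
    target: str    # phone number to dial
    goal: str      # what the call should achieve
    scope: str     # what the agent may and may not agree to
    approach: str  # expected approach
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    approved: bool = False

    def is_expired(self, now: datetime) -> bool:
        return now - self.created_at > PLAN_TTL


def approve(plan: CallPlan, now: datetime) -> None:
    """Record user confirmation, refusing plans older than PLAN_TTL."""
    if plan.is_expired(now):
        raise PlanRejected("plan expired; submit the request again")
    plan.approved = True


def execute(plan: CallPlan) -> str:
    """Execution-layer gate: the dialer itself refuses unapproved plans,
    whatever the interface in front of it allowed."""
    if not plan.approved:
        raise PlanRejected("plan not approved")
    return f"dialing {plan.target}"  # placeholder for the hand-off to the Phone Agent
```

Putting the check inside `execute()` rather than in the chat handler means neither a WhatsApp message nor a direct API call can route around it.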


An AI that makes phone calls for you. It plans first, you approve, then it executes.

Live call over WebSocket

Once you approve, the Phone Agent takes over entirely. Twilio dials the number and opens a media stream over WebSocket; Deepgram transcribes the callee's speech in real time; the agent generates a response; Cartesia synthesizes it and streams the audio back. The full loop (speech in, decision, speech out) completes in under a second. The agent handles IVR menus, hold music, and natural turn-taking without intervention. To follow along, enable listen-in mode: Twilio calls your own number so you can hear both sides live. To steer mid-call, send a WhatsApp message; it is synthesized into speech and injected into the active conversation. When the call ends, an outcome summary lands in WhatsApp: what was said, what was agreed, and what the agent learned.
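
Twilio's Media Streams protocol frames everything on the WebSocket as JSON text messages. A `start` event carries the `streamSid`, `media` events carry base64-encoded 8 kHz mu-law audio, and outbound audio goes back as a `media` message addressed to that `streamSid`. The sketch below assumes a bidirectional stream and shows only the framing. The WebSocket server and the Deepgram and Cartesia calls are left out, and `StreamSession` is a hypothetical name. It also builds Twilio's `clear` message, which flushes buffered outbound audio and is one plausible way to handle a callee talking over the agent.

```python
import base64
import json
from typing import Optional


class StreamSession:
    """Minimal per-call state for one Media Streams WebSocket (illustrative)."""

    def __init__(self) -> None:
        self.stream_sid: Optional[str] = None
        self.inbound_audio = bytearray()  # mu-law bytes that would be forwarded to STT
        self.closed = False

    def handle(self, raw: str) -> None:
        """Process one inbound JSON text frame from Twilio."""
        msg = json.loads(raw)
        event = msg.get("event")
        if event == "start":
            self.stream_sid = msg["start"]["streamSid"]
        elif event == "media":
            # In a live system these bytes would be streamed to the STT provider.
            self.inbound_audio += base64.b64decode(msg["media"]["payload"])
        elif event == "stop":
            self.closed = True
        # "connected", "mark", "dtmf" and other events are ignored in this sketch.

    def outbound_media(self, mulaw_audio: bytes) -> str:
        """Wrap synthesized 8 kHz mu-law audio as a frame Twilio plays to the callee."""
        if self.stream_sid is None:
            raise RuntimeError("no stream started")
        return json.dumps({
            "event": "media",
            "streamSid": self.stream_sid,
            "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
        })

    def clear(self) -> str:
        """Ask Twilio to drop queued outbound audio, e.g. when the callee interrupts."""
        return json.dumps({"event": "clear", "streamSid": self.stream_sid})
```

Whatever TTS provider is configured would need to produce mu-law at 8 kHz (or be transcoded) before `outbound_media` is called.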


Compliance by default

AI phone calling carries real legal exposure. CaulBot ships with a five-pillar compliance engine that cannot be disabled: it screens every request before a call is placed and keeps running for the duration of the call. Every call opens with a mandatory AI disclosure, in which the agent identifies itself as an AI and names the user it is calling on behalf of. Opt-out detection runs throughout the call. If the callee says anything resembling "don't call again" or "I'd prefer to speak to a human," the agent acknowledges it, ends the call, and adds the number to a permanent suppression list. Use-case restrictions block entire categories before dialing: marketing, sales, debt collection, political calls, and harassment are rejected at the request stage, not the call stage. Rate limits cap calls per number and per user to prevent abuse patterns. The five pillars (disclosure, opt-out, recording clarity, abuse prevention, and recipient control) are enforced at the execution layer, not surfaced as settings. The goal was to ship compliance before the first user, not retrofit it afterward.
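
A rough sketch of three of the mechanisms described above: request-stage use-case blocking, a permanent suppression list fed by opt-out detection, and sliding-window rate limits per number and per user. The class and method names, the category labels, and the default limits are all hypothetical. The keyword patterns are a deliberately crude stand-in. The text says "anything resembling," which suggests something more robust, such as a classifier.

```python
import re
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set

# Categories the text says are refused at the request stage (labels are illustrative).
BLOCKED_USE_CASES = {"marketing", "sales", "debt_collection", "political", "harassment"}

# Keyword patterns standing in for opt-out detection. Assumption: a production
# system would more likely use a classifier to catch paraphrases.
OPT_OUT_PATTERNS = [
    re.compile(r"\bdon'?t call (me )?again\b", re.IGNORECASE),
    re.compile(r"\bstop calling\b", re.IGNORECASE),
    re.compile(r"\b(speak|talk) to a (real )?(human|person)\b", re.IGNORECASE),
]


def disclosure(user_name: str) -> str:
    """Opening line every call must start with."""
    return f"Hi, this is an AI assistant calling on behalf of {user_name}."


class ComplianceGate:
    def __init__(self, max_per_number: int = 3, max_per_user: int = 20,
                 window_s: float = 86_400) -> None:
        self.suppressed: Set[str] = set()  # permanent opt-out list
        self.max_per_number = max_per_number
        self.max_per_user = max_per_user
        self.window_s = window_s
        self._by_number: Dict[str, Deque[float]] = defaultdict(deque)
        self._by_user: Dict[str, Deque[float]] = defaultdict(deque)

    def _under_limit(self, log: Deque[float], limit: int, now: float) -> bool:
        # Sliding window: forget timestamps older than window_s, then compare.
        while log and now - log[0] >= self.window_s:
            log.popleft()
        return len(log) < limit

    def check_request(self, user: str, number: str, use_case: str,
                      now: Optional[float] = None) -> Optional[str]:
        """Return a rejection reason, or None if the request may proceed."""
        now = time.time() if now is None else now
        if use_case in BLOCKED_USE_CASES:
            return f"use case '{use_case}' is not permitted"
        if number in self.suppressed:
            return "number has opted out"
        if not self._under_limit(self._by_number[number], self.max_per_number, now):
            return "per-number rate limit reached"
        if not self._under_limit(self._by_user[user], self.max_per_user, now):
            return "per-user rate limit reached"
        # Only accepted requests count against the limits.
        self._by_number[number].append(now)
        self._by_user[user].append(now)
        return None

    def on_callee_utterance(self, number: str, text: str) -> bool:
        """True means end the call now; the number is suppressed for good."""
        if any(p.search(text) for p in OPT_OUT_PATTERNS):
            self.suppressed.add(number)
            return True
        return False
```

Because the suppression check sits in `check_request`, an opted-out number is refused on every later request, before any plan is even drafted.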

Provider abstraction and open integration

Every component is swappable via config: LLM provider, speech-to-text, text-to-speech, telephony, and web search. The default stack is Groq + Deepgram + Cartesia + Twilio + Tavily, but any of those can be replaced without touching the agent logic. An HTTP REST API runs alongside the WhatsApp interface: POST a call request, poll for its status, and approve or reject it through dedicated endpoints, so the full flow is accessible programmatically. This makes CaulBot composable: wire it into a workflow, embed it as an MCP tool, or call it from any system that can make an HTTP request. The WhatsApp interface is just one consumer of the same underlying API.
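
One common way to get "swappable via config" is to have the agent depend only on small interfaces and resolve concrete adapters from a name-to-factory registry. The sketch below illustrates that pattern with `typing.Protocol`. It is an assumption about the approach, not CaulBot's actual code. The registry keys and the `Fake*` stub adapters are hypothetical and keep the example runnable offline. Real adapters would wrap the vendor SDKs, which are not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Protocol


# Interfaces the agent logic depends on; vendors sit behind them.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Stub adapters so the sketch runs offline (hypothetical).
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeLLM:
    def reply(self, transcript: str) -> str:
        return f"You said: {transcript}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


# Config names map to factories; adding a vendor means adding an entry here.
REGISTRY: Dict[str, Dict[str, Callable[[], Any]]] = {
    "stt": {"fake": FakeSTT},
    "llm": {"fake": FakeLLM},
    "tts": {"fake": FakeTTS},
}


@dataclass
class Stack:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech


def build_stack(config: Dict[str, str]) -> Stack:
    """Resolve each component by its configured name (KeyError if unknown)."""
    return Stack(
        stt=REGISTRY["stt"][config["stt"]](),
        llm=REGISTRY["llm"][config["llm"]](),
        tts=REGISTRY["tts"][config["tts"]](),
    )


def turn(stack: Stack, audio: bytes) -> bytes:
    """One speech-in, decision, speech-out step; no vendor names appear here."""
    return stack.tts.synthesize(stack.llm.reply(stack.stt.transcribe(audio)))
```

Under this pattern, replacing the LLM means registering one adapter under a new name and changing `config["llm"]`. `turn()` stays untouched.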

Highlights

Under the hood

Two-agent architecture · WebSocket media streams · Approval gates · 5-pillar compliance engine · Provider abstraction · MCP-compatible REST API