The Core Difference
Half-duplex: One party communicates at a time. Like a walkie-talkie — you press to talk, release to listen. Most voice AI works this way.
Full-duplex: Both parties communicate simultaneously. Like a phone call — both sides can speak and listen at the same time.
This distinction sounds simple. The implications are enormous.
How Half-Duplex Voice AI Works
Every mainstream voice AI system — Siri, Alexa, Google Assistant, GPT-4o Voice, Gemini Live — uses a sequential pipeline:
Step 1: Detect end of user speech (silence threshold)
Step 2: Run speech-to-text (ASR)
Step 3: Process through language model (LLM)
Step 4: Convert response to audio (TTS)
Step 5: Play audio to user
Step 6: Return to Step 1

Each step is discrete. The system cannot proceed to the next step until the previous one completes.
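The six steps above can be sketched as a blocking loop. This is an illustrative sketch, not any vendor's actual pipeline: `asr`, `llm`, and `tts` are placeholder callables, and the 600ms silence threshold is made up for the example.

```python
SILENCE_THRESHOLD_MS = 600   # illustrative; real systems wait ~500-800ms
FRAME_MS = 20                # duration of one audio frame

def detect_end_of_speech(frames, threshold_ms=SILENCE_THRESHOLD_MS):
    """Step 1: return True once trailing silence exceeds the threshold.

    `frames` is a list of booleans: True = speech energy, False = silence.
    """
    needed = threshold_ms // FRAME_MS          # silent frames required
    return len(frames) >= needed and not any(frames[-needed:])

def half_duplex_turn(frames, asr, llm, tts):
    """One turn of the sequential pipeline: each stage blocks on the last."""
    if not detect_end_of_speech(frames):
        return None            # Step 1: still waiting for enough silence
    text = asr(frames)         # Step 2: speech-to-text
    reply = llm(text)          # Step 3: language model
    audio = tts(reply)         # Step 4: text-to-speech
    return audio               # Step 5: play to user, then back to Step 1
```

Nothing happens until `detect_end_of_speech` fires: the pipeline cannot even begin transcribing until the user has been silent for the full threshold, which is exactly where the 500-800ms floor on response latency comes from.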
Limitations of Half-Duplex
Silence threshold problem: The system must detect that you've stopped speaking before it responds. This requires a pause — usually 500-800ms of silence. Speak too quickly or pause mid-thought, and the system either jumps in too early or waits too long.
No interruption handling: If the AI is speaking and you want to interrupt, the system typically either ignores you entirely or cuts off its current output abruptly. It has no mechanism to gracefully yield.
Noise sensitivity: In a noisy environment, silence thresholds misfire constantly. A TV in the background, a navigation app, a colleague talking nearby — all can trigger false responses.
Unnatural rhythm: The stop-start cadence of half-duplex conversation feels robotic. Users adapt by speaking in complete sentences and waiting — behavior they'd never use with a human.
How Full-Duplex Voice AI Works
Full-duplex systems process audio input and generate audio output simultaneously. There is no sequential pipeline — input and output are parallel processes managed by a unified model.
Continuous: Audio input stream → Model processes in real-time
Continuous: Model generates output → Audio output stream
Decision: Model determines when to speak, when to yield, when to continue

The key insight: the model doesn't wait for silence to start processing. It is always processing. The decision of when to respond is made continuously based on acoustic and semantic signals.
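The parallel structure can be sketched as two concurrent tasks sharing a queue: one ingests audio continuously, the other plays output. `ToyModel` is a hypothetical stand-in for the unified model, and the string frames stand in for real audio; the point is only that listening never blocks on speaking.

```python
import asyncio

class ToyModel:
    """Hypothetical stand-in for a unified full-duplex model."""
    def step(self, frame):
        # 'speech'  -> user is talking: keep listening, say nothing
        # 'pause'   -> invitation to respond: emit an output chunk
        # 'overlap' -> user barged in while we were speaking: yield the floor
        if frame == "pause":
            return "chunk"
        if frame == "overlap":
            return "yield"
        return None

async def run_full_duplex(frames):
    out_q = asyncio.Queue()
    played = []

    async def listen():
        model = ToyModel()
        for frame in frames:           # input is processed continuously
            action = model.step(frame)
            if action == "chunk":
                await out_q.put("audio-chunk")
            elif action == "yield":
                await out_q.put("STOP")
            await asyncio.sleep(0)     # let the speaker task run in parallel
        await out_q.put(None)          # end of stream

    async def speak():
        while True:                    # output runs as its own task
            item = await out_q.get()
            if item is None:
                break
            played.append("(yielded)" if item == "STOP" else item)

    await asyncio.gather(listen(), speak())
    return played
```

Because the listener never waits for the speaker, a barge-in ("overlap") is noticed on the very next frame rather than after a full turn completes.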
What Full-Duplex Enables
Natural interruption: When you interrupt, the model detects it immediately and yields — the same way a person would.
Thinking pause detection: The model can distinguish "the user paused to think" from "the user finished speaking" using semantic context — not just silence duration.
Noise immunity: Because the model understands conversational context, it can filter out irrelevant audio semantically, not just acoustically.
Overlapping speech: In natural conversation, speakers frequently overlap briefly. Full-duplex handles this; half-duplex breaks on it.
The Technical Challenge
Building full-duplex voice AI is significantly harder than half-duplex. The main challenges:
Acoustic Echo Cancellation
When the AI speaks through a speaker, that audio enters the microphone. A full-duplex system must cancel its own voice from the input signal in real-time, or it will hear itself and get confused. This requires sophisticated signal processing at the hardware and software level.
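A minimal sketch of the core AEC building block: a normalized least-mean-squares (NLMS) adaptive filter that estimates the loudspeaker-to-microphone echo path and subtracts it from the mic signal. Production echo cancellers add double-talk detection, delay estimation, and nonlinear processing on top of this.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Cancel the far-end (loudspeaker) signal from the mic signal.

    Returns the error signal: the mic input with the estimated echo removed.
    Whatever remains after convergence is the near-end talker (or silence).
    """
    w = [0.0] * taps                   # adaptive estimate of the echo path
    out = []
    for n in range(len(mic)):
        # Most recent `taps` far-end samples (zero-padded at the start)
        x = [far_end[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = mic[n] - echo_est          # residual after echo removal
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]  # NLMS update
        out.append(e)
    return out
```

The filter converges because the echo is (approximately) a linear function of the far-end signal; once it has, the system no longer "hears itself" and can attend to genuine user speech in the residual.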
Unified Modeling
Half-duplex can use separate, specialized models for each pipeline stage — often achieving higher quality per stage. Full-duplex requires joint modeling of speech perception and generation, which is architecturally complex and computationally demanding.
Turn-Taking Intelligence
The hardest problem: deciding when to speak. Half-duplex uses simple rules (silence threshold). Full-duplex needs a model that understands conversational dynamics — when a pause is a breath vs. an invitation to respond.
Seeduplex solves this with joint acoustic-semantic inference: the model uses both the audio signal and the dialogue context to make turn-taking decisions.
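A toy illustration of the idea — not Seeduplex's actual algorithm — is to weight a semantic end-of-turn probability together with an acoustic silence score, so that a semantically complete utterance triggers a fast response while a mid-thought pause does not. The weights, thresholds, and function name here are invented for the sketch.

```python
def turn_decision(silence_ms, end_of_turn_prob,
                  silence_floor_ms=150, commit_threshold=0.6):
    """Joint acoustic-semantic turn-taking decision (illustrative only).

    silence_ms: how long the user has been acoustically silent.
    end_of_turn_prob: model's probability that the utterance is complete.
    """
    if silence_ms < silence_floor_ms:
        return "listen"                    # user is still audibly speaking
    # Longer silence raises confidence, but semantic completeness dominates:
    acoustic_score = min(silence_ms / 800, 1.0)
    score = 0.7 * end_of_turn_prob + 0.3 * acoustic_score
    return "speak" if score >= commit_threshold else "listen"
```

Note the asymmetry this buys: a complete sentence earns a response after ~200ms of silence, while "I was thinking about..." keeps the model listening well past the point where a pure silence threshold would have fired.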
Performance Comparison
| Metric | Half-Duplex | Full-Duplex (Seeduplex) |
|---|---|---|
| Response latency | 500-800ms | ~200ms |
| False interrupt rate | Baseline | ~50% lower |
| Noise false triggers | High | Low (semantic filtering) |
| Interruption handling | Poor | Native |
| Natural conversation score | Lower | Higher |
| Implementation complexity | Lower | Higher |
When Half-Duplex Is Still Fine
Full-duplex isn't better for every application:
- **Simple command interfaces** (set a timer, play music) — half-duplex is sufficient
- **Dictation and transcription** — no conversation required
- **Narration and audio content** — one-way output, duplex irrelevant
- **High-noise industrial environments** — where acoustic echo cancellation is impractical
The Future
Half-duplex dominated voice AI for a decade because full-duplex was architecturally too difficult at scale. Seeduplex's April 2026 launch — deployed to hundreds of millions of Doubao users — marks the first time full-duplex has been proven viable in production.
Other major AI labs are working on similar capabilities. Within 12-18 months, full-duplex is likely to become the expected baseline for conversational voice AI.