What it does

VoiceMode adds two-way voice conversations to Claude Code. Developers speak naturally instead of typing: the server transcribes speech to text with Whisper (local or cloud), sends the transcript to Claude, and converts the response back to audio with Kokoro (local) or OpenAI TTS (cloud). It offers low-latency streaming and silence detection that recognizes when you have finished speaking, and it works entirely offline when local speech services are configured.
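To make the silence-detection idea concrete, here is a minimal sketch of one common approach: treat a turn as finished once a run of low-energy audio frames follows some speech. This is an illustration only, not VoiceMode's actual detector; the function names, the 0.02 energy threshold, and the 10-frame cutoff are all hypothetical.

```python
import math
from typing import Iterable, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def end_of_speech_index(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,      # hypothetical energy cutoff, not a VoiceMode setting
    max_silent_frames: int = 10,  # hypothetical count of quiet frames that ends a turn
) -> int:
    """Return the index of the frame where the speaker is judged to be done.

    Returns -1 if no speech is heard, or if speech never stops for long
    enough within the given frames.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            # Loud frame: the user is (still) talking, so reset the quiet counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count it toward the end-of-turn cutoff.
            silent_run += 1
            if silent_run >= max_silent_frames:
                return i
    return -1
```

Leading silence is ignored on purpose, so the detector does not end the turn before the user has started talking. A production detector would typically use a trained voice-activity model rather than a fixed energy threshold, which is sensitive to background noise.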

Who it's for

Engineers and developers using Claude Code in situations where typing isn't practical: pair-programming while cooking, debugging during walks between meetings, code reviews over coffee, or extended sessions where hands-free operation reduces eye strain and improves focus.

Common use cases

Ask quick questions or get clarifications without breaking focus from physical tasks
Pair-program or debug while multitasking (walking, cooking, holding a beverage)
Reduce eye strain during extended coding sessions via voice-based interaction
Review code changes and discuss architecture decisions in real time without a keyboard
Work in environments (meetings, open office) where a keyboard is inconvenient

Setup pitfalls

System dependencies differ by platform: Ubuntu/Debian need ffmpeg, portaudio, libasound2, and pulseaudio; macOS requires ffmpeg and portaudio via Homebrew; WSL2 specifically requires pulseaudio packages for microphone access
One secret has been detected in the repository; review the codebase before use in sensitive environments
Microphone and speaker permissions must be granted to the terminal or app on macOS and Linux
An OpenAI API key (set via the OPENAI_API_KEY environment variable) is required for cloud-based STT/TTS but optional if local Whisper and Kokoro services are configured
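
Two of the pitfalls above can be checked before a first session. The sketch below is a hypothetical preflight helper, not part of VoiceMode: it looks for ffmpeg on the PATH and, unless you indicate that local Whisper and Kokoro services are in use, checks that OPENAI_API_KEY is set. The function name and the use_local_services flag are assumptions made for illustration. It does not verify portaudio, pulseaudio, or microphone permissions.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    use_local_services: bool = False,  # hypothetical flag: True if local Whisper/Kokoro are set up
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []

    # ffmpeg is listed as a dependency on every platform, so check the PATH for it.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # The API key only matters when cloud-based STT/TTS is used.
    if not use_local_services and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (needed for cloud STT/TTS)")

    return problems
```

Calling preflight() with no arguments inspects the current environment; passing use_local_services=True skips the API key check for fully offline setups.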

Tool name                Description    Destructive?
list_whisper_versions    -              ✓ no
list_kokoro_versions     -              ✓ no
