What it does
VoiceMode adds two-way voice conversations to Claude Code. Developers speak naturally instead of typing; the server transcribes speech to text via Whisper (local or cloud), sends the transcript to Claude, and converts responses back to audio via Kokoro (local) or OpenAI TTS (cloud). It features low-latency streaming and smart silence detection to know when you've finished speaking, and it works entirely offline if local speech services are configured.
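To make the idea of silence detection concrete, here is a minimal sketch of one common approach: an energy threshold plus a run of quiet frames. This is an illustration only, not VoiceMode's actual detector; the function names, the threshold, and the frame counts are all hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def end_of_speech_index(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,      # hypothetical energy threshold
    min_silent_frames: int = 10,  # hypothetical: quiet frames needed to end a turn
) -> Optional[int]:
    """Return the index of the frame at which the speaker is judged to be done.

    Speech must be heard first; after that, `min_silent_frames` consecutive
    quiet frames end the turn. Returns None if no end of speech is detected.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            # Loud frame: the user is (still) talking, so reset the quiet counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count towards the end-of-turn decision.
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i
    return None
```

Leading silence is ignored so that a pause before the user starts talking does not end the turn early. A production detector would likely use a trained voice-activity model rather than raw energy.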
Who it's for
Engineers and developers using Claude Code in situations where typing isn't practical: pair-programming while cooking, debugging during walks between meetings, code reviews over coffee, or extended sessions where hands-free operation reduces eye strain and improves focus.
Common use cases
- Ask quick questions or get clarifications without breaking focus from physical tasks
- Pair-program or debug while multitasking (walking, cooking, holding a beverage)
- Reduce eye strain during extended coding sessions via voice-based interaction
- Review code changes and discuss architecture decisions in real time without a keyboard
- Work in environments (meetings, open office) where a keyboard is inconvenient
Setup pitfalls
- System dependencies differ by platform: Ubuntu/Debian need ffmpeg, portaudio, libasound2, and pulseaudio; macOS requires ffmpeg and portaudio via Homebrew; WSL2 specifically requires pulseaudio packages for microphone access
- One secret has been detected in the repository; review the codebase before use in sensitive environments
- Microphone and speaker permissions must be granted to the terminal or app on macOS and Linux
- An OpenAI API key (set via the OPENAI_API_KEY env var) is required for cloud-based STT/TTS but optional if local Whisper and Kokoro services are configured
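Two of the pitfalls above can be checked before a first run. The following is a minimal sketch of such a check, not part of VoiceMode itself: the `preflight` function and its `use_cloud` parameter are hypothetical, and it only looks for ffmpeg on the PATH and for the OPENAI_API_KEY variable.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    use_cloud: bool,
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    # ffmpeg is listed as a dependency for both Ubuntu/Debian and macOS.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # The key only matters when cloud STT/TTS is used; local services do not need it.
    if use_cloud and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight(use_cloud=True):
        print("warning:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment. Library packages such as portaudio and microphone permissions are not covered, since they cannot be verified reliably this way.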