An open-source, voice-first AI terminal.

Voice Terminal turns small devices into talking AI assistants. Type or speak and get low-latency streamed audio responses with real-time subtitles. Fully self-hostable, with offline LLM and TTS support.

Voice Terminal Cloud (coming soon)

Don't want to self-host? Just flash and go. No servers, no Docker, no config — we handle everything.

Talk Mode

Type a prompt → LLM processes → streamed TTS audio plays back instantly. Status indicators (READY / THINKING / SPEAKING) and live subtitles on screen.
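
The Talk Mode flow above can be sketched as a small state machine. This is only an illustration of the READY / THINKING / SPEAKING cycle, not the project's actual firmware or server code: `Status`, `talk_turn`, and the `respond` callable are hypothetical names, and `respond` stands in for whatever LLM and TTS pipeline produces the streamed audio chunks.

```python
from enum import Enum
from typing import Callable, Iterable, Iterator, Tuple


class Status(Enum):
    READY = "READY"
    THINKING = "THINKING"
    SPEAKING = "SPEAKING"


def talk_turn(
    prompt: str,
    respond: Callable[[str], Iterable[bytes]],
) -> Iterator[Tuple[Status, bytes]]:
    """Yield (status, audio_chunk) pairs for one prompt/response turn.

    `respond` is a hypothetical stand-in for the LLM + TTS backend: it takes
    the prompt and returns an iterable of audio chunks.
    """
    # The prompt has been submitted but no audio exists yet.
    yield Status.THINKING, b""
    # Hand each chunk to the caller as soon as the backend produces it.
    for chunk in respond(prompt):
        yield Status.SPEAKING, chunk
    # The stream is exhausted, so the device returns to idle.
    yield Status.READY, b""
```

A caller would update the on-screen indicator from the first element of each pair and pass the second to the audio output.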

Voice Input

Push-to-talk voice capture with Whisper STT. Automatic language detection for English and Russian. Audio is sent to the server for processing.

Self-host Ready

Run your own backend with Docker. Ollama for LLM, Piper or Edge-TTS for speech. No cloud required — full offline operation supported.

Multilingual

English and Russian voices out of the box. Automatic Cyrillic transliteration for the LCD. Configurable voice selection per language.
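
As an illustration of what Cyrillic transliteration for a Latin-only display might look like, here is a minimal sketch. The mapping table and the `transliterate` function are assumptions made for this example. The project's real table and rules may differ.

```python
# Hypothetical Russian-to-Latin table; the project's actual mapping may differ.
_RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace Russian letters with Latin equivalents; leave other characters as-is."""
    out = []
    for ch in text:
        latin = _RU_TO_LATIN.get(ch.lower())
        if latin is None:
            # Not a Russian letter (ASCII, digits, punctuation): keep unchanged.
            out.append(ch)
        elif ch.isupper():
            # Preserve capitalization, e.g. "Щ" becomes "Shch".
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Привет, мир!"))  # prints "Privet, mir!"
```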

Recorder

One-press WAV recording to microSD (coming soon). Up to 2-minute captures for notes and ideas.
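
For a rough sense of how much microSD space a 2-minute capture would take, the sketch below works out the arithmetic. It assumes the recorder uses the same PCM16 @ 16 kHz format listed in the tech stack and records mono; the page does not state the recording format, so treat these constants as assumptions. `pcm_bytes` and `wav_bytes` are names made up for this example.

```python
import io
import wave

# Assumed recording format (not confirmed for the recorder feature).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # PCM16
CHANNELS = 1          # mono


def pcm_bytes(seconds: int) -> int:
    """Raw sample payload size for a capture of the given length."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def wav_bytes(seconds: int) -> int:
    """Size of a complete WAV file (header plus silent payload), built in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(BYTES_PER_SAMPLE)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(bytes(pcm_bytes(seconds)))  # zero-filled frames
    return len(buf.getvalue())


print(pcm_bytes(120))  # prints 3840000
```

Under these assumptions a full 2-minute note is about 3.84 MB of samples plus a small WAV header.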

Player

Play WAV/MP3 from microSD (coming soon). Minimal UI, fast controls.

Current Status

v0.1 — Talk mode with keyboard input, Wi-Fi provisioning via captive portal, real-time audio streaming. Reference device: M5Stack Cardputer Adv (ESP32-S3).

ESP-IDF 5.x · Ollama / OpenAI · Piper / Edge-TTS · Whisper STT · PCM16 @ 16 kHz · FastAPI · Docker