Voice Terminal turns a small device into a talking AI assistant. Type or speak and get low-latency streamed audio responses with real-time subtitles. Fully self-hostable with offline LLM and TTS support.
Don't want to self-host? Just flash and go. No servers, no Docker, no config — we handle everything.
Type a prompt → LLM processes it → streamed TTS audio plays back with low latency. Status indicators (READY / THINKING / SPEAKING) and live subtitles appear on screen.
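The three status indicators can be read as a small state machine. The sketch below is only an illustration of that idea, not the project's firmware code: the `Status` and `StatusMachine` names and the allowed transitions (including THINKING falling back to READY on an error) are assumptions.

```python
from enum import Enum


class Status(Enum):
    READY = "READY"
    THINKING = "THINKING"
    SPEAKING = "SPEAKING"


# Assumed transition table: a prompt moves READY -> THINKING, the first audio
# chunk moves THINKING -> SPEAKING, and the end of playback returns to READY.
# THINKING -> READY covers a failed or cancelled request.
_ALLOWED = {
    Status.READY: {Status.THINKING},
    Status.THINKING: {Status.SPEAKING, Status.READY},
    Status.SPEAKING: {Status.READY},
}


class StatusMachine:
    """Tracks the indicator shown on screen (hypothetical helper)."""

    def __init__(self) -> None:
        self.status = Status.READY

    def advance(self, new: Status) -> Status:
        # Reject transitions that the table above does not list.
        if new not in _ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new.value}")
        self.status = new
        return new
```

A caller would invoke `advance(Status.THINKING)` when a prompt is submitted, `advance(Status.SPEAKING)` when audio starts, and `advance(Status.READY)` when playback ends.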
Push-to-talk voice capture with Whisper STT. Automatic language detection for English and Russian. Audio sent to server for processing.
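One possible way to limit detection to the two supported languages is sketched below. It assumes the STT backend exposes per-language probabilities as a dictionary keyed by language code (openai-whisper's `detect_language` returns something of this shape). The `pick_language` helper and its fallback to English are hypothetical, not taken from the project.

```python
def pick_language(probs, allowed=("en", "ru"), default="en"):
    """Return the most likely language among the allowed ones.

    `probs` maps language codes to probabilities. Languages outside
    `allowed` are ignored, so speech misdetected as a third language
    still resolves to English or Russian.
    """
    candidates = {lang: p for lang, p in probs.items() if lang in allowed}
    if not candidates:
        # Assumed fallback when the backend reports neither allowed language.
        return default
    return max(candidates, key=candidates.get)
```

For example, `pick_language({"en": 0.2, "ru": 0.7, "uk": 0.1})` selects `"ru"`.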
Run your own backend with Docker. Ollama for LLM, Piper or Edge-TTS for speech. No cloud required: full offline operation is supported with Ollama and Piper (Edge-TTS calls an online service, so it needs an internet connection).
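For a self-hosted setup, the backend needs to read streamed text from Ollama before handing it to TTS. The sketch below shows one way to do that against Ollama's `/api/generate` endpoint, which streams newline-delimited JSON objects with `response` and `done` fields. The URL assumes Ollama's default port on localhost, and the model name `llama3` and the helper names are placeholders, not the project's actual configuration.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default port on the same host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build a streaming POST request (the model name is a placeholder)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def iter_text(lines):
    """Yield text fragments from an iterable of NDJSON lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break


def stream_generate(prompt, model="llama3"):
    """Stream fragments from a running Ollama server."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        # The HTTP response object iterates line by line.
        yield from iter_text(resp)
```

Each fragment yielded by `stream_generate` could be forwarded to the TTS engine as it arrives, which is what keeps perceived latency low.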
English and Russian voices out of the box. Automatic Cyrillic transliteration for LCD display. Configurable voice selection per language.
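Cyrillic transliteration for a display without Cyrillic glyphs can be sketched as a character-by-character lookup. The mapping below is one common romanization chosen for illustration. The table and the `transliterate` helper are assumptions, not the scheme the firmware actually uses.

```python
# Assumed romanization table for Russian; other schemes differ in details
# such as how "й", "х", or the hard and soft signs are rendered.
_TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace Russian Cyrillic letters with Latin equivalents."""
    out = []
    for ch in text:
        latin = _TABLE.get(ch.lower())
        if latin is None:
            # Not a Russian letter: keep digits, punctuation, Latin text as-is.
            out.append(ch)
        elif ch.isupper():
            # Capitalize only the first Latin letter, e.g. "Щ" becomes "Shch".
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)
```

With this table, `transliterate("Привет, мир!")` gives `"Privet, mir!"`.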
One-press WAV recording to microSD (coming soon). Up to 2-minute captures for notes and ideas.
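As a rough storage estimate for a 2-minute capture, the sketch below computes the size of an uncompressed PCM WAV file. The recording format is not stated above, so 16 kHz, 16-bit, mono is purely an assumption, and `wav_size_bytes` is a hypothetical helper.

```python
import io
import wave

WAV_HEADER_BYTES = 44  # size of a canonical PCM WAV header


def wav_size_bytes(seconds, rate=16000, sample_width=2, channels=1):
    """Estimate file size; the default format values are assumptions."""
    return WAV_HEADER_BYTES + seconds * rate * sample_width * channels


def write_silence(seconds, rate=16000, sample_width=2, channels=1):
    """Write silent PCM audio to memory to cross-check the estimate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (seconds * rate * sample_width * channels))
    return buf.getvalue()
```

Under these assumptions a full 2-minute recording takes `wav_size_bytes(120)` = 3,840,044 bytes, a little under 4 MB.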
Play WAV/MP3 from microSD (coming soon). Minimal UI, fast controls.
v0.1 — Talk mode with keyboard input, Wi-Fi provisioning via captive portal, real-time audio streaming. Reference device: M5Stack Cardputer Adv (ESP32-S3).