Originally created by: haiphucnguyen
Enable Askimo to handle voice-based interactions in addition to text. This includes:
Voice Input (Speech-to-Text): Allow users to provide prompts by speaking into their microphone or supplying an audio file.
Voice Output (Text-to-Speech): Read AI responses aloud using a TTS engine.
This will make Askimo more accessible and useful in hands-free or accessibility-focused workflows.
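The two modes above could surface on the command line roughly as follows. This is a hypothetical sketch: the `--audio` and `--speak` flag names come from this issue, but the parser itself is illustrative and not Askimo's real CLI code.

```java
import java.nio.file.Path;
import java.util.Optional;

public class VoiceArgs {
    final Optional<Path> audioFile; // --audio <file>: transcribe this file into the prompt
    final boolean speak;            // --speak: read the model's response aloud

    VoiceArgs(Optional<Path> audioFile, boolean speak) {
        this.audioFile = audioFile;
        this.speak = speak;
    }

    static VoiceArgs parse(String[] args) {
        Optional<Path> audio = Optional.empty();
        boolean speak = false;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--audio":
                    if (i + 1 >= args.length)
                        throw new IllegalArgumentException("--audio requires a file path");
                    audio = Optional.of(Path.of(args[++i]));
                    break;
                case "--speak":
                    speak = true;
                    break;
                default:
                    // anything else falls through to the existing prompt pipeline
                    break;
            }
        }
        return new VoiceArgs(audio, speak);
    }

    public static void main(String[] argv) {
        VoiceArgs a = parse(new String[]{"--audio", "input.wav", "--speak"});
        System.out.println(a.audioFile.get() + " speak=" + a.speak); // prints "input.wav speak=true"
    }
}
```

Keeping the voice options as plain flags means they compose with the existing text workflow: `--audio` only changes where the prompt comes from, and `--speak` only changes how the response is delivered.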
Actions:
Integrate a speech-to-text engine (e.g., OpenAI Whisper, Vosk, or another provider's API).
Extend the CLI to accept audio file inputs (e.g., askimo --audio input.wav). Streamed audio input is an open question and still needs investigation.
Add optional real-time microphone capture for interactive sessions.
Integrate a text-to-speech engine (e.g., OpenAI TTS, ElevenLabs, Amazon Polly).
Provide CLI flags for enabling voice output (e.g., askimo --speak).
Document supported audio formats and example commands in README.md.
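Once the supported formats are documented, a small validator could reject unsupported files early, before any bytes are sent to a speech-to-text backend. The sketch below assumes PCM WAV is one of the supported formats; the `WavInfo` class and its method names are hypothetical, but the header layout it parses is the standard RIFF/WAVE format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WavInfo {
    // Returns the sample rate of a PCM WAV byte stream, or throws if the
    // bytes do not start with a canonical 44-byte RIFF/WAVE/fmt header.
    static int sampleRate(byte[] wav) {
        ByteBuffer b = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        expect(b, "RIFF");
        b.getInt();            // overall chunk size, not needed here
        expect(b, "WAVE");
        expect(b, "fmt ");
        b.getInt();            // fmt subchunk size (16 for PCM)
        if (b.getShort() != 1) // audio format tag: 1 = uncompressed PCM
            throw new IllegalArgumentException("not PCM");
        b.getShort();          // channel count
        return b.getInt();     // sample rate in Hz
    }

    static void expect(ByteBuffer b, String tag) {
        byte[] got = new byte[4];
        b.get(got);
        if (!new String(got, StandardCharsets.US_ASCII).equals(tag))
            throw new IllegalArgumentException("missing " + tag + " tag");
    }

    // Builds a minimal valid PCM header so the parser can be exercised
    // without a real recording.
    static byte[] canonicalHeader(int rate, int channels, int bitsPerSample) {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII)).putInt(36)
         .put("WAVE".getBytes(StandardCharsets.US_ASCII))
         .put("fmt ".getBytes(StandardCharsets.US_ASCII)).putInt(16)
         .putShort((short) 1).putShort((short) channels).putInt(rate)
         .putInt(rate * channels * bitsPerSample / 8)
         .putShort((short) (channels * bitsPerSample / 8))
         .putShort((short) bitsPerSample)
         .put("data".getBytes(StandardCharsets.US_ASCII)).putInt(0);
        return b.array();
    }

    public static void main(String[] args) {
        System.out.println(sampleRate(canonicalHeader(16000, 1, 16))); // prints 16000
    }
}
```

Failing fast here also gives the README a concrete error to document: users who pass an unsupported file get a clear "not PCM" or "missing RIFF tag" message instead of an opaque backend error.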