Originally created by: haiphucnguyen
Enable Askimo to handle voice-based interactions in addition to text. This includes:
Voice Input (Speech-to-Text): Allow users to provide prompts by speaking into their microphone or supplying an audio file.
Voice Output (Text-to-Speech): Read AI responses aloud using a TTS engine.
This will make Askimo more accessible and useful in hands-free or accessibility-focused workflows.
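The two modes above could surface on the command line roughly as follows. This is a hypothetical sketch: the `--audio` and `--speak` flag names come from this issue, but the parser itself is illustrative and not Askimo's real CLI code.

```java
import java.nio.file.Path;
import java.util.Optional;

public class VoiceArgs {
    final Optional<Path> audioFile; // --audio <file>: transcribe this file into the prompt
    final boolean speak;            // --speak: read the model's response aloud

    VoiceArgs(Optional<Path> audioFile, boolean speak) {
        this.audioFile = audioFile;
        this.speak = speak;
    }

    static VoiceArgs parse(String[] args) {
        Optional<Path> audio = Optional.empty();
        boolean speak = false;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--audio":
                    if (i + 1 >= args.length)
                        throw new IllegalArgumentException("--audio requires a file path");
                    audio = Optional.of(Path.of(args[++i]));
                    break;
                case "--speak":
                    speak = true;
                    break;
                default:
                    // anything else falls through to the existing prompt pipeline
                    break;
            }
        }
        return new VoiceArgs(audio, speak);
    }

    public static void main(String[] argv) {
        VoiceArgs a = parse(new String[]{"--audio", "input.wav", "--speak"});
        System.out.println(a.audioFile.get() + " speak=" + a.speak); // prints "input.wav speak=true"
    }
}
```

Keeping the voice options as plain flags means they compose with the existing text workflow: `--audio` only changes where the prompt comes from, and `--speak` only changes how the response is delivered.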
Actions:
Integrate a speech-to-text engine (e.g., OpenAI Whisper, Vosk, or another provider's API).
Extend the CLI to accept audio file inputs (e.g., askimo --audio input.wav). Streamed audio input is an open question and still needs investigation.
Add optional real-time microphone capture for interactive sessions.
Integrate a text-to-speech engine (e.g., OpenAI TTS, ElevenLabs, Amazon Polly).
Provide CLI flags for enabling voice output (e.g., askimo --speak).
Document supported audio formats and example commands in README.md.
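Once the supported formats are documented, a small validator could reject unsupported files early, before any bytes are sent to a speech-to-text backend. The sketch below assumes PCM WAV is one of the supported formats; the `WavInfo` class and its method names are hypothetical, but the header layout it parses is the standard RIFF/WAVE format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WavInfo {
    // Returns the sample rate of a PCM WAV byte stream, or throws if the
    // bytes do not start with a canonical 44-byte RIFF/WAVE/fmt header.
    static int sampleRate(byte[] wav) {
        ByteBuffer b = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        expect(b, "RIFF");
        b.getInt();            // overall chunk size, not needed here
        expect(b, "WAVE");
        expect(b, "fmt ");
        b.getInt();            // fmt subchunk size (16 for PCM)
        if (b.getShort() != 1) // audio format tag: 1 = uncompressed PCM
            throw new IllegalArgumentException("not PCM");
        b.getShort();          // channel count
        return b.getInt();     // sample rate in Hz
    }

    static void expect(ByteBuffer b, String tag) {
        byte[] got = new byte[4];
        b.get(got);
        if (!new String(got, StandardCharsets.US_ASCII).equals(tag))
            throw new IllegalArgumentException("missing " + tag + " tag");
    }

    // Builds a minimal valid PCM header so the parser can be exercised
    // without a real recording.
    static byte[] canonicalHeader(int rate, int channels, int bitsPerSample) {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII)).putInt(36)
         .put("WAVE".getBytes(StandardCharsets.US_ASCII))
         .put("fmt ".getBytes(StandardCharsets.US_ASCII)).putInt(16)
         .putShort((short) 1).putShort((short) channels).putInt(rate)
         .putInt(rate * channels * bitsPerSample / 8)
         .putShort((short) (channels * bitsPerSample / 8))
         .putShort((short) bitsPerSample)
         .put("data".getBytes(StandardCharsets.US_ASCII)).putInt(0);
        return b.array();
    }

    public static void main(String[] args) {
        System.out.println(sampleRate(canonicalHeader(16000, 1, 16))); // prints 16000
    }
}
```

Failing fast here also gives the README a concrete error to document: users who pass an unsupported file get a clear "not PCM" or "missing RIFF tag" message instead of an opaque backend error.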