Showing 277 open source projects for "delphi speech recognition"

View related business solutions
  • Go From Idea to Deployed AI App Fast Icon
    Go From Idea to Deployed AI App Fast

    One platform to build, fine-tune, and deploy. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    Vosk Speech Recognition Toolkit

    Vosk Speech Recognition Toolkit

    Offline speech recognition API for Android, iOS, Raspberry Pi

    Speech recognition bindings are implemented for various programming languages like Python, Java, Node.JS, C#, C++, Rust, Go and others. Vosk supplies speech recognition for chatbots, smart home appliances, and virtual assistants. It can also create subtitles for movies, and transcription for lectures and interviews. Vosk scales from small devices like Raspberry Pi or Android smartphones to big clusters.
    Downloads: 97 This Week
    Last Update:
    See Project
  • 2
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. ...
    Downloads: 73 This Week
    Last Update:
    See Project
  • 3
    sherpa-onnx

    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 85 This Week
    Last Update:
    See Project
  • 4
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details).
    Downloads: 10 This Week
    Last Update:
    See Project
  • $300 in Free Credit Across 150+ Cloud Services Icon
    $300 in Free Credit Across 150+ Cloud Services

    VMs, containers, AI, databases, storage | build anything. No commitment to start.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale with Google Cloud.
    Start Building Free
  • 5
    FireRedASR

    FireRedASR

    Open-source industrial-grade ASR models

    ...FireRedASR not only excels in traditional speech recognition tasks but also demonstrates strong capability in challenging scenarios like singing lyrics recognition, where accurate transcription is often difficult for conventional models.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    Kaldi

    Kaldi

    kaldi-asr/kaldi is the official location of the Kaldi project

    Kaldi is an open source toolkit for speech recognition research. It provides a powerful framework for building state-of-the-art automatic speech recognition (ASR) systems, with support for deep neural networks, Gaussian mixture models, hidden Markov models, and other advanced techniques. The toolkit is widely used in both academia and industry due to its flexibility, extensibility, and strong community support.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one system, enabling developers to build rich, multimodal audio applications without stitching together disparate components. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    The SpeechBrain Toolkit

    The SpeechBrain Toolkit

    A PyTorch-based Speech Toolkit

    ...It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains. SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, transformers, and neural language models relying on recurrent neural networks and transformers. Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides different models for speaker recognition, including X-vector, ECAPA-TDNN, PLDA, and contrastive learning. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    Buster

    Buster

    Captcha solver extension for humans

    Save time by asking Buster to solve captchas for you. Buster is a Firefox extension which helps you to solve difficult captchas by completing reCAPTCHA audio challenges using speech recognition. Challenges are solved by clicking on the extension button at the bottom of the reCAPTCHA widget. It is not guaranteed that challenges are always solved, the limitations of the technology need to be considered. The continued development of Buster is made possible thanks to the support of awesome backers. If you'd like to join them, please consider contributing with Patreon, PayPal or Bitcoin. ...
    Downloads: 68 This Week
    Last Update:
    See Project
  • Push Code. Get a Production URL. Done. Icon
    Push Code. Get a Production URL. Done.

    Cloud Run deploys any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try Cloud Run Free
  • 10
    WhisperKit

    WhisperKit

    On-device Speech Recognition for Apple Silicon

    WhisperKit is a Swift package that integrates OpenAI's popular Whisper speech recognition model with Apple's CoreML framework for efficient, local inference on Apple devices. Whisper has pulled the future forward when fast, free and virtually error-free translation and transcription will be ubiquitous. It inspired numerous developers to improve and deploy it with minimal friction and maximum performance. We founded Argmax in November 2023 to empower developers and enterprises everywhere to deploy commercial-scale inference workloads on user devices. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    ...Its backend leverages OpenAI’s Whisper models for GPU-accelerated speech recognition and Parakeet V3 for efficient CPU-only transcription with automatic language detection. To further refine accuracy and responsiveness, Handy integrates Silero’s Voice Activity Detection (VAD) for silence filtering, ensuring only speech segments are processed.
    Downloads: 36 This Week
    Last Update:
    See Project
  • 12
    whisper.cpp

    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. ...
    Downloads: 350 This Week
    Last Update:
    See Project
  • 13
    Polyglot

    Polyglot

    Cross-platform AI language practice app

    Polyglot is a cross platform AI language practice application that runs as a desktop app and also offers a web version. It is built around conversational large language models and Azure based text to speech services, turning them into an interactive environment for speaking practice in multiple languages. Users can define custom AI personas, choose languages, and configure their own OpenAI and Azure keys so they retain control over which backends they use. The app supports speech recognition with quick keyboard shortcuts, allowing learners to hold down a key to speak and release it to submit for recognition and response. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    StreamSpeech

    StreamSpeech

    StreamSpeech is a seamless model for offline speech recognition

    StreamSpeech is an “all-in-one” speech model designed to perform offline and simultaneous speech recognition, speech translation, and speech synthesis within a single unified architecture. Developed as part of an ACL 2024 paper, it targets streaming and low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously as audio is being received.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    whisper-timestamped

    whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence. Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    stt

    stt

    Voice Recognition to Text Tool

    stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into textual formats without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 17
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    OpenVINO

    OpenVINO

    OpenVINO™ Toolkit repository

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime, Post-Training Optimization Tool, as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 19
    Underthesea

    Underthesea

    Underthesea - Vietnamese NLP Toolkit

    Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    BrowserAI

    BrowserAI

    Run local LLMs like llama, deepseek, kokoro etc. inside your browser

    ...The platform provides a developer-friendly SDK with pre-configured popular models, and it allows for seamless switching between MLC and Transformer engines. Additionally, it supports features such as speech recognition, text-to-speech, structured output generation, and Web Worker support for non-blocking UI performance.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    ...Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Pot Desktop

    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 23
    TTS Voice Wizard

    TTS Voice Wizard

    Speech to Text to Speech, sends text as OSC messages

    Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 24
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ...Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 25
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    ...Very strong benchmark performance across modalities (audio understanding, speech recognition, image/video reasoning) and often outperforming or matching single-modality models at a similar scale. Real-time streaming responses, including natural speech synthesis (text-to-speech) and chunked inputs for low latency interaction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB