Showing 180 open source projects for "nepali speech model"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. ...
    Downloads: 104 This Week
    Last Update:
    See Project
  • 2
    Open Model Zoo

    Open Model Zoo

    Pre-trained Deep Learning models and demos

    Open Model Zoo is a large repository of high-quality pre-trained deep learning models and demonstration applications designed to work with the OpenVINO™ toolkit, offering a comprehensive starting point for a wide range of AI and computer vision workloads. It includes hundreds of models covering object detection, classification, segmentation, pose estimation, speech recognition, text-to-speech, and more, many of which are already converted into formats optimized for inference on CPUs, GPUs, VPUs, and other accelerators supported by OpenVINO. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Speech-AI-Forge

    Speech-AI-Forge

    Speech-AI-Forge is a project developed around TTS generation model

    ...The system is designed to be deployed in several ways: you can try it online via hosted demos, spin it up in a one-click Colab environment, run it in Docker containers, or set it up locally with its environment preparation scripts. It is model-agnostic and advertises support for a variety of TTS and speech models such as ChatTTS, CosyVoice, Fish-Speech, FireredTTS and others, as well as Whisper-based ASR, giving you a flexible playground for experimenting with different speech stacks. The project also integrates with general-purpose LLMs (for example GPT- or LLaMA-style models), which can be used to pre-process text, manage conversations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    ...It uses a novel model setup that combines continuous acoustic features with discrete semantic tokens to richly capture sound and meaning across speech, music, and environmental audio.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 5
    whisper.cpp

    whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    whisper.cpp is a lightweight, C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition (ASR) model—designed for efficient, standalone transcription without external dependencies. The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. The command downloads the base.en model converted to custom ggml format and runs the inference on all .wav samples in the folder samples. whisper.cpp supports integer quantization of the Whisper ggml models. ...
    Downloads: 451 This Week
    Last Update:
    See Project
  • 6
    StreamSpeech

    StreamSpeech

    StreamSpeech is a seamless model for offline speech recognition

    StreamSpeech is an “all-in-one” speech model designed to perform offline and simultaneous speech recognition, speech translation, and speech synthesis within a single unified architecture. Developed as part of an ACL 2024 paper, it targets streaming and low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously as audio is being received.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    VibeVoice

    VibeVoice

    Open-source multi-speaker long-form text-to-speech model

    VibeVoice-1.5B is Microsoft’s frontier open-source text-to-speech (TTS) model designed for generating expressive, long-form, multi-speaker conversational audio such as podcasts. Unlike traditional TTS systems, it excels in scalability, speaker consistency, and natural turn-taking for up to 90 minutes of continuous speech with as many as four distinct speakers. A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. ...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 8
    MetaVoice-1B

    MetaVoice-1B

    Foundational model for human-like, expressive TTS

    MetaVoice — in the form of its source repository “metavoice-src” — is a large-scale text-to-speech (TTS) model. Specifically, the base model (MetaVoice-1B) uses around 1.2 billion parameters and has been trained on a massive dataset — reportedly around 100,000 hours of speech data. The goal is to provide human-like, expressive, and flexible TTS: able to generate natural-sounding speech that can handle diverse inputs and likely generalize over voice styles, intonation, prosody, and perhaps multiple languages or accents. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    SenseVoice

    SenseVoice

    Multilingual speech recognition and audio understanding model

    SenseVoice is a speech foundation model designed to perform multiple voice understanding tasks from audio input. It provides capabilities such as automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection within a single system. SenseVoice is trained on more than 400,000 hours of speech data and supports over 50 languages for multilingual recognition tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    Faster Whisper

    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 11
    MARS5

    MARS5

    MARS5 speech model (TTS) from CAMB.AI

    MARS5-TTS is CAMB.AI’s open-source English speech model designed for high-quality text-to-speech and voice emulation. It uses a two-stage architecture that combines an autoregressive (AR) model with a non-autoregressive (NAR) model, giving it both expressiveness and speed. The model is built to handle prosodically challenging content such as sports commentary, anime dialogue, and other high-energy or highly varied speech patterns with realistic rhythm and intonation. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Voicebox

    Voicebox

    The open-source voice synthesis studio powered by Qwen3-TTS

    Voicebox is a local-first voice synthesis studio that aims to bring professional, DAW-like voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms by emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool supports downloading voice models, cloning voices from short audio samples, and generating speech locally, then organizing the...
    Downloads: 92 This Week
    Last Update:
    See Project
  • 13
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Sopro TTS

    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an M3 base). ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    kokoro-onnx

    kokoro-onnx

    TTS with kokoro and onnx runtime

    kokoro-onnx is a text-to-speech toolkit that wraps the Kokoro neural TTS model in an easy-to-use ONNX Runtime interface, so you can generate speech from Python with minimal setup. It focuses on running efficiently on commodity hardware, including macOS with Apple Silicon, while still delivering near real-time performance for many use cases. The project ships prebuilt model files and a simple example script, so you can go from installation to producing an audio.wav file in just a few steps. ...
    Downloads: 100 This Week
    Last Update:
    See Project
  • 16
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 17
    LuxTTS

    LuxTTS

    A high-quality rapid TTS voice cloning model

    ...The project supports zero-shot voice cloning, meaning it can adapt to a reference speaker’s voice with minimal example data, enabling realistic and personalized synthetic speech. Intended for developers, hobbyists, and creators, the repository includes installation instructions, usage examples, and Python APIs that make it feasible to integrate the model in local workflows, web demos, or production systems. Its design emphasizes efficiency and practicality, fitting within modest GPU memory footprints.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    ChatTTS

    ChatTTS

    A generative speech model for daily dialogue

    ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    ...It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. High performance on many standard benchmarks: ASR, speech-emotion recognition, vocal sound classification, speech translation etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    CosyVoice

    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech across languages and in code-switching contexts. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance. The project provides a straightforward CLI...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 22
    Moonshine Voice

    Moonshine Voice

    Fast and accurate automatic speech recognition (ASR) for edge devices

    moonshine is an open-source automatic speech recognition toolkit optimized for fast and accurate transcription on edge devices and local environments. The project is designed to enable real-time voice applications such as live transcription, voice commands, and embedded speech interfaces without requiring heavy cloud infrastructure. Its architecture emphasizes low latency and flexible input handling, allowing audio streams of varying durations rather than relying on fixed processing windows. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 23
    VoxCPM2

    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    VoxCPM2 is an advanced open-source text-to-speech system that redefines speech synthesis by eliminating traditional tokenization and instead generating continuous speech representations through a diffusion-based autoregressive architecture. Built on top of the MiniCPM model family, it enables highly natural, expressive, and context-aware speech generation that adapts tone, emotion, and pacing directly from input text.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 24
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB