  • 1
    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like...
    Downloads: 0 This Week
  • 2
    MARS5

    MARS5 speech model (TTS) from CAMB.AI

    ...To control speaker identity, MARS5 uses a short reference audio clip, typically between 2 and 12 seconds, from which it learns the voice characteristics. It supports two main inference modes: shallow clone, which is faster and only needs the reference audio, and deep clone, which additionally uses the transcript of the reference audio to increase similarity and naturalness at the cost of more computation.
    Downloads: 0 This Week
  • 3
    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight text-to-speech model with just 15 million parameters and a binary size under 25 MB. It is designed for real-time, CPU-based deployment across diverse platforms and runs without a GPU. It offers several premium voice options, and inference is optimized for real-time speech synthesis.
    Downloads: 12 This Week
  • 4
    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. The system introduces a multi-reward reinforcement learning framework that jointly optimizes for voice similarity, emotional expressiveness, pronunciation, and intelligibility, yielding output that can rival commercial options in naturalness and expressiveness. ...
    Downloads: 0 This Week
  • 5
    fairseq2

    FAIR Sequence Modeling Toolkit 2

    ...It supports multi-GPU and multi-node distributed training using DDP, FSDP, and tensor parallelism, capable of scaling up to 70B+ parameter models. The framework integrates seamlessly with PyTorch 2.x features such as torch.compile, Fully Sharded Data Parallel (FSDP), and modern configuration management.
    Downloads: 0 This Week
  • 6
    Dia2

    TTS model capable of streaming conversational audio in realtime

    ...The model supports audio conditioning, allowing generated speech to follow a reference voice or conversational style more naturally. Dia2 provides 1B and 2B model checkpoints along with inference code for research and experimentation. It currently focuses on English generation and supports up to two minutes of generated audio. Its main value is enabling low-latency, dialogue-oriented TTS workflows where timing, turn-taking, and natural conversation matter.
    Downloads: 0 This Week
  • 7
    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    ...It extends the original StyleTTS idea by introducing a style diffusion model that can sample rich, realistic speaking styles conditioned on reference speech, allowing highly expressive and diverse prosody. The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide generation toward more natural and coherent utterances. StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. ...
    Downloads: 5 This Week
  • 8
    TITTSE

    Two Integrated Text To Speech Engines uses MMS & Silero

    TITTSE is a Python application that converts text to speech in 15 languages (more can be added) using two TTS engines. The input is a text file with the tittse extension that begins with four header lines: the TITTSE language code (see the documentation for your language), the base file name for the audio files TITTSE creates, the voice gender (girl or boy), and the offset (the number at which the file numbers appended to the base name start). After those four lines, each paragraph is rendered as a single audio file. ...
    Downloads: 11 This Week
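The input layout described for TITTSE (four header lines, then one audio file per paragraph) can be illustrated with a small parser. This is only a sketch under stated assumptions, not TITTSE's own code: the header order follows the order in the description, paragraphs are assumed to be separated by blank lines, and the names `TittseJob`, `parse_tittse`, `output_names`, the language code `eng`, and the output naming scheme are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class TittseJob:
    """Hypothetical container for one parsed input file."""
    language: str        # language code from header line 1
    base_name: str       # base file name from header line 2
    gender: str          # "girl" or "boy" from header line 3
    offset: int          # first file number from header line 4
    paragraphs: List[str]


def parse_tittse(text: str) -> TittseJob:
    """Split the text into four header lines and blank-line-separated paragraphs."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected four header lines")
    language, base_name, gender, offset = (line.strip() for line in lines[:4])
    if gender not in ("girl", "boy"):
        raise ValueError(f"voice gender must be 'girl' or 'boy', got {gender!r}")
    body = "\n".join(lines[4:])
    # Assumption: a paragraph ends at one or more blank lines.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", body) if p.strip()]
    return TittseJob(language, base_name, gender, int(offset), paragraphs)


def output_names(job: TittseJob) -> List[str]:
    """Assumed naming scheme: base name followed by offset, offset + 1, and so on."""
    return [f"{job.base_name}{job.offset + i}" for i in range(len(job.paragraphs))]


sample = "eng\nstory\ngirl\n5\nFirst paragraph\ncontinues.\n\nSecond paragraph.\n"
job = parse_tittse(sample)
print(output_names(job))
```

The real file format and output names may differ; the project documentation is the authority on both.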
  • 9
    VITS

    Conditional Variational Autoencoder with Adversarial Learning

    VITS is a foundational research implementation of “VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech,” a well-known neural TTS architecture. Unlike traditional two-stage systems that separately train an acoustic model and a vocoder, VITS trains an end-to-end model that maps text directly to waveform using a conditional variational autoencoder combined with normalizing flows and adversarial training. This architecture enables parallel generation (fast inference) while achieving speech quality that rivals or surpasses many two-stage systems. ...
    Downloads: 1 This Week
  • 10
    Transformer TTS

    Implementation of a Transformer based neural network

    TransformerTTS is an implementation of a non-autoregressive Transformer-based neural network for text-to-speech, built with TensorFlow 2. It takes inspiration from architectures like FastSpeech, FastSpeech 2, FastPitch, and Transformer TTS, and extends them with its own aligner and forward models. The system separates alignment learning and acoustic modeling: an autoregressive Transformer is used as an aligner to extract phoneme-to-frame durations, while a non-autoregressive “ForwardTransformer” generates mel-spectrograms conditioned on text and durations. ...
    Downloads: 0 This Week
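To make "conditioned on text and durations" concrete, the sketch below shows the duration-based expansion step popularized by FastSpeech-style models: each phoneme is repeated for as many frames as its duration says. It is an illustration only and is not taken from the TransformerTTS code base; the function name `expand_by_duration` and the toy phoneme strings are hypothetical, and a real model would expand encoder vectors rather than symbols.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_by_duration(phonemes: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Repeat each phoneme (or its encoding) once per predicted frame."""
    if len(phonemes) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[T] = []
    for phoneme, duration in zip(phonemes, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        frames.extend([phoneme] * duration)
    return frames


# Toy input: three phonemes lasting 2, 1, and 3 frames.
frames = expand_by_duration(["h", "e", "l"], [2, 1, 3])
print(frames)  # ['h', 'h', 'e', 'l', 'l', 'l']
```

Because the output length is fixed by the durations, all frames can then be generated in parallel, which is what makes the non-autoregressive approach fast.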
  • 11
    Resemblyzer

    A python package to analyze and compare voices with deep learning

    ...The project is useful for researchers and developers who need a practical way to reason about speaker identity without building a voice encoder from scratch. It can help identify whether two recordings sound like the same speaker or visualize voice relationships across many samples. Its main value is making speaker representation accessible through a simple Python workflow.
    Downloads: 1 This Week
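Deciding whether two recordings come from the same speaker is commonly done by comparing their embedding vectors with cosine similarity. The sketch below shows only that comparison step. It does not call Resemblyzer: the embeddings are made-up short vectors, and the helper names and the 0.75 threshold are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.75) -> bool:
    """Hypothetical decision rule: similar enough means same speaker."""
    return cosine_similarity(a, b) >= threshold


# Made-up 2-dimensional "embeddings"; real ones have many more dimensions.
voice_a = [1.0, 0.2]
voice_b = [0.9, 0.3]
voice_c = [0.0, 1.0]
print(same_speaker(voice_a, voice_b), same_speaker(voice_a, voice_c))
```

A suitable threshold depends on the encoder and the data, so it should be tuned on labeled recordings rather than taken from this example.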
  • 12
    DC-TTS

    TensorFlow Implementation of DC-TTS: yet another text-to-speech model

    ...It follows the “Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention” paper, but the author adapts and extends the design to make it practical for real experiments. The model is split into two networks: Text2Mel, which maps text to mel-spectrograms, and SSRN (spectrogram super-resolution network), which converts low-resolution mel-spectrograms into high-resolution magnitude spectrograms suitable for waveform synthesis. Training scripts, data loaders, and hyperparameter configurations are provided to reproduce results on several datasets, including LJ Speech for English, a Korean single-speaker dataset, and audiobook data from Nick Offerman and Kate Winslet.
    Downloads: 0 This Week