Showing 18 open source projects for "generative music composition"

  • 1
    MuseGAN

    An AI for Music Generation

    MuseGAN is a deep learning research project designed to generate symbolic music using generative adversarial networks. The system focuses specifically on generating multi-track polyphonic music, meaning that it can simultaneously produce multiple instrument parts such as drums, bass, piano, guitar, and strings. Instead of generating raw audio, the model operates on piano-roll representations of music, which encode notes as time-pitch matrices for each instrument track. ...
    Downloads: 2 This Week
  • 2
    UNO

    A Universal Customization Method for Single and Multi Conditioning

    UNO is a ByteDance project introduced in 2025, titled “A Universal Customization Method for Both Single and Multi-Subject Conditioning.” It proposes a framework for image (or more general generative) modeling in which the model can be conditioned on either a single subject or multiple subjects, for example to generate or customize images featuring specific people, styles, or objects, possibly with fine-grained control over subject identity or composition. The project appears aimed at bridging single-subject customization and multi-subject generation, which could be useful for personalized content creation, flexible composition, or controlled generation tasks. ...
    Downloads: 0 This Week
  • 3
    AudioLM - Pytorch

    Implementation of AudioLM audio generation model in Pytorch

    Implementation of AudioLM, a Language Modeling Approach to Audio Generation out of Google Research, in Pytorch. It also extends the work for conditioning with classifier-free guidance with T5. This allows one to do text-to-audio or TTS, which is not offered in the paper. Yes, this means VALL-E can be trained from this repository. It is essentially the same. This repository now also contains an MIT-licensed version of SoundStream. It is also compatible with EnCodec; however, be aware that it...
    Downloads: 1 This Week
  • 4
    ACE-Step 1.5

    The most powerful local music generation model

    ACE-Step 1.5 is an advanced open-source foundation model for AI-driven music generation that pushes beyond traditional limitations in speed, musical coherence, and controllability by innovating in architecture and training design. It integrates cutting-edge generative techniques—such as diffusion-based synthesis combined with compressed autoencoders and lightweight transformer elements—to produce high-quality full-length music tracks with rapid inference times, capable of generating a complete song in seconds on modern GPUs while remaining efficient enough to run on consumer-grade hardware with minimal memory requirements. ...
    Downloads: 69 This Week
  • 5
    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    Wan2.2 is a major upgrade to the Wan series of open and advanced large-scale video generative models, incorporating cutting-edge innovations to boost video generation quality and efficiency. It introduces a Mixture-of-Experts (MoE) architecture that splits the denoising process across specialized expert models, increasing total model capacity without raising computational costs. Wan2.2 integrates meticulously curated cinematic aesthetic data, enabling precise control over lighting, composition, color tone, and more, for high-quality, customizable video styles. ...
    Downloads: 109 This Week
  • 6
    YuE

    Open source AI model for generating full songs from lyrics prompts

    YuE is an open source project that provides a foundation model designed for full-song music generation using artificial intelligence. It focuses on transforming text inputs such as lyrics and genre prompts into complete musical compositions that include both vocal and instrumental tracks. Unlike many shorter audio generators, the model is capable of producing songs that last several minutes while maintaining coherent musical structure and alignment with the provided lyrics. YuE introduces a...
    Downloads: 4 This Week
  • 7
    Audiogen Codec

    48 kHz stereo neural audio codec for general audio

    ...These codecs, being low compression, outperform Meta's EnCodec and DAC on general audio, as validated in internal blind ELO games. We trained (relatively) very low compression codecs in pursuit of solving a core issue in general music and audio generation: low acoustic quality and audible artifacts, which hinder industry use of these models. Our hope is to encourage researchers to build hierarchical generative audio models that can efficiently use high sequence length representations without sacrificing semantic abilities.
    Downloads: 0 This Week
  • 8
    Kimi-Audio

    Audio foundation model excelling in audio understanding

    ...It uses a novel model setup that combines continuous acoustic features with discrete semantic tokens to richly capture sound and meaning across speech, music, and environmental audio.
    Downloads: 3 This Week
  • 9
    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license, with a pyproject.toml and uv-based workflow that makes installation and execution reproducible. ...
    Downloads: 1 This Week
  • 10
    Amphion

    Toolkit for audio, music, and speech generation

    Amphion is a toolkit from OpenMMLab dedicated to audio, music, and speech generation, aimed at both reproducible research and helping newcomers get started in generative audio. It provides standardized implementations and recipes for classic and state-of-the-art generative models in audio, including TTS, music generation, and voice conversion. A distinctive feature of Amphion is its emphasis on visualization: it offers interactive visualizations of model architectures and generation processes, making it easier to understand how complex generative audio models work. ...
    Downloads: 0 This Week
  • 11
    MusicLM - Pytorch

    Implementation of MusicLM music generation model in Pytorch

    Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch. They are basically using text-conditioned AudioLM, but surprisingly with the embeddings from a text-audio contrastive learned model named MuLan. MuLan is what will be built out in this repository, with AudioLM modified from the other repository to support the music generation needs here.
    Downloads: 1 This Week
  • 12
    Nougat

    Implementation of Nougat Neural Optical Understanding

    Nougat is a multi-modal generative modeling framework that bridges vision and text modalities with structured generation control (e.g. layout, scene composition) rather than treating images as flat contexts. It combines object-centric modules with transformer-based reasoning to propose, refine, and render scenes in a generative pipeline. The architecture allows you to specify or prompt a layout (which objects should be where) and then the model fills in appearance, context, lighting, and relations coherently. ...
    Downloads: 0 This Week
  • 13
    Audio Webui

    A web UI for different audio-related neural networks

    Audio Webui is a Gradio-based web user interface that unifies a wide range of audio-related neural networks under a single, accessible front end. It is designed as an “all-in-one” environment where users can experiment with text-to-speech, voice cloning, generative music, and other neural audio models without writing boilerplate code. The project supports multiple back-end models and toolchains (such as Bark, RVC, AudioLDM, Audiocraft, and other text-to-audio or voice-cloning tools), exposing them through a consistent UI for inference and experimentation. Installation is streamlined through automatic installers and platform-specific scripts that create a virtual environment, install dependencies, and launch the web app with minimal manual setup. ...
    Downloads: 0 This Week
  • 14
    audio-diffusion-pytorch

    Audio generation using diffusion models, in PyTorch

    A fully featured audio diffusion library for PyTorch. Includes models for unconditional audio generation, text-conditional audio generation, diffusion autoencoding, upsampling, and vocoding. The provided models are waveform-based; however, the U-Net (built using a-unet), DiffusionModel, diffusion method, and diffusion samplers are all generic to any dimension and highly customizable to work on other formats. Note: no pre-trained models are provided here; this library is meant for research...
    Downloads: 0 This Week
  • 15
    DeepMozart

    Audio generation using diffusion models

    Audio generation using diffusion models in PyTorch. The code is based on the audio-diffusion-pytorch repository.
    Downloads: 2 This Week
  • 16
    Piano transcription

    Task of transcribing piano recordings into MIDI files

    Piano transcription is an open-source high-resolution piano transcription system by ByteDance that converts raw audio recordings of piano performance into symbolic MIDI files — detecting note onsets, offsets, pitch, velocity, and even pedal usage. The system is implemented in Python (PyTorch) and is capable of accurate transcription of polyphonic piano recordings, even with complex passages and pedal techniques, making it suitable for classical piano music. By using this transcription tool,...
    Downloads: 2 This Week
  • 17
    GiantMIDI-Piano

    Classical piano MIDI dataset

    GiantMIDI-Piano is a large-scale symbolic classical piano music dataset built by applying the piano_transcription system on a vast collection of piano performance recordings. The dataset contains thousands of piano works, spanning a large number of composers and styles, with each piece transcribed into high-precision MIDI files capturing note events, pedal usage, velocities, etc. It provides a resource for music information retrieval (MIR), symbolic music modeling, composer classification, music generation, analysis of classical piano repertoire, and data-driven research in musicology or AI-based composition.
    Downloads: 1 This Week
  • 18
    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    ...The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. In practice, sg2im demonstrates how structured semantics can guide generative models to produce controllable, compositional imagery.
    Downloads: 0 This Week