Showing 21 open source projects for "ace-step"

  • 1
    ACE-Step 1.5

    The most powerful local music generation model

    ...Beyond straightforward text-to-music synthesis, ACE-Step 1.5 enables flexible creative workflows, including tasks like cover generation, editing existing tracks, transforming vocals to background accompaniment, and stylistic personalization using low-rank adaptation from just a few example songs.
    Downloads: 123 This Week
  • 2
    Step-Audio

    Open-source framework for intelligent speech interaction

    ...Through its architecture, Step-Audio supports multilingual interaction, dialects, emotional tones (joy, sadness, etc.), and even more creative speech styles (like rap or singing), while allowing dynamic control over speech characteristics. It also provides a “generative data engine,” which can produce synthetic speech data (cloning voices, varying style) to support TTS training.
    Downloads: 0 This Week
  • 3
    Step-Audio-EditX

    LLM-based Reinforcement Learning audio edit model

    Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level token operations. ...
    Downloads: 3 This Week
  • 4
    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    ...Its training and generation pipeline includes techniques like flow-matching, full 3D attention for temporal consistency, and fine-tuning approaches (e.g. video-based DPO) to improve fidelity and reduce artifacts. As a result, Step-Video-T2V aims to push the frontier of open-source video generation.
    Downloads: 2 This Week
  • 5
    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio 2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation. Unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio 2 processes raw audio, reasons about semantic and paralinguistic content (such as emotion, speaker characteristics, and non-verbal cues), and can generate contextually appropriate responses, potentially including generated or transformed audio output. ...
    Downloads: 0 This Week
  • 6
    DeepSeek Math

    Pushing the Limits of Mathematical Reasoning in Open Language Models

    ...MATH, GSM8K, ARB), demonstration notebooks, prompt templates, and evaluation results on math benchmarks. The goal is to push DeepSeek’s performance in domains that require rigorous symbolic steps, calculus, linear algebra, number theory, or multi-step derivations. The repo may also include modules that integrate external computational tools (e.g. a CAS / computer algebra system) or calculator assistance backends to enhance correctness. Because math reasoning is a high bar for LLMs, DeepSeek-Math aims to showcase the model’s ability not just in natural text but in precise formal reasoning.
    Downloads: 2 This Week
  • 7
    SeedVR

    Repo for SeedVR2 & SeedVR

    ...These models leverage advanced techniques such as adaptive attention mechanisms and adversarial training to produce visually appealing results in a single inference step, pushing the boundaries of video restoration research. SeedVR’s transformer-based design allows it to handle variable frame resolutions and lengths, and its architecture is optimized to overcome traditional limitations of windowed attention in high-resolution contexts.
    Downloads: 2 This Week
  • 8
    GLM-4.7

    Advanced language and coding AI model

    GLM-4.7 is an advanced agent-oriented large language model designed as a high-performance coding and reasoning partner. It delivers significant gains over GLM-4.6 in multilingual agentic coding, terminal-based workflows, and real-world developer benchmarks such as SWE-bench and Terminal Bench 2.0. The model introduces stronger “thinking before acting” behavior, improving stability and accuracy in complex agent frameworks like Claude Code, Cline, and Roo Code. GLM-4.7 also advances “vibe...
    Downloads: 79 This Week
  • 9
    PokeeResearch-7B

    Pokee Deep Research Model Open Source Repo

    ...It is built to operate end-to-end: planning a research strategy, gathering sources, reasoning over conflicting claims, and writing a grounded response. The repository includes evaluation results on multi-step QA and research benchmarks, illustrating how web-time context boosts accuracy. Because the system is modular, you can swap the search component, reader, or policy to fit private deployments or different data domains. It’s aimed at developers who want a transparent, hackable research agent they can run locally or wire into existing workflows.
    Downloads: 0 This Week
  • 10
    Tiktoken

    tiktoken is a fast BPE tokeniser for use with OpenAI's models

    tiktoken is a high-performance tokenizer library (based on byte-pair encoding, BPE) designed for use with OpenAI’s models. It handles encoding and decoding text to token IDs efficiently, with minimal overhead. Because tokenization is a fundamental step in preparing text for models, tiktoken is optimized for speed, memory, and correctness in model contexts (e.g. matching OpenAI’s internal tokenization). The repo supports multiple encodings (e.g. “cl100k_base”) and lets users switch encoding names to match different model contexts. It also offers extension mechanisms so that custom encodings can be registered. ...
    Downloads: 7 This Week
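The byte-pair encoding idea mentioned in the tiktoken entry can be illustrated with a toy sketch. This is not tiktoken's implementation or API: the function names (`train`, `encode`, `decode`) are hypothetical, the merge table is learned from a tiny sample string, and only the Python standard library is used.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every adjacent occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (IDs 0-255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in learned order
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text, merges):
    """Turn text into token IDs by applying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Map token IDs back to text by expanding each merged ID into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

Encoding and then decoding with the same merge table returns the original text, and frequently repeated byte pairs collapse into single IDs, which is why the encoded sequence is shorter than the raw byte sequence for repetitive input.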
  • 11
    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T N1.5 is the world's first open foundation model

    NVIDIA Isaac‑GR00T N1.5 is an open-source foundation model engineered for generalized humanoid robot reasoning and manipulation skills. It accepts multimodal inputs—such as language and images—and uses a diffusion transformer architecture built upon vision-language encoders, enabling adaptive robot behaviors across diverse environments. It is designed to be customizable via post-training with real or synthetic data. The vision-language model remains frozen during both pretraining and...
    Downloads: 0 This Week
  • 12
    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and can output or act via tools seamlessly, bridging perception and execution. ...
    Downloads: 0 This Week
  • 13
    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    ...The team emphasizes efficient scaling of test-time compute: at 100K-token generation lengths, M1 reportedly uses only about 25 percent of the FLOPs of some competing models, making extended “think step” traces more feasible. M1 is further trained with large-scale reinforcement learning over diverse tasks.
    Downloads: 0 This Week
  • 14
    Warlock-Studio

    AI Suite for upscaling, interpolating & restoring images/videos

    v6.0. Warlock-Studio is a Windows application that uses the Real-ESRGAN, BSRGAN, IRCNN, GFPGAN, RealESRNet, RealESRAnime and RIFE AI models to upscale, restore faces, interpolate frames and reduce noise in images and videos. The application supports GPU acceleration (including multi-GPU setups) and offers batch processing for large workloads. It includes drag-and-drop handling for single or multiple files, optional pre-resize functions, and an automatic tiling system...
    Downloads: 27 This Week
  • 15
    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as...
    Downloads: 0 This Week
  • 16
    Consistency Models

    Official repo for consistency models

    ...It builds on and extends diffusion model frameworks (e.g. based on the guided-diffusion codebase), adding techniques like consistency distillation and consistency training to enable fast, often one-step, sample generation. The repo is implemented in PyTorch and includes support for large-scale experiments on datasets like ImageNet-64 and LSUN variants. It also contains checkpointed models, evaluation scripts, and variants of sampling / editing algorithms described in the paper. Because consistency models reduce the number of inference steps, they are promising for real-time or low-latency generative systems.
    Downloads: 0 This Week
  • 17
    PRM800K

    800,000 step-level correctness labels on LLM solutions to MATH problems

    PRM800K is a process supervision dataset accompanying the paper Let’s Verify Step by Step, providing 800,000 step-level correctness labels on model-generated solutions to problems from the MATH dataset. The repository releases the raw labels and the labeler instructions used in two project phases, enabling researchers to study how human raters graded intermediate reasoning. Data are stored as newline-delimited JSONL files tracked with Git LFS, where each line is a full solution sample that can contain many step-level labels and rich metadata such as labeler UUIDs, timestamps, generation identifiers, and quality-control flags. ...
    Downloads: 0 This Week
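The newline-delimited layout described in the PRM800K entry, with one solution sample per line and several step-level labels inside each sample, can be read with a few lines of standard-library Python. The sketch below is illustrative only: the field names `labeler`, `steps`, and `rating` and the sample records are assumptions made for the example, not the dataset's documented schema.

```python
import json
from collections import Counter

# Hypothetical sample lines; the real files use the dataset's own field names.
SAMPLE_LINES = [
    '{"labeler": "a1", "steps": [{"rating": 1}, {"rating": -1}]}',
    '{"labeler": "b2", "steps": [{"rating": 1}]}',
    "",
]


def count_step_labels(lines):
    """Count step-level ratings across all samples, one JSON object per line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        sample = json.loads(line)
        for step in sample.get("steps", []):
            counts[step.get("rating")] += 1
    return counts


def steps_per_labeler(lines):
    """Count how many step labels each (hypothetical) labeler ID contributed."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if line:
            sample = json.loads(line)
            totals[sample.get("labeler")] += len(sample.get("steps", []))
    return totals
```

Because each line is parsed independently, the same functions accept an open file handle in place of the list, so a large file can be processed without loading it into memory at once.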
  • 18
    minGPT

    A minimal PyTorch re-implementation of the OpenAI GPT

    ...It strips away extraneous bells and whistles, aiming to show how a sequence of token indices is fed into a stack of transformer blocks and then decoded into the next token probabilities, with both training and inference supported. Because the whole model is around 300 lines of code, users can follow each step—from embedding lookup, positional encodings, multi-head attention, feed-forward layers, to output heads—and thus demystify how GPT-style models work beneath the surface. It provides a practical sandbox for experimentation, letting learners tweak the architecture, dataset, or training loop without being overwhelmed by framework abstraction.
    Downloads: 0 This Week
  • 19
    GPT Neo

    An implementation of model parallel GPT-2 and GPT-3-style models

    ...Training and inference are officially supported on TPU and should work on GPU as well. This repository will be (mostly) archived as we move focus to our GPU-specific repo, GPT-NeoX. NB, while Neo can technically run a training step at 200B+ parameters, it is very inefficient at those scales. This, as well as the fact that many GPUs became available to us, among other things, prompted us to move development over to GPT-NeoX. All evaluations were done using our evaluation harness. Some results for GPT-2 and GPT-3 are inconsistent with the values reported in the respective papers. ...
    Downloads: 3 This Week
  • 20
    Nemotron 3 Super

    Open language model developed by NVIDIA as part of Nemotron-3 family

    ...Its architecture combines Transformer attention layers with Mamba state-space components to balance long-context reasoning, memory efficiency, and high-quality language generation. The model is optimized for building AI agents that must perform complex tasks such as planning, tool usage, coding assistance, and multi-step reasoning.
    Downloads: 0 This Week
  • 21
    DeepSeek-V3.2

    High-efficiency reasoning and agentic intelligence model

    ...The model was notably used in competitive AI challenges such as the 2025 International Mathematical Olympiad (IMO) and IOI, achieving top-tier results. DeepSeek-V3.2 also features a large-scale agentic task synthesis pipeline, which generates training data to enhance tool-use intelligence and multi-step reasoning. It introduces a new “thinking with tools” chat template, allowing it to reason and decide when to invoke specific tools during problem solving.
    Downloads: 0 This Week