Step-Audio-EditX is an open-source, 3-billion-parameter audio model from StepFun AI designed to make expressive, precise editing of speech and audio as easy as editing text. Rather than treating audio editing as low-level waveform manipulation, the model converts speech into a sequence of discrete “audio tokens” using a dual-codebook tokenizer, which combines a linguistic token stream with a semantic (prosody/emotion/style) token stream. Audio editing thereby becomes a set of high-level token operations. This lets users modify not only what is said (the text) but also how it is said: emotion, tone, speaking style, prosody, accent, and even paralinguistic cues. The model is trained with a “large-margin learning” objective over many synthesized and natural speech samples, which gives it robust control over expressive attributes and lets it perform iterative editing: for example, you could record a line, then ask the model to “make it sadder,” “speak slower,” or “change accent to X.”

Features

  • Token-based audio editing: converts speech to discrete token streams so that audio can be edited with high-level, language-like operations
  • Dual-codebook tokenizer design: separates linguistic content from prosody/style, giving control over both what is said and how it is said
  • Expressive editing: modifies emotion, tone, accent, speaking style, prosody, pacing, and other vocal attributes without re-recording
  • Iterative editing workflow: supports multiple rounds of edits, e.g. change the style, then adjust the emotion, then the pace
  • Zero-shot TTS: generates speech directly from text plus optional style/emotion instructions, in a controlled expressive voice
  • Open-source model and code under a permissive license, enabling integration, customization, and use in research, creative workflows, or production
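
The token-based, iterative editing idea described above can be pictured with a small toy sketch. This is not the Step-Audio-EditX API: the class TokenizedSpeech, the functions edit_style and apply_edits, and all token IDs are hypothetical and exist only for illustration. The real model generates new audio tokens from an instruction; here that step is reduced to swapping a precomputed style stream, to show how the linguistic content can stay fixed while the delivery changes across several rounds.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TokenizedSpeech:
    """Toy stand-in for a dual-stream encoding (hypothetical structure)."""
    linguistic: Tuple[int, ...]  # what is said
    style: Tuple[int, ...]       # how it is said (prosody, emotion, style)


def edit_style(clip: TokenizedSpeech, new_style: Tuple[int, ...]) -> TokenizedSpeech:
    """Return a copy with the style stream swapped and the content untouched."""
    return replace(clip, style=new_style)


def apply_edits(clip: TokenizedSpeech,
                edits: Sequence[Tuple[int, ...]]) -> List[TokenizedSpeech]:
    """Apply style edits in order, keeping every intermediate version."""
    history = [clip]
    for new_style in edits:
        history.append(edit_style(history[-1], new_style))
    return history


# Made-up token IDs; a real tokenizer would produce these from audio.
original = TokenizedSpeech(linguistic=(12, 7, 93), style=(4, 4, 4))
sadder = (9, 9, 9)   # pretend result of "make it sadder"
slower = (9, 2, 9)   # pretend result of "speak slower"

history = apply_edits(original, [sadder, slower])
final = history[-1]
print(final.linguistic == original.linguistic)  # content is preserved
print(final.style)
```

Keeping every intermediate version in a list mirrors the iterative workflow: any round can be inspected or used as the starting point for a different edit.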

Categories

AI Models

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux

Programming Language

Python

Related Categories

Python AI Models

Registered

2025-12-01