What this system does
Audiobox is Meta's research platform for generating audio with AI. It takes voice input and plain-language text prompts and produces both spoken performances and environmental sound effects, so users can create custom audio assets instead of relying on stock recordings. The stated goal is to widen the options available for audio production and experimentation.
Main model components
- Audiobox Sound — designed specifically for producing non-speech audio like atmospheres, foley, and effects
- Audiobox Speech — focused on generating natural-sounding voices and spoken content
- Audiobox SSL — a self-supervised foundation model that underpins the specialist models
How generation works
Users can provide a voice sample, a text prompt, or both to guide the output. The foundation model interprets these inputs, and the specialist models then shape the final audio, whether that is a spoken line with a particular timbre or a layered soundscape. The workflow supports iterative refinement, so creators can adjust prompts and inputs until the result matches their intent.
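The input handling described above can be sketched in a few lines of Python. This is an illustration only: the text does not describe a programming interface for Audiobox, so `GenerationRequest`, `choose_model`, `describe_inputs`, and the `want_speech` flag are hypothetical names invented for this sketch. It shows only how a request carrying a voice sample, a text prompt, or both might be checked and routed to one of the two specialist models named earlier. It does not generate audio.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenerationRequest:
    """Hypothetical request object; not part of any published Audiobox API."""
    text_prompt: Optional[str] = None     # plain-language description of the audio
    voice_sample: Optional[bytes] = None  # raw bytes of a reference voice recording
    want_speech: bool = True              # False means a non-speech sound is wanted


def describe_inputs(request: GenerationRequest) -> List[str]:
    """List which kinds of guidance the request carries."""
    kinds = []
    if request.voice_sample is not None:
        kinds.append("voice")
    if request.text_prompt is not None:
        kinds.append("text")
    return kinds


def choose_model(request: GenerationRequest) -> str:
    """Return the name of the specialist model that would handle the request."""
    if not describe_inputs(request):
        # The text says users supply a voice sample, a text prompt, or both.
        raise ValueError("provide a text prompt, a voice sample, or both")
    return "Audiobox Speech" if request.want_speech else "Audiobox Sound"


if __name__ == "__main__":
    request = GenerationRequest(text_prompt="rain on a tin roof", want_speech=False)
    print(describe_inputs(request), "->", choose_model(request))
```

Iterative refinement would then amount to editing `text_prompt` or swapping `voice_sample` and submitting the request again.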
Typical uses
- Rapid prototyping of voiceovers, character lines, or dialogue for games and films
- Creating layered background audio, sound effects, and ambiences for multimedia projects
- Producing custom audio assets for accessibility features, voice assistants, or educational content
Safety and responsible use
Meta emphasizes safe deployment by incorporating guardrails and usage policies that limit misuse. The platform includes controls to help prevent generation of harmful or deceptive audio, and documentation explains acceptable use practices, license terms, and moderation guidance.
Interactive demos and technical information
- Live demos allow users to test speech and sound generation directly in the browser
- Technical notes provide model architecture summaries, training setup, and evaluation metrics
Summary
Audiobox offers a flexible suite of models for both voice and non-voice audio creation, backed by a self-supervised core. With interactive examples, safety measures, and detailed technical documentation, it’s positioned as a practical toolkit for creators and researchers exploring generative audio.