Dia-1.6B is a 1.6 billion parameter text-to-speech model by Nari Labs that generates high-fidelity dialogue directly from transcripts. Designed for realistic vocal performance, Dia supports expressive features such as emotion, tone control, and non-verbal cues like laughter, coughing, or sighs. The model accepts speaker conditioning through audio prompts, enabling limited voice cloning and speaker consistency across generations. It is optimized for English and built for real-time performance on enterprise GPUs, though CPU support and a quantized version are planned. Transcripts use [S1]/[S2] tags to differentiate speakers, and the model integrates easily into Python workflows. While not tuned to a specific voice, user-provided audio can guide output style. Licensed under Apache 2.0, Dia is intended for research and educational use, with explicit restrictions on misuse such as identity mimicry or deceptive content.

Features

  • Realistic TTS from transcripts with speaker tagging ([S1]/[S2])
  • Emotion and tone control via conditioning audio
  • Supports non-verbal sounds like (laughs), (coughs), etc.
  • Voice cloning through user-provided audio prompts
  • Python API for simple text-to-audio generation
  • Real-time performance on supported GPUs
  • Planned CLI tool, PyPI package, and quantized version
  • Licensed under Apache 2.0 with strict misuse policies
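As a sketch of the transcript convention described above, the helper below assembles a Dia-style input string with [S1]/[S2] speaker tags and inline parenthesized non-verbal cues. Note that `build_transcript` is a hypothetical convenience function, not part of the Dia package, and the commented-out model calls at the end use assumed names; check the project's README for the actual Python API.

```python
def build_transcript(turns):
    """Format (speaker, text) pairs into a tagged dialogue transcript.

    Each turn becomes "[S<n>] <text>"; non-verbal cues such as
    (laughs) or (coughs) are passed through inline as plain text.
    """
    return " ".join(f"[S{speaker}] {text}" for speaker, text in turns)

transcript = build_transcript([
    (1, "Did you see the demo? (laughs)"),
    (2, "I did. The non-verbal cues sound surprisingly natural."),
])
print(transcript)
# [S1] Did you see the demo? (laughs) [S2] I did. The non-verbal cues sound surprisingly natural.

# Hypothetical generation call (identifiers are assumptions, shown
# only to illustrate the workflow; requires a GPU and model weights):
# from dia.model import Dia
# model = Dia.from_pretrained("nari-labs/Dia-1.6B")
# audio = model.generate(transcript)  # optionally pass a conditioning audio prompt
```

The tagged string is what carries speaker identity: Dia infers two distinct voices from the alternating [S1]/[S2] markers rather than from any per-speaker configuration.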


Categories

AI Models


Additional Project Details

Programming Language

Python

Related Categories

Python AI Models

Registered

2025-06-27