Dia 1.6B: Realistic Dialogue Generation from Text
Dia is a 1.6B-parameter text-to-speech model developed by Nari Labs that generates highly realistic dialogue directly from transcripts. The model generates English speech and supports emotion and tone control through audio conditioning.
Key Features
Dialogue Generation with Speaker Tags
Dia produces natural speech from transcripts using [S1] and [S2] speaker tags, making it easy to create multi-speaker conversations directly from text.
Nonverbal Communication
The model recognizes and generates approximately 20 different nonverbal expressions including laughter, coughing, throat clearing, sighing, and gasps. These are triggered using simple tags like "(laughs)", "(clears throat)", and "(sighs)".
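The tagging conventions above can be sketched with a small helper that assembles a Dia-style transcript from a list of turns. This is an illustrative sketch, not part of the Dia API: the function name and structure are assumptions; only the [S1]/[S2] tags and parenthesized nonverbal cues come from the model's documented input format.

```python
# Hypothetical helper (not Dia API): build a tagged transcript from turns.
# [S1]/[S2] speaker tags and parenthesized cues like "(laughs)" follow the
# input conventions described above.

def build_transcript(turns):
    """Alternate [S1]/[S2] speaker tags across a list of utterances.

    Nonverbal cues can be embedded directly in an utterance,
    e.g. "That was great. (laughs)".
    """
    tagged = []
    for i, utterance in enumerate(turns):
        speaker = "[S1]" if i % 2 == 0 else "[S2]"
        tagged.append(f"{speaker} {utterance.strip()}")
    return " ".join(tagged)

transcript = build_transcript([
    "Have you tried the new model?",
    "I have. (laughs) The nonverbal tags surprised me.",
])
print(transcript)
# [S1] Have you tried the new model? [S2] I have. (laughs) The nonverbal tags surprised me.
```

The resulting string is what would be passed to the model as its text input.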
Voice Cloning
Dia includes voice cloning functionality. By default, the model produces a different voice on each generation without requiring fine-tuning on specific voices; speaker consistency across generations can be achieved by fixing the random seed or by conditioning on a reference audio clip.
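The seed-fixing idea can be shown in miniature. Sampling-based generators like Dia draw from a random number stream, so fixing the seed replays the same stream and reproduces the same output. Python's stdlib RNG stands in for the model's sampler here; the function below is a stand-in, not Dia code.

```python
import random

# Seed-fixing in miniature: an isolated RNG with a fixed seed always
# produces the same draws, which is what makes a seeded generation
# reproducible. The "voice id" here is a stand-in for a sampled voice.

def sample_voice_id(seed):
    rng = random.Random(seed)      # isolated generator with a fixed seed
    return rng.randrange(10_000)   # stand-in for sampling a voice

run_a = sample_voice_id(seed=42)
run_b = sample_voice_id(seed=42)   # same seed -> identical result
run_c = sample_voice_id(seed=7)    # different seed -> a fresh draw

print(run_a == run_b)  # True
```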
Audio Conditioning
The model can be conditioned on audio input, enabling precise control over emotion and tone in the generated speech output.
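One common pattern in audio-prompted TTS is to pair the conditioning clip with its own transcript, prepending that transcript to the new text so the model treats the clip as a prefix and continues in its voice and tone. The sketch below only assembles that combined transcript; it is an assumed pattern for illustration, and the helper name is hypothetical, not confirmed Dia API.

```python
# Hypothetical sketch: prepend the conditioning clip's transcript to the
# new text, a common pattern for audio-prompted TTS. The actual generation
# call (model object, parameter names) is not shown and is an assumption.

def build_conditioned_transcript(prompt_transcript, new_text):
    """Concatenate the conditioning clip's transcript with the new text."""
    return f"{prompt_transcript.strip()} {new_text.strip()}"

combined = build_conditioned_transcript(
    "[S1] This is how I want the voice to sound.",
    "[S1] And this is the new line to synthesize.",
)
print(combined)
```

The conditioning audio itself would be supplied to the model alongside this combined transcript.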
Use Cases
- Creating realistic dialogue for audio content and storytelling
- Generating conversational speech with multiple speakers
- Producing speech with emotional expressions and nonverbal sounds
- Voice synthesis applications requiring speaker consistency
- Accessibility tools for text-to-speech conversion
Training and Architecture
Dia draws inspiration from the SoundStorm and Parakeet architectures and uses the Descript Audio Codec for audio tokenization and generation. Model development benefited from resources provided by the Google TPU Research Cloud program and a Hugging Face ZeroGPU grant.