Generate Speech with Dia

Use [S1]/[S2] tags for speakers and (laughs), (sighs), etc. for non-verbals. For voice cloning, prepend the reference transcript to your text.

Maximum input length: 8192 characters. Default chunk size: 120.

Splitting is automatically disabled if the text length is less than 2× the Chunk Size. Recommended size: ~100-300 characters (default: 120). Use Predefined Voices or Voice Cloning mode to keep voices consistent across chunks.
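The splitting rule above can be sketched as follows. This is a minimal illustration of the described behavior, not the server's actual implementation, and the function name is hypothetical:

```python
def should_split(text: str, chunk_size: int = 120) -> bool:
    # Splitting is disabled when the text is shorter than twice the chunk size.
    return len(text) >= 2 * chunk_size

should_split("A short line.")          # well under 240 chars, so no splitting
should_split("[S1] " + "hello " * 60)  # long dialogue, split into chunks
```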

Generation Parameters

Enter an integer (e.g. 1, 42, 901) for reproducible results, or -1 for a random seed.

Server Configuration

These settings are saved to config.yaml. Restart the server to apply changes to the Server, Model, or Paths sections.
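As a rough sketch, config.yaml might be structured like this. The section names (Server, Model, Paths) come from the UI above, but every individual key shown is an assumption; check your generated config.yaml for the real schema:

```yaml
# Hypothetical structure only: the key names below are assumptions, not the
# server's actual schema.
server:
  host: 0.0.0.0   # assumed
  port: 8003      # assumed
model:            # model/weights selection settings live here
paths:            # output and reference-audio directories live here
```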

Tips & Tricks for Dia

  • Use **Predefined Voices** for consistent, high-quality output based on provided samples.
  • For **Voice Clone**, upload clean reference audio (.wav/.mp3). Crucially, save the exact transcript of the reference audio in a .txt file with the same name as the audio file. Format the transcript as [S1] First speaker [S2] Second speaker, or [S1] First speaker if the reference audio has only one speaker.
  • Use **Random / Dialogue** for multi-speaker text ([S1]/[S2]) or single-speaker generation without cloning.
  • Experiment with **CFG Scale** (higher = more adherence) and **Temperature** (higher = more varied).
  • Set **Generation Seed** to a fixed integer (e.g. 1, 42, 901) for reproducible results.
  • Enable **Split text** for long inputs (> ~200-300 chars). Note: Using Random/Dialogue mode with splitting and a random seed (-1) may result in different voices per chunk. Use Predefined/Clone or a fixed seed for consistency across chunks.
  • Use the /v1/audio/speech endpoint for OpenAI compatibility.
  • Use the custom /tts endpoint for maximum flexibility: it exposes all Dia generation parameters and accepts reference audio and transcript information.
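As a concrete illustration, a request to the OpenAI-compatible endpoint might look like the sketch below. The host/port, voice name, and the /tts field names are assumptions inferred from the UI labels above, not a confirmed schema:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8003"  # assumed host/port; adjust to your server

def build_speech_request(text: str, voice: str = "S1") -> request.Request:
    # Payload shape follows the OpenAI audio/speech API that this endpoint mimics.
    payload = {"input": text, "voice": voice, "response_format": "wav"}
    return request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("[S1] Hello there. [S2] Hi! (laughs)")
# urllib.request.urlopen(req) returns the audio bytes once the server is running.

# The custom /tts endpoint takes a fuller payload; these field names are
# assumptions based on the UI controls above, not a documented schema.
tts_payload = {
    "text": "[S1] Hello there.",
    "voice_mode": "predefined",
    "cfg_scale": 3.0,
    "temperature": 1.2,
    "seed": 42,
    "split_text": True,
    "chunk_size": 120,
}
```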