Qwen3 TTS: Advanced Open-Source Voices for Speech Central

Speech Central recently expanded its Text-to-Speech features by adding support for open-source TTS engines, bringing state-of-the-art voices to your device (iPhone, Android, Mac). The key requirement is simple: if a TTS server is compatible with the OpenAI API, Speech Central can connect to it using the same integration workflow that many users already know from OpenAI support.

In this follow-up post, I’ll focus on Qwen3-TTS: what it brings to the table, why it’s a strong fit for long-form reading, and how to connect it to Speech Central using an OpenAI-compatible endpoint.

Why Qwen3-TTS Is Interesting

Many open-source TTS solutions are good enough for short demos but struggle with the things that matter during long listening sessions: stable pacing, consistent pronunciation, and fewer “weird” prosody shifts across paragraphs. Qwen3-TTS is built with real-time and long-form audio in mind, which makes it especially relevant for reading apps.

  • Better long-form consistency (pacing and sentence-to-sentence flow)
  • Streaming-friendly generation for continuous playback
  • Multilingual support (varies by model/voice pack)
  • Voice cloning / voice design (depending on your setup)
  • Can be served behind OpenAI-compatible APIs using a wrapper or gateway

The key point for Speech Central users is that you don’t need Speech Central to “natively support Qwen.” You just need a server that exposes an OpenAI-compatible TTS API and maps voices in a predictable way.

How the OpenAI Compatibility Layer Works

Speech Central talks to OpenAI-style TTS endpoints using the standard request/response format (for example, an endpoint like /v1/audio/speech). Qwen3-TTS itself is a model family — it usually needs a small serving layer to expose a web API.

Common approaches include:

  • FastAPI wrappers that emulate OpenAI’s TTS routes
  • vLLM-based serving (where available) with OpenAI-like client compatibility
  • Other gateways that translate OpenAI requests into Qwen3 inference calls

Once that wrapper is running, Speech Central can use it the same way it uses OpenAI.
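To make the request format concrete, here is a minimal sketch of the JSON body an OpenAI-style `/v1/audio/speech` endpoint expects. The server address and API key are placeholders for whatever your own wrapper uses; only the field names (`model`, `voice`, `input`) follow the OpenAI convention.

```python
import json

def build_speech_request(text, voice="alloy", model="tts-1"):
    """Build the JSON body for an OpenAI-style /v1/audio/speech call."""
    return {
        "model": model,            # model identifier your wrapper accepts
        "voice": voice,            # one of the predefined OpenAI voice names
        "input": text,             # the text to synthesize
        "response_format": "mp3",  # audio container for the response
    }

payload = build_speech_request("Hello from Qwen3-TTS")
print(json.dumps(payload, indent=2))

# Sending it (the URL and key below are examples, not real values):
# import urllib.request
# req = urllib.request.Request(
#     "http://192.168.1.10:8000/v1/audio/speech",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json",
#              "Authorization": "Bearer YOUR_KEY"},
# )
# audio_bytes = urllib.request.urlopen(req).read()  # raw MP3 bytes
```

The response body is the audio itself, not JSON, which is what lets Speech Central stream it straight into playback.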

Server Setup Notes (Practical Reality Check)

Running Qwen3-TTS in real time requires enough compute to generate audio faster than playback. If your server can’t keep up, you’ll hear gaps while Speech Central waits for new audio chunks.

  • Best experience: a GPU server (local or cloud)
  • Possible but variable: a powerful desktop CPU, small models, and short texts
  • Multiple devices: a dedicated server is usually smoother than hosting on a phone/laptop

If you’re testing locally and you hear frequent pauses, reduce latency by enabling streaming mode in your server, using a smaller model variant, or deploying on a machine with stronger GPU support.
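A quick way to judge whether your hardware can keep up is the real-time factor (RTF): generation time divided by the duration of the audio produced. The sketch below just does that arithmetic; the 10 s / 4 s numbers are an illustrative example, not a benchmark of any particular model.

```python
def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """RTF < 1.0 means the server produces audio faster than playback
    consumes it, so streaming will not stall."""
    return generation_seconds / audio_seconds

# Example: 10 s of audio generated in 4 s -> RTF 0.4, comfortable headroom
rtf = real_time_factor(10.0, 4.0)
print(f"RTF = {rtf:.2f} ({'OK for streaming' if rtf < 1.0 else 'will stutter'})")
```

If your measured RTF hovers near or above 1.0, that is exactly when you will hear the gaps described above.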

Connecting Qwen3-TTS to Speech Central

After your Qwen3-TTS server is running behind an OpenAI-compatible endpoint, connecting it in Speech Central is straightforward:

  1. Open Settings
  2. Go to Speech → Voices
  3. Tap the toolbar menu button and choose OpenAI
  4. Set the Custom URL to your server (for example: http://192.168.1.10:8000)
  5. Enter an API key if your server requires one
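Before typing the URL into Speech Central, it can save time to confirm the server is reachable at all from your network. This is a generic reachability sketch using only the Python standard library; even a 404 response proves the server is up and listening.

```python
import urllib.request
import urllib.error

def endpoint_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers at base_url.
    An HTTP error (e.g. 404 on the root path) still counts as reachable;
    only a connection failure counts as unreachable."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server answered, just not on this path
    except (urllib.error.URLError, OSError):
        return False

# Example with a placeholder address from the steps above:
# endpoint_reachable("http://192.168.1.10:8000")
```

If this returns False, the problem is networking (firewall, wrong IP, server not running), not Speech Central's configuration.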

Important: OpenAI provides a fixed set of predefined voice names and does not provide a “list voices from server” API for TTS. Because Speech Central follows that OpenAI pattern, your OpenAI-compatible wrapper typically needs to map your Qwen voices to one of the predefined OpenAI voice slots.

In practice that means you select, for example, “Alloy” in Speech Central, but your server translates that into “Qwen Voice A.”
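Inside the wrapper, that translation is typically just a lookup table. The sketch below is a hypothetical mapping; the Qwen voice identifiers are placeholders, not official names, and your server's actual voice IDs will differ.

```python
# Hypothetical mapping from OpenAI voice slots to local Qwen3-TTS voices.
# The right-hand names are placeholders for whatever your deployment exposes.
VOICE_MAP = {
    "alloy": "qwen-voice-a",
    "nova":  "qwen-voice-b",
    "echo":  "qwen-voice-c",
}

def resolve_voice(openai_voice: str) -> str:
    """Translate the voice name Speech Central sends into a local Qwen voice,
    falling back to a default so unknown names never fail the request."""
    return VOICE_MAP.get(openai_voice.lower(), "qwen-voice-a")

print(resolve_voice("Alloy"))  # -> qwen-voice-a
```

Keeping a fallback voice means Speech Central's requests always succeed, even if a future OpenAI voice name appears that your map does not know about.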

Model Name Compatibility (When Audio Doesn’t Start)

Speech Central uses the latest OpenAI TTS model by default (the name may change over time). Some OpenAI-compatible wrappers expect older model identifiers.

If Speech Central connects but playback doesn’t start:

  • Try setting the model to tts-1 in the OpenAI configuration dialog
  • Check your server logs for rejected model names
  • Confirm that your wrapper supports the same request fields Speech Central sends

Some wrappers ignore the model field entirely and select the model/voice internally. If yours supports that, it can make Speech Central setup more resilient.
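That "ignore the model field" pattern can be sketched in a few lines. The served model name below is a placeholder for whatever you actually deployed; the point is that the wrapper accepts any identifier the client sends and logs the mismatch instead of rejecting the request.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tts-wrapper")

SERVED_MODEL = "qwen3-tts-local"  # placeholder for your deployed model

def resolve_model(requested: str) -> str:
    """Accept whatever model name the client sends (tts-1, tts-1-hd, ...)
    and always route to the one locally served model, logging the request
    so model-name issues show up in the server logs instead of as failures."""
    if requested != SERVED_MODEL:
        log.info("client asked for %r, serving %r instead", requested, SERVED_MODEL)
    return SERVED_MODEL

print(resolve_model("tts-1"))  # -> qwen3-tts-local
```

With this in place, it no longer matters which model name Speech Central sends by default.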

Qwen3-TTS vs Other Open-Source Options

Here’s where Qwen3-TTS typically sits in the open-source landscape:

  • XTTS v2 (Coqui-style): popular, flexible, strong multilingual cloning; often a solid baseline
  • Kokoro: lightweight deployments and OpenAI-style web serving packages
  • Qwen3-TTS: strong long-form prosody and streaming suitability; compelling for reading apps

If your main use case is listening to long articles, documents, or books, Qwen3-TTS is particularly worth testing because it emphasizes smoothness and stability over long passages.

Speech Central: Flexible TTS for Everyone

Speech Central’s direction here is clear: a modular voice stack. If an engine can be reached via an OpenAI-compatible TTS API, users can choose between commercial voices or self-hosted open-source voices depending on cost, privacy, and quality priorities.

Qwen3-TTS expands what “open-source voices” can feel like — especially for long-form reading — and it fits neatly into Speech Central’s OpenAI-compatible configuration model.

Download Speech Central