
Speech Central has recently added support for open-source Text-to-Speech (TTS) engines. You can now integrate any TTS engine that is compatible with the OpenAI API, building on the OpenAI support that Speech Central has offered for some time.
Why Open-Source TTS Engines?
Open-source TTS engines open up exciting possibilities for users who want more control over their costs. With an open-source engine you pay only for the server that runs it, which costs much less than the fees for commercial TTS solutions like OpenAI’s premium voices. If you use a local device as the server, there are almost no associated costs.
While these open-source voices may not always reach the same level of naturalness as OpenAI’s offerings, they still represent a notable improvement over the built-in voices found on iPhone and Mac devices. This gives users a cost-efficient alternative without compromising too much on voice quality.
Setting Up Open-Source TTS
One example of an open-source TTS engine you can integrate with Speech Central is openedai-speech, a server that generates speech through an API compatible with OpenAI’s. It uses Coqui AI’s xtts_v2, piper tts, or both as its backend. For Speech Central you will probably want xtts_v2, as piper may not offer a significant quality improvement over the iPhone’s built-in voices.
Another example is Kokoro, for which the Kokoro Web package provides an implementation of OpenAI’s API.
However, setting up and maintaining a server for these open-source TTS engines is not a simple task. For users who are comfortable managing server infrastructure, this option is an attractive way to control expenses while maintaining quality. For others, relying on commercial APIs like OpenAI’s may remain the most practical choice.
Please note that if you run the server on your own device, the device must meet the hardware requirements of the engine you use so that the sound can be generated in real time. Otherwise you will hear gaps in the speech while the app waits for the sound to be generated.
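One way to judge whether a device is fast enough is to compare how long the engine takes to generate a clip with how long that clip plays. The Python sketch below is only an illustration: the function names, the sample timings, and the 0.8 safety margin are assumptions, not values taken from Speech Central or from any particular engine.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by the duration of the generated audio.

    A value below 1.0 means audio is produced faster than it is played back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def keeps_up(generation_seconds: float, audio_seconds: float,
             margin: float = 0.8) -> bool:
    """Return True if generation is likely fast enough for gap-free playback.

    `margin` is an assumed safety threshold, not a documented requirement.
    """
    return real_time_factor(generation_seconds, audio_seconds) <= margin


if __name__ == "__main__":
    # Hypothetical measurements: time to generate a 10-second clip.
    print(keeps_up(4.0, 10.0))   # fast machine: 4 s of work for 10 s of audio
    print(keeps_up(12.0, 10.0))  # slow machine: 12 s of work for 10 s of audio
```

If the second case describes your device, playback would have to pause while the engine catches up, which is the gap described above.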
Finally, once you have set up the server, connecting it to Speech Central is easy. Go to Settings > Speech > Voices > Toolbar menu button > OpenAI and set the Custom URL to point to your server. Please note that OpenAI has only 6 predefined voices, so its API provides no way to pull a list of voices from the server. The emulation therefore works by mapping each open-source voice to one of those predefined voices.
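Before pointing Speech Central at your server, it can help to confirm that the server answers OpenAI-style speech requests. The Python sketch below builds such a request for the /audio/speech endpoint of the OpenAI API. The base URL http://localhost:8000/v1, the helper names, and the backend voice names in VOICE_MAP are assumptions for illustration. The exact URL form that Speech Central expects in the Custom URL field, and the way your server configures its voice mapping, may differ.

```python
import json
import urllib.request

# The six predefined OpenAI voice names referred to in the text.
OPENAI_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

# Hypothetical mapping from an OpenAI voice name to an open-source voice.
# In practice this mapping lives in the server's own configuration; the
# names on the right are placeholders, not real voice identifiers.
VOICE_MAP = {
    "alloy": "open-voice-a",
    "nova": "open-voice-b",
}


def backend_voice_for(openai_voice: str) -> str:
    """Illustrate the lookup a server performs; unmapped names pass through."""
    return VOICE_MAP.get(openai_voice, openai_voice)


def build_speech_request(base_url: str, text: str, voice: str = "alloy",
                         model: str = "tts-1") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    if voice not in OPENAI_VOICES:
        raise ValueError(f"voice must be one of {OPENAI_VOICES}")
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed local server address; replace it with your own.
    request = build_speech_request("http://localhost:8000/v1", "Hello", "nova")
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it and save the audio, uncomment:
    # with urllib.request.urlopen(request) as response:
    #     open("hello.mp3", "wb").write(response.read())
```

The request always carries one of the six predefined names. Which open-source voice you hear for "nova" is decided on the server, which is why the mapping appears here only as a stand-in.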
Speech Central: TTS for Everyone
At its core, Speech Central is about accessibility. By providing support for both commercial and open-source TTS engines, it gives users the flexibility to choose the best solution for their needs, whether it’s optimizing cost or leveraging the latest in AI-generated speech.
With this update, Speech Central continues to be a versatile platform for anyone looking to enhance their reading experience with TTS.