Use readable expression tags
Product teams can write tone directly in the text, so prompts stay understandable during review and testing.
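As a rough illustration of why bracketed tags stay easy to review, the sketch below splits tagged text into (tag, spoken text) pairs. The tag grammar used here (a lowercase phrase in square brackets) and the function name `split_tagged_text` are assumptions made for this example, not the service's documented syntax or API.

```python
import re
from typing import List, Tuple

# Assumed tag syntax: a lowercase phrase in square brackets, e.g. [quiet].
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def split_tagged_text(text: str) -> List[Tuple[str, str]]:
    """Return (tag, spoken_text) pairs; text before any tag gets the tag ''."""
    segments: List[Tuple[str, str]] = []
    current_tag = ""
    position = 0
    for match in TAG_PATTERN.finditer(text):
        # Text between the previous tag and this one belongs to the previous tag.
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append((current_tag, spoken))
        current_tag = match.group(1)
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append((current_tag, tail))
    return segments
```

A reviewer could run this over a prompt to see which tone applies to which sentence before any audio is generated.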
API for product teams
TextToSpeechSkills gives developers one practical workflow for turning app text into consistent speech: validate expression tags, choose a saved voice template, create a job, then poll or receive a webhook when audio is ready.
TextToSpeechSkills is a text-to-speech API for developers who need expressive speech in LLM apps, agents, and product workflows. It turns text into speech with readable expression tags, reusable voice templates, async generation jobs, scoped API keys, webhooks, MCP tools, and usage controls. Teams can start in the browser UI, then move the same workflow into code or an LLM app. This makes it useful for apps that need consistent narration, customer replies, educational audio, product updates, game dialogue, and automated voice output without exposing engine choices or secrets to end users.
Easy LLM setup
LLM setup is a copy-and-paste flow: create a scoped key, copy the MCP command, paste it into your LLM app, and tell the agent which voice template to use.
Voice templates store persona, pacing, warmth, and stability settings so repeated prompts do not drift.
The API returns predictable job states for queued and fast paths, which keeps frontend clients clean and avoids leaking server credentials.
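One way a client might build on predictable job states is a small polling loop that treats the queued and fast paths the same way. This is a minimal sketch: the state names, the `state` field, and the `fetch_job` callable are hypothetical stand-ins, since the page does not list the actual states or endpoints.

```python
import time
from enum import Enum
from typing import Callable, Dict


class JobState(str, Enum):
    # Hypothetical state names; the page only says job states are predictable.
    QUEUED = "queued"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


TERMINAL_STATES = {JobState.SUCCEEDED, JobState.FAILED}


def wait_for_job(
    fetch_job: Callable[[], Dict[str, str]],
    interval_seconds: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll fetch_job until the job is terminal or the attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if JobState(job["state"]) in TERMINAL_STATES:
            return job
        # Skip the final sleep so a timeout is reported promptly.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job still pending after {max_attempts} attempts")
```

A fast-path job that is already finished returns on the first call, so the caller does not need a separate code path for it. In a real backend, `fetch_job` would wrap an authenticated request made server-side, which keeps credentials out of the frontend.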
Developers building agent workflows, product narration, and speech features usually need a repeatable path for writing, review, generation, billing, and reuse. The most important jobs here are using readable expression tags, keeping voices consistent, and building around jobs instead of waiting. Those are the moments where voice becomes part of real work instead of a one-off export.
Start with readable text, add expression tags when tone matters, choose an approved voice template, and create a speech job through the UI, API, or MCP. The same pattern applies whether a team needs a general text-to-speech API, a TTS API for developers, or a speech API for apps, which makes it easier for humans and LLM apps to share one process without exposing internal routing or credentials.
Decide which templates are approved, which expression tags are allowed, who can create workspace keys, and which usage limits are acceptable. Those choices keep automated voice generation useful and contained at every tier, from the first paid Test plan through Pro, Scale, and Business.
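The decisions above could be captured as a small policy object that is checked before a job is created. The sketch below is illustrative only: `WorkspacePolicy`, `policy_violations`, and the character cap used as a stand-in for a usage limit are all assumptions, not part of the product.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class WorkspacePolicy:
    # Hypothetical policy shape; field names are assumptions for this sketch.
    approved_templates: FrozenSet[str]
    allowed_tags: FrozenSet[str]
    max_characters: int  # assumed stand-in for a usage limit


def policy_violations(text: str, voice_template: str, policy: WorkspacePolicy) -> List[str]:
    """Return human-readable problems; an empty list means the request passes."""
    problems: List[str] = []
    if voice_template not in policy.approved_templates:
        problems.append(f"template not approved: {voice_template}")
    # Treat anything in square brackets as an expression tag.
    for tag in re.findall(r"\[([^\[\]]+)\]", text):
        if tag not in policy.allowed_tags:
            problems.append(f"tag not allowed: {tag}")
    if len(text) > policy.max_characters:
        problems.append(f"text exceeds {policy.max_characters} characters")
    return problems
```

Returning a list of problems instead of raising on the first one lets a review tool show every issue in a prompt at once.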
Common questions
These are the practical details that matter before a team adds speech generation to a real workflow.
Developers building agent workflows, product narration, and speech features should use this page when they want generated speech that is easy to review, consistent across prompts, and simple to connect to LLM tools. The core workflow combines expression tags, voice templates, credit previews, and job-based generation.
LLM setup is a copy-and-paste flow: create a scoped key, copy the MCP command, paste it into your LLM app, and tell the agent which voice template to use. The setup guide keeps the first path short while still giving developers a clean API when the workflow moves into a product backend.
Every paid plan uses credits. Teams can add credit packs when needed, and workspaces on Pro and higher add central billing for $2 per user per month.
API playground
{
  "text": "[quiet] hello. [loud and angry] how are you?",
  "voice_template": "vt_calm_narrator_v1",
  "generation_mode": "instant",
  "format": "mp3"
}
MCP install
pnpm --package texttospeechskills dlx tts-skills-mcp
pnpm --package texttospeechskills dlx tts-skills tags
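For teams moving from the playground into a backend, the playground request body shown above could be assembled in code along these lines. The four field names come from the sample. The helper name, the default values, and the rule that every field must be non-empty are assumptions for this sketch, and no endpoint is called because the page does not give one.

```python
import json

# Field names taken from the playground sample above.
REQUIRED_FIELDS = ("text", "voice_template", "generation_mode", "format")


def build_speech_request(
    text: str,
    voice_template: str,
    generation_mode: str = "instant",
    audio_format: str = "mp3",
) -> str:
    """Return the JSON body for a speech job, rejecting empty fields."""
    body = {
        "text": text,
        "voice_template": voice_template,
        "generation_mode": generation_mode,
        "format": audio_format,
    }
    # Assumption: all four fields are required and must be non-empty.
    missing = [name for name in REQUIRED_FIELDS if not body[name]]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return json.dumps(body)
```

Sending the resulting string from a server-side handler, with the scoped key attached there, keeps the key out of browser code.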