Text-to-Speech API for LLM Apps

Who is this for?

TextToSpeechSkills is a text-to-speech API for developers who need expressive speech in LLM apps, agents, and product workflows. It turns text into speech with natural-language bracketed expression directions, reusable voice templates, async generation jobs, scoped API keys, webhooks, MCP tools, and usage controls. Teams can start in the browser UI, then move the same workflow into code or an LLM app. This makes it useful for apps that need consistent narration, customer replies, educational audio, product updates, game dialogue, and automated voice output without exposing engine choices or secrets to end users.

Easy LLM setup

LLM-ready even for non-technical teams

LLM setup is a copy-and-paste flow: create a scoped key, copy the MCP command, paste it into your LLM app, and tell the agent which voice template to use.


01 Create a scoped key

02 Install MCP

03 Choose a voice template

04 Generate audio from chat

Use natural expression markup

Product teams can write full delivery directions directly in brackets, so prompts stay understandable during review and testing.
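As a rough illustration of how bracketed directions can be reviewed before a job is created, the sketch below splits a prompt into direction and spoken-text pairs. It assumes a direction is any text inside square brackets and applies to the words that follow it until the next bracket; the function name `split_directions` is hypothetical and is not part of the TextToSpeechSkills package.

```python
import re
from typing import List, Tuple

# Assumed syntax: a direction is any text inside square brackets and applies
# to the text that follows it, up to the next bracket.
DIRECTION = re.compile(r"\[([^\[\]]+)\]")


def split_directions(text: str) -> List[Tuple[str, str]]:
    """Return (direction, spoken_text) pairs; direction is '' when none is given."""
    segments = []
    direction = ""
    position = 0
    for match in DIRECTION.finditer(text):
        # Text between the previous bracket and this one belongs to the
        # previous direction.
        spoken = text[position:match.start()].strip()
        if spoken:
            segments.append((direction, spoken))
        direction = match.group(1).strip()
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append((direction, tail))
    return segments
```

A reviewer could print these pairs next to each other to check that every line of dialogue carries the intended delivery.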

Price API usage with the same credits

The same paid plan credits cover studio, API, and MCP usage, with credit previews before speech jobs run.

Keep voices consistent

Voice templates store persona, pacing, warmth, and stability settings so repeated prompts do not drift.

Build around jobs, not waiting

Speech generation runs as asynchronous jobs with predictable states. A frontend client only needs to display the job state, while the API key stays on the server.
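A job-based flow is usually consumed by polling (or by a webhook) until the job reaches a final state. The sketch below shows the polling side only. The state names and the `status` field are assumptions for illustration, and `get_job` stands in for whatever server-side call fetches a job, so nothing here reflects the documented API.

```python
import time
from typing import Callable, Dict

# Hypothetical state names; the real API may use different ones.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def wait_for_job(
    get_job: Callable[[str], Dict],
    job_id: str,
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll get_job(job_id) until the job reaches a terminal state."""
    for _ in range(max_attempts):
        job = get_job(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Passing `get_job` and `sleep` in as parameters keeps the loop easy to test and keeps credentials inside the server-side function that performs the request.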

Pair with speech-to-text when agents listen

TextToSpeechSkills handles speech output from approved text. If an agent also receives user audio, pair it with a speech-to-text layer and send the response text here for voice output.
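One way to picture the pairing is a single listen, respond, speak turn. The sketch below wires three callables together; all three are placeholders supplied by the caller (a speech-to-text service, an LLM or agent, and a function that creates a speech job), and none of the names come from the TextToSpeechSkills SDK.

```python
from typing import Callable


def voice_turn(
    user_audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one listen-respond-speak turn and return the reply audio."""
    user_text = transcribe(user_audio)  # separate speech-to-text layer
    reply_text = respond(user_text)     # LLM or agent produces the reply text
    return synthesize(reply_text)       # text-to-speech step for the reply text
```

Keeping the three steps as separate functions means the reply text can be reviewed or filtered before it is sent for voice output.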

When this helps

Developers building agent workflows, product narration, and speech features usually need a repeatable path for writing, review, generation, billing, and reuse. The most important jobs here are using natural expression markup, pricing API usage with the same credits, keeping voices consistent, building around jobs instead of waiting, and pairing with speech-to-text when agents listen. Those are the moments where voice becomes part of real work instead of a one-off export.

How the workflow works

Start with readable text, add natural-language expression directions when tone matters, choose an approved voice template, and create a speech job through the UI, API, or MCP. The same pattern applies whether the request comes from a person in the browser, a backend service, or an LLM app, which makes it easier for humans and LLM apps to share one process without exposing internal routing or credentials.

Before you roll it out

Decide which templates are approved, how natural expression markup should be reviewed, who can create workspace keys, and which usage limits are acceptable. Those choices keep automated voice generation useful and contained as usage grows from the first paid Test plan through Pro, Scale, and Business.
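Rollout decisions like these can be enforced with a small check that runs before a job is created. The sketch below is one possible shape, assuming a team keeps its own list of approved template IDs and a per-request character limit; both the limit and the function name `review_request` are illustrative and not part of the product.

```python
from typing import Dict, List, Set


def review_request(
    payload: Dict[str, str],
    approved_templates: Set[str],
    max_characters: int,
) -> List[str]:
    """Return a list of policy problems; an empty list means the request may proceed."""
    problems = []
    if payload.get("voice_template") not in approved_templates:
        problems.append("voice template is not on the approved list")
    if len(payload.get("text", "")) > max_characters:
        problems.append("text exceeds the character limit")
    return problems
```

Returning every problem at once, instead of stopping at the first, gives reviewers a complete picture of why a request was held back.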

Common questions

What teams usually ask before starting

These are the practical details that matter before a team adds speech generation to a real workflow.

Who should use Text-to-Speech API for LLM Apps?

It is aimed at developers building agent workflows, product narration, and speech features who want generated speech that is easy to review, consistent across prompts, and simple to connect to LLM tools. The core workflow combines natural expression markup, voice templates, credit previews, and job-based generation.

Can a non-technical user connect this to an LLM app?

Yes. Setup is a copy-and-paste flow: create a scoped key, copy the MCP command, paste it into your LLM app, and tell the agent which voice template to use. The setup guide keeps the first path short while still giving developers a clean API when the workflow moves into a product backend.

How does pricing stay predictable?

Every paid plan uses credits. Teams can add credit packs when needed, and workspaces on Pro and higher add central billing for $2 per user per month.

Keep exploring TextToSpeechSkills

Use these guides to move from a first audio test to a repeatable workflow for your team.

API playground

Plain JSON in, speech job out

{
  "text": "[quiet] hello. [loud and angry] how are you?",
  "voice_template": "vt_calm_narrator_v1",
  "format": "wav"
}

Job created · 200 · audio ready
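To show how the JSON payload above might be sent from a backend, here is a minimal sketch that builds the HTTP request with the Python standard library. The endpoint URL and the bearer-token header are placeholders, since neither is documented on this page; the OpenAPI file shipped with the package is the place to confirm the real values.

```python
import json
import urllib.request

# Placeholder values: the endpoint URL and the Authorization scheme are
# assumptions for illustration, not the documented API.
API_URL = "https://api.example.com/v1/speech-jobs"


def build_speech_job_request(
    api_key: str,
    text: str,
    voice_template: str,
    audio_format: str = "wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the speech job payload."""
    payload = {"text": text, "voice_template": voice_template, "format": audio_format}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

On a server, the request would be sent with `urllib.request.urlopen(request)`, with the scoped key read from an environment variable so it never reaches the browser.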

MCP install

Agent tools included at launch

Claude Desktop: npx --yes --package texttospeechskills tts-skills-mcp

Codex: npx --yes --package texttospeechskills tts-skills-mcp

Cursor: npx --yes --package texttospeechskills tts-skills-mcp

Skills helper: npx --yes --package texttospeechskills tts-skills tags

The public package includes the MCP server, skill instructions, SDK, CLI, OpenAPI file, resources, and prompts.
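Many MCP clients, Claude Desktop among them, register servers through a JSON config with an `mcpServers` map of command, arguments, and environment variables. The sketch below generates such an entry for the npx command listed above. The server label and the environment variable name `TTS_SKILLS_API_KEY` are assumptions; the setup guide should be checked for how the scoped key is actually passed.

```python
import json
from typing import Dict


def mcp_server_entry(scoped_key: str) -> Dict:
    """Build an mcpServers config entry that launches the MCP server via npx."""
    return {
        "mcpServers": {
            # The label is arbitrary; clients show it in their tool list.
            "texttospeechskills": {
                "command": "npx",
                "args": ["--yes", "--package", "texttospeechskills", "tts-skills-mcp"],
                # Hypothetical variable name for the scoped key.
                "env": {"TTS_SKILLS_API_KEY": scoped_key},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mcp_server_entry("<your scoped key>"), indent=2))
```

Generating the entry in code makes it easy to issue one scoped key per teammate or per agent without editing JSON by hand.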