Docs

Text-to-speech docs for API, MCP, skills, and secure setup

Use the same speech workflow from the browser UI, your product backend, or an LLM app. Start with scoped keys, validated expression tags, reusable voice templates, async speech jobs, and installable MCP tools.

Use jobs for every production workflow

Speech generation should not depend on a browser waiting for one long request. The docs explain how to create jobs, poll for status, receive audio URLs, and keep longer scripts in the background so your product UI stays responsive while the server handles billing, storage, retries, and delivery.
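The create-then-poll flow described above might look like the following sketch. The status endpoint path (`/api/v1/tts/jobs/{id}`), the `status` values (`queued`, `done`, `failed`), and the `audio_url` field are assumptions made for illustration; only the base URL and the bearer-token header come from the quickstart on this page.

```javascript
// Sketch only: the status endpoint, the "status" values, and "audio_url"
// are assumed names, not confirmed by the docs.
const API_BASE = "https://texttospeechskills.com/api/v1";

async function pollJob(jobId, { apiKey, fetchImpl = fetch, intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(`${API_BASE}/tts/jobs/${encodeURIComponent(jobId)}`, {
      headers: { authorization: `Bearer ${apiKey}` }
    });
    if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
    const job = await res.json();
    if (job.status === "done") return job.audio_url;
    if (job.status === "failed") throw new Error(`Job ${jobId} failed`);
    // Still queued or running: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} checks`);
}
```

Capping the number of attempts keeps a stuck job from holding a worker forever; the interval and cap shown here are placeholders to tune against real generation times.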

Keep LLM access narrow

The MCP and skills setup gives LLM apps focused tools for validating markup, listing approved templates, previewing credit use, creating jobs, and returning audio. That is enough for useful automation without giving a chat session broad account access or hidden credentials.
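One way to picture the narrow tool surface is an allowlist in front of the tool handlers. This is a minimal sketch, not the service's implementation: the tool names below are hypothetical labels for the five capabilities listed above, and `dispatchTool` is an illustrative helper.

```javascript
// Sketch only: tool names are hypothetical labels for the capabilities
// listed above, not the names the MCP server actually registers.
const ALLOWED_TOOLS = new Set([
  "validate_markup",
  "list_templates",
  "preview_credits",
  "create_job",
  "get_audio"
]);

// Route a tool call from an LLM session, rejecting anything off the list.
function dispatchTool(name, args, handlers) {
  if (!ALLOWED_TOOLS.has(name)) {
    throw new Error(`Tool "${name}" is not exposed to LLM sessions`);
  }
  const handler = handlers[name];
  if (typeof handler !== "function") {
    throw new Error(`No handler registered for "${name}"`);
  }
  return handler(args);
}
```

Because the check runs before any handler lookup, a handler that exists but is not on the list still cannot be reached from a chat session.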

Make templates the stable contract

Instead of repeating subjective voice instructions in every API call, your app sends text plus a template ID. The docs cover how templates should be named, versioned, approved, and shared across workspaces so narrators, characters, support voices, and course instructors stay recognizable.
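As an illustration of "text plus a template ID", an app could keep one small mapping from its own voice roles to template IDs and build every request body from it. The role names and the `vt_support_agent_v2` ID are invented for this sketch; `vt_calm_narrator_v1` is the ID used in the quickstart.

```javascript
// Sketch only: role names and vt_support_agent_v2 are hypothetical.
const TEMPLATES = new Map([
  ["narrator", "vt_calm_narrator_v1"],
  ["support", "vt_support_agent_v2"]
]);

// Build a job body from a role and a script, never from free-form voice notes.
function buildJobBody(role, text) {
  const templateId = TEMPLATES.get(role);
  if (!templateId) throw new Error(`Unknown voice role: ${role}`);
  return { text, voice_template: templateId };
}
```

Moving a role to a new template version then means changing one entry in the mapping rather than editing every call site.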

Ship with server-side safety

API keys, OAuth, payment state, private audio storage, service routing, and usage ledger updates belong on the backend. The public UI only needs scoped actions and safe configuration, which keeps setup easier for users and reduces the chance of accidental secret exposure.
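A backend function along these lines keeps the key on the server and forwards only the fields the browser is allowed to set. It is a sketch under stated assumptions: `createSpeechJob` and the `MAX_TEXT_LENGTH` limit are hypothetical, while the URL, headers, and field names are taken from the quickstart.

```javascript
// Sketch only: createSpeechJob and MAX_TEXT_LENGTH are illustrative,
// not part of the service.
const JOBS_URL = "https://texttospeechskills.com/api/v1/tts/jobs";
const MAX_TEXT_LENGTH = 5000; // hypothetical limit

async function createSpeechJob(clientPayload, { apiKey = process.env.TTS_API_KEY, fetchImpl = fetch } = {}) {
  const { text, voice_template } = clientPayload ?? {};
  if (typeof text !== "string" || text.length === 0 || text.length > MAX_TEXT_LENGTH) {
    throw new Error("text must be a non-empty string within the length limit");
  }
  if (typeof voice_template !== "string" || voice_template.length === 0) {
    throw new Error("voice_template is required");
  }
  // Forward only the allowed fields; the key never leaves the server.
  const res = await fetchImpl(JOBS_URL, {
    method: "POST",
    headers: {
      authorization: `Bearer ${apiKey}`,
      "content-type": "application/json"
    },
    body: JSON.stringify({ text, voice_template })
  });
  if (!res.ok) throw new Error(`Job creation failed: HTTP ${res.status}`);
  return res.json();
}
```

The browser would call your own route, which calls this function; any extra fields a client sends are dropped before the request reaches the speech API.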

Docs quickstart

Install, authenticate, generate

Fetch
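// Create a speech job; the API key is read from the server environment.
const response = await fetch("https://texttospeechskills.com/api/v1/tts/jobs", {
  method: "POST",
  headers: {
    authorization: `Bearer ${process.env.TTS_API_KEY}`,
    "content-type": "application/json"
  },
  body: JSON.stringify({
    text: "[quiet] hello. [loud and angry] how are you?",
    voice_template: "vt_calm_narrator_v1",
    generation_mode: "instant"
  })
});
if (!response.ok) throw new Error(`Request failed: HTTP ${response.status}`);
const job = await response.json(); // speech job details as returned by the API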

API keys

Scoped workspace access

Production key: key_••••••••••••
MCP key: key_••••••••••••

Keys are hashed at rest and never shown again after creation.

API playground

Plain JSON in, speech job out

{
  "text": "[quiet] hello. [loud and angry] how are you?",
  "voice_template": "vt_calm_narrator_v1",
  "generation_mode": "instant",
  "format": "mp3"
}
202: queued for polling
200: audio ready
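The two status codes above suggest a small branch on the client of the API. In this hedged sketch, `readJobResponse` is a hypothetical helper; it assumes both responses carry a JSON body and passes that body through untouched, because the response fields are not documented here.

```javascript
// Sketch only: readJobResponse is a hypothetical helper, and the shape of
// the response body is not specified on this page.
async function readJobResponse(res) {
  if (res.status !== 200 && res.status !== 202) {
    throw new Error(`Unexpected status ${res.status}`);
  }
  const body = await res.json();
  // 200 means the audio is ready now; 202 means the job is queued for polling.
  return { ready: res.status === 200, body };
}
```

A caller can return the audio immediately when `ready` is true and otherwise hand the body to whatever polling routine it uses.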

MCP install

Agent tools included at launch

Claude Desktop: pnpm --package texttospeechskills dlx tts-skills-mcp
Codex: pnpm --package texttospeechskills dlx tts-skills-mcp
Cursor: pnpm --package texttospeechskills dlx tts-skills-mcp
Skills helper: pnpm --package texttospeechskills dlx tts-skills tags
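For clients that register MCP servers through a JSON file with an `mcpServers` section (the format Claude Desktop uses), the install command above could be wrapped as shown in this sketch. The server name `tts-skills` and the idea that the server reads its key from a `TTS_API_KEY` environment variable are assumptions; check the MCP setup guide for the actual variable name.

```javascript
// Sketch only: the "tts-skills" entry name and the TTS_API_KEY variable
// are assumptions, not confirmed by the docs.
function mcpConfig(apiKey) {
  return {
    mcpServers: {
      "tts-skills": {
        command: "pnpm",
        args: ["--package", "texttospeechskills", "dlx", "tts-skills-mcp"],
        env: { TTS_API_KEY: apiKey }
      }
    }
  };
}

// Print JSON that can be merged into the client's MCP configuration file.
console.log(JSON.stringify(mcpConfig("<your MCP key>"), null, 2));
```

Using the separate MCP key here, rather than the production key, keeps the LLM client on its own scoped credential.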

Safety

Production controls are built in

Keys, workspace access, private storage, and background generation are designed so teams can test quickly without opening up risky access.

Scoped access

Create keys for apps, workspaces, and LLM tools without sharing broad account access.

No wasted waiting

Longer generations run in the background, so users can poll or receive updates when audio is ready.

Controlled audio URLs

Generated audio is stored privately and served through controlled URLs.