POST /v1/speech
Text to Speech
curl --request POST \
  --url https://api.spitch.app/v1/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --output speech.mp3 \
  --data '
{
  "text": "<string>",
  "voice": "<string>"
}
'

Authorizations

Authorization
string
header
required

Authenticate by sending the header Authorization: Bearer <token>. The bearer token may be a JWT, an API key, or a guest token.
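
As a minimal sketch of the header described above, the helper below builds the two headers the curl sample sends. The function name build_headers and the whitespace check are illustrative assumptions, not part of the API.

```python
def build_headers(token: str) -> dict[str, str]:
    """Return the headers for a JSON request authenticated with a bearer token."""
    # Assumption: a token with stray whitespace is a caller mistake, so reject it early.
    if not token or token != token.strip():
        raise ValueError("token must be non-empty with no surrounding whitespace")
    return {
        # The same header carries a JWT, an API key, or a guest token.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

The same dictionary can be passed to whichever HTTP client you use.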

Body

application/json

JSON synthesis request. The response is streamed binary audio, not JSON.

text
string
required

Text to synthesize into speech.

voice
string
required

Voice ID to use for synthesis: a built-in system voice or a registered custom voice. The value may also be left blank.

language
string

Optional ISO 639 language code. Yoruba input is diacritized before synthesis.

format
enum<string>
default:mp3

Output audio format. Silent signed provenance metadata is available for WAV, MP3, Ogg Opus, WebM Opus, and FLAC, but not raw PCM, mu-law, A-law, or HLS.

Available options: wav, mp3, ogg_opus, webm_opus, flac, pcm_s16le, mulaw, alaw, m3u8

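
The provenance rule above can be sketched as a small lookup. The name supports_provenance is hypothetical, and treating the m3u8 option as the HLS case is an inference from the option list rather than something the reference states outright.

```python
# Formats the reference lists as carrying silent signed provenance metadata.
PROVENANCE_FORMATS = frozenset({"wav", "mp3", "ogg_opus", "webm_opus", "flac"})
# Formats listed as not carrying it (assumption: m3u8 is the HLS option).
PLAIN_FORMATS = frozenset({"pcm_s16le", "mulaw", "alaw", "m3u8"})


def supports_provenance(fmt: str) -> bool:
    """Return True if the documented format can carry provenance metadata."""
    if fmt not in PROVENANCE_FORMATS | PLAIN_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return fmt in PROVENANCE_FORMATS
```

A client that depends on the metadata could call this before sending a request and fall back to a container format such as wav or flac.
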
bitrate
enum<string>
default:128k

Output bitrate for compressed formats.

Available options: 32k, 48k, 64k, 96k, 128k, 192k

sample_rate
enum<integer>
default:24000

Output sample rate in hertz. Only 24000 is accepted.

Available options: 24000

speed
number
default:1

Speech speed multiplier.

Required range: 0.7 <= x <= 1.2
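
The body fields above can be checked on the client before sending. The sketch below is one way to do that, assuming the documented enums and range are exhaustive; build_payload is a hypothetical helper, the parameter is named fmt to avoid shadowing Python's built-in format, and rejecting empty text is an assumption.

```python
import json

FORMATS = ("wav", "mp3", "ogg_opus", "webm_opus", "flac",
           "pcm_s16le", "mulaw", "alaw", "m3u8")
BITRATES = ("32k", "48k", "64k", "96k", "128k", "192k")
SAMPLE_RATES = (24000,)
SPEED_MIN, SPEED_MAX = 0.7, 1.2


def build_payload(text, voice, *, language=None, fmt="mp3",
                  bitrate="128k", sample_rate=24000, speed=1.0):
    """Validate the documented fields and return the JSON request body as a string."""
    if not isinstance(text, str) or not text:
        raise ValueError("text is required")  # assumption: empty text is rejected
    if not isinstance(voice, str):
        raise ValueError("voice must be a string (it may be blank)")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if bitrate not in BITRATES:
        raise ValueError(f"bitrate must be one of {BITRATES}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError("sample_rate must be 24000")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")

    payload = {"text": text, "voice": voice, "format": fmt,
               "bitrate": bitrate, "sample_rate": sample_rate, "speed": speed}
    if language is not None:
        payload["language"] = language  # optional ISO 639 code
    # ensure_ascii=False keeps diacritics readable; encode as UTF-8 when sending.
    return json.dumps(payload, ensure_ascii=False)
```

For example, build_payload("Hello", "example-voice", speed=1.1) returns a body with the defaults filled in, where "example-voice" stands in for a real voice ID, while speed=1.5 raises ValueError.
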

Response

Chunked streaming audio. The media type depends on the requested format; formats without a mapped media type are returned as application/octet-stream.

The response is of type file.
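
Because the response is chunked binary audio rather than JSON, a client should write it to disk as it arrives instead of decoding it. The following is a minimal sketch using only the Python standard library; the function names, the 8192-byte chunk size, and the 60-second timeout are illustrative assumptions.

```python
import json
import urllib.request
from typing import BinaryIO

API_URL = "https://api.spitch.app/v1/speech"


def build_request(token: str, text: str, voice: str) -> urllib.request.Request:
    """Build the POST request with a bearer token and a JSON body."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def save_stream(stream: BinaryIO, out: BinaryIO, chunk_size: int = 8192) -> int:
    """Copy a binary stream to `out` chunk by chunk; return the bytes written."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read marks the end of the stream
            break
        out.write(chunk)
        total += len(chunk)
    return total


def synthesize_to_file(token: str, text: str, voice: str, path: str) -> int:
    """Send the request and stream the audio response into `path`."""
    request = build_request(token, text, voice)
    # urlopen decodes chunked transfer encoding, so read() yields raw audio bytes.
    with urllib.request.urlopen(request, timeout=60) as response, open(path, "wb") as out:
        return save_stream(response, out)
```

With the default format, synthesize_to_file(token, "Hello", voice_id, "speech.mp3") would save an MP3 file, where token and voice_id are values you supply.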