A voice determines how generated speech sounds, and choosing the right one helps you convey the tone and emotion you want. Spitch offers two kinds of voices:
  • System voices — a curated roster of production-ready voices you can use right away.
  • Custom voices — your own voices, created from a short reference recording and usable anywhere a system voice is.
Voices are not tied to a single language: every voice can speak any supported language. To select a voice, pass its ID as the voice parameter in the Text to Speech API. For a system voice, the ID is the name shown on its card; for a custom voice, it is the voice_id returned when you create the voice.

Custom voices

You can register your own voice from a short reference recording, then use it anywhere you’d use a system voice. Custom voices are private to your account, and you can delete them at any time.

Create a voice

Send a multipart/form-data request to POST /v1/voices with a reference recording, its transcript, a display name, and explicit consent.
cURL
curl -X POST https://api.spitch.app/v1/voices \
  -H "Authorization: Bearer $SPITCH_API_KEY" \
  -F audio=@reference.wav \
  -F transcript="Bawo ni, orúkọ mi ni Tunde." \
  -F name="Tunde" \
  -F consent=true \
  -F language=yo
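For callers not using cURL, the same multipart request could be assembled with Python's standard library. This is a minimal sketch, not an official client: the helper names `encode_multipart` and `build_create_voice_request` are hypothetical, the `audio/wav` content type on the file part is an assumption, and the request is only built here, not sent.

```python
import uuid
from urllib.request import Request

VOICES_URL = "https://api.spitch.app/v1/voices"  # endpoint from the cURL example


def encode_multipart(fields, audio_name, audio_bytes):
    """Encode text fields plus one audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{audio_name}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # assumed MIME type for the reference file
    )
    parts.append(file_head.encode("utf-8") + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return boundary, b"".join(parts)


def build_create_voice_request(api_key, audio_name, audio_bytes,
                               transcript, name, consent, language=None):
    """Build (but do not send) a POST /v1/voices request."""
    if consent is not True:
        raise ValueError("consent must be True before registering a voice")
    fields = {"transcript": transcript, "name": name, "consent": "true"}
    if language:
        fields["language"] = language  # optional field
    boundary, body = encode_multipart(fields, audio_name, audio_bytes)
    return Request(
        VOICES_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the encoding step easy to inspect before any audio leaves your machine.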
Parameters
  • audio (required) — reference recording of the speaker. Single speaker, clear speech, 2–30 seconds, up to 10 MB. It should match the transcript.
  • transcript (required) — exact transcript of the audio, up to 2000 characters.
  • name (required) — display name for the voice, up to 120 characters.
  • consent (required) — must be true to confirm you have the speaker’s permission to clone this voice.
  • language (optional) — ISO 639 language code for the reference audio.
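The limits above can be checked locally before uploading. The sketch below is illustrative only: `validate_voice_fields` and `wav_duration_seconds` are hypothetical helpers, the duration check assumes an uncompressed WAV file, "10 MB" is read conservatively as 10,000,000 bytes, and character limits are counted as Python code points, which may differ from how the server counts.

```python
import io
import wave

MAX_AUDIO_BYTES = 10_000_000        # "up to 10 MB", conservative reading (assumption)
MIN_SECONDS, MAX_SECONDS = 2.0, 30.0
MAX_TRANSCRIPT_CHARS = 2000
MAX_NAME_CHARS = 120


def wav_duration_seconds(audio_bytes):
    """Duration of an uncompressed WAV file, computed from its header."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validate_voice_fields(audio_bytes, transcript, name, consent):
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []
    if len(audio_bytes) > MAX_AUDIO_BYTES:
        problems.append("audio is larger than 10 MB")
    duration = wav_duration_seconds(audio_bytes)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"audio is {duration:.1f}s long, expected 2 to 30 seconds")
    if not transcript or len(transcript) > MAX_TRANSCRIPT_CHARS:
        problems.append("transcript must be 1 to 2000 characters")
    if not name or len(name) > MAX_NAME_CHARS:
        problems.append("name must be 1 to 120 characters")
    if consent is not True:
        problems.append("consent must be true")
    return problems
```

These checks cover only the numeric limits. Whether the recording has a single speaker, is clear, and matches the transcript still has to be verified by listening.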
The response returns the new voice, including the voice_id you’ll use to generate speech:
{
  "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c",
  "kind": "custom",
  "status": "ready",
  "name": "Tunde",
  "language": "yo",
  "created_at": "2026-06-19T10:12:04.512000+00:00",
  "updated_at": "2026-06-19T10:12:04.512000+00:00"
}
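A client will usually want to pull the voice_id out of this response and confirm the voice is usable. The sketch below parses the sample payload shown above; `parse_voice` is a hypothetical helper, and treating any status other than "ready" as not yet usable is an assumption, since other status values are not documented here.

```python
import json
from datetime import datetime


def parse_voice(payload):
    """Extract the fields a client typically needs from a voice object."""
    voice = json.loads(payload)
    return {
        "voice_id": voice["voice_id"],
        "name": voice["name"],
        "is_custom": voice["kind"] == "custom",
        # Assumption: only "ready" means the voice can be used for synthesis.
        "is_ready": voice["status"] == "ready",
        "created_at": datetime.fromisoformat(voice["created_at"]),
    }


sample = """{
  "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c",
  "kind": "custom",
  "status": "ready",
  "name": "Tunde",
  "language": "yo",
  "created_at": "2026-06-19T10:12:04.512000+00:00",
  "updated_at": "2026-06-19T10:12:04.512000+00:00"
}"""

voice = parse_voice(sample)
```

The timestamps carry an explicit UTC offset, so `datetime.fromisoformat` returns timezone-aware values that can be compared safely.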
Use that voice_id as the voice parameter exactly like a system voice:
cURL
curl -X POST https://api.spitch.app/v1/speech \
  -H "Authorization: Bearer $SPITCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "yo",
    "voice": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c",
    "text": "Mo ń kọ ohun tí mo fẹ́ sọ."
  }' --output output.wav
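Because custom and system voices share the same voice parameter, one request builder can serve both. The following is a minimal Python sketch of the cURL call above using only the standard library; `build_speech_request` and `synthesize_to_file` are hypothetical names, and the response body is assumed to be raw audio bytes, as the `--output output.wav` flag suggests.

```python
import json
from urllib.request import Request, urlopen

SPEECH_URL = "https://api.spitch.app/v1/speech"  # endpoint from the cURL example


def build_speech_request(api_key, voice, language, text):
    """Build (but do not send) a Text to Speech request."""
    payload = {"language": language, "voice": voice, "text": text}
    return Request(
        SPEECH_URL,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(request, path):
    """Send the request and write the returned bytes to `path` (assumed to be audio)."""
    with urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


# The same builder accepts a custom voice_id or a system voice name.
custom = build_speech_request(
    "YOUR_API_KEY", "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c", "yo",
    "Mo ń kọ ohun tí mo fẹ́ sọ.",
)
system = build_speech_request("YOUR_API_KEY", "Sade", "yo", "Bawo ni")
```

Calling `synthesize_to_file(custom, "output.wav")` would perform the network request. Whether voice IDs are case-sensitive is not stated, so the system voice name is passed exactly as it appears on its card.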
Only register audio you have the right to use. The consent field must be true, and each account can hold up to 100 custom voices.

List voices

GET /v1/voices returns the system voices plus your account’s custom voices.
cURL
curl https://api.spitch.app/v1/voices \
  -H "Authorization: Bearer $SPITCH_API_KEY"
Fetch a single voice’s metadata with GET /v1/voices/{voice_id}.
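Since the list mixes system and custom voices, a client may want to separate them and check how many custom slots remain against the 100-voice limit. The shape of the list response is not shown on this page, so the sketch below assumes it decodes to a list of voice objects carrying the same `kind` field as the create response, and it treats every entry whose kind is not "custom" as a system voice. Both helper names are hypothetical.

```python
MAX_CUSTOM_VOICES = 100  # per-account limit stated in the docs


def split_voices(voices):
    """Split voice objects into (system, custom) lists by their `kind` field."""
    custom = [v for v in voices if v.get("kind") == "custom"]
    # Assumption: anything not marked "custom" is a system voice.
    system = [v for v in voices if v.get("kind") != "custom"]
    return system, custom


def remaining_custom_slots(voices):
    """How many more custom voices the account could register."""
    _, custom = split_voices(voices)
    return max(0, MAX_CUSTOM_VOICES - len(custom))


# Illustrative data only; the "system" kind value and the IDs are made up.
example = [
    {"voice_id": "Sade", "kind": "system"},
    {"voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c", "kind": "custom"},
]
system_voices, custom_voices = split_voices(example)
```

If the real response wraps the list in an outer object or paginates it, the list would need to be unwrapped before calling these helpers.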

Delete a voice

DELETE /v1/voices/{voice_id} removes one of your custom voices and deletes its stored reference audio. System voices cannot be deleted.
cURL
curl -X DELETE https://api.spitch.app/v1/voices/voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c \
  -H "Authorization: Bearer $SPITCH_API_KEY"
The response confirms the deletion and echoes the voice_id:
{ "deleted": true, "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c" }
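A client can confirm that a deletion succeeded by checking both fields of the response body. The sketch below builds the DELETE request and verifies a response with Python's standard library; `build_delete_request` and `confirm_deleted` are hypothetical helpers, and the request is only constructed, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_delete_request(api_key, voice_id):
    """Build (but do not send) a DELETE /v1/voices/{voice_id} request."""
    url = "https://api.spitch.app/v1/voices/" + quote(voice_id, safe="")
    return Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def confirm_deleted(payload, expected_voice_id):
    """True only if the response reports deletion of the voice we asked for."""
    result = json.loads(payload)
    return result.get("deleted") is True and result.get("voice_id") == expected_voice_id


voice_id = "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c"
request = build_delete_request("YOUR_API_KEY", voice_id)
response_body = '{ "deleted": true, "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c" }'
ok = confirm_deleted(response_body, voice_id)
```

Comparing the echoed voice_id guards against acting on a response meant for a different voice, which matters because deletion also removes the stored reference audio.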

System voices

System voices are built in and ready to use. Use a card’s name as the voice ID. The headings group voices by the language each was originally designed for; any voice can still be used with any supported language.

Amharic

  • Hana (Feminine)
  • Haile (Masculine)
  • Tesfaye (Masculine)
  • Tena (Feminine)

English

  • John (Masculine)
  • Lucy (Feminine)
  • Lina (Feminine)
  • Jude (Masculine)
  • Henry (Masculine)
  • Kani (Feminine)
  • Kingsley (Masculine)
  • Remi (Feminine)

Hausa

  • Hasan (Masculine)
  • Amina (Feminine)
  • Zainab (Feminine)
  • Aliyu (Masculine)

Igbo

  • Obinna (Masculine)
  • Ngozi (Feminine)
  • Amara (Feminine)
  • Ebuka (Masculine)

Yoruba

  • Sade (Feminine)
  • Funmi (Feminine)
  • Segun (Masculine)
  • Femi (Masculine)

Nigerian Pidgin

  • Justice (Masculine)
  • Boma (Feminine)
  • Tega (Masculine)
  • Ufoma (Feminine)
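For applications that let users pick a voice, the roster above can be mirrored in a small lookup table. The sketch below is a hard-coded snapshot of this page, not a substitute for GET /v1/voices, which remains the live source; `SYSTEM_VOICES`, `voices_for`, and `design_language` are hypothetical names.

```python
# Snapshot of the system voices listed on this page, grouped by design language.
SYSTEM_VOICES = {
    "Amharic": {"Hana": "Feminine", "Haile": "Masculine",
                "Tesfaye": "Masculine", "Tena": "Feminine"},
    "English": {"John": "Masculine", "Lucy": "Feminine", "Lina": "Feminine",
                "Jude": "Masculine", "Henry": "Masculine", "Kani": "Feminine",
                "Kingsley": "Masculine", "Remi": "Feminine"},
    "Hausa": {"Hasan": "Masculine", "Amina": "Feminine",
              "Zainab": "Feminine", "Aliyu": "Masculine"},
    "Igbo": {"Obinna": "Masculine", "Ngozi": "Feminine",
             "Amara": "Feminine", "Ebuka": "Masculine"},
    "Yoruba": {"Sade": "Feminine", "Funmi": "Feminine",
               "Segun": "Masculine", "Femi": "Masculine"},
    "Nigerian Pidgin": {"Justice": "Masculine", "Boma": "Feminine",
                        "Tega": "Masculine", "Ufoma": "Feminine"},
}


def voices_for(language, gender=None):
    """Voice names designed around `language`, optionally filtered by gender."""
    roster = SYSTEM_VOICES[language]
    return [name for name, g in roster.items() if gender is None or g == gender]


def design_language(voice_name):
    """The language group a system voice is listed under, or None if unknown."""
    for language, roster in SYSTEM_VOICES.items():
        if voice_name in roster:
            return language
    return None
```

The grouping is only a starting point for selection: because every voice can speak any supported language, a name from one group can be paired with a different language code in a speech request.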