Voices - Spitch

A voice determines how generated speech sounds, so choosing the right one helps you convey the tone and emotion you want. Spitch offers two kinds of voices:

System voices — a curated roster of production-ready voices you can use right away.
Custom voices — your own voices, created from a short reference recording and usable anywhere a system voice is.

Voices are not tied to a single language: every voice can speak any supported language, so you can pair any voice with any language you need. Pass a voice’s ID — the name shown on each system card, or the voice_id returned when you create a custom voice — as the voice parameter in the Text to Speech API.
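As a rough illustration of the point above, the sketch below builds a Text to Speech request body in which the voice slot takes either a system voice name or a custom voice_id. The speech_body helper is hypothetical (it is not part of any Spitch tooling), and it only escapes backslashes and double quotes in the text.

```shell
# Hypothetical helper: prints a JSON body for the Text to Speech API.
# Usage: speech_body VOICE LANGUAGE TEXT
speech_body() {
  # Escape backslashes first, then double quotes, so TEXT stays valid JSON.
  # Newlines and control characters are not handled in this sketch.
  escaped=$(printf '%s' "$3" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"language":"%s","voice":"%s","text":"%s"}\n' "$2" "$1" "$escaped"
}

# A system voice (card name) and a custom voice (voice_id) fill the same slot,
# and either one can be paired with any supported language.
speech_body "Lucy" "yo" "Bawo ni"
speech_body "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c" "yo" "Bawo ni"
```

The output of either call can be passed to curl with -d as the request body.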

Custom voices

You can register your own voice from a short reference recording, then use it anywhere you’d use a system voice. Custom voices are private to your account, and you can delete them at any time.

Create a voice

Send a multipart/form-data request to POST /v1/voices with a reference recording, its transcript, a display name, and explicit consent.

cURL

curl -X POST https://api.spitch.app/v1/voices \
  -H "Authorization: Bearer $SPITCH_API_KEY" \
  -F audio=@reference.wav \
  -F transcript="Bawo ni, orúkọ mi ni Tunde." \
  -F name="Tunde" \
  -F consent=true \
  -F language=yo

Parameters

audio (required) — reference recording of the speaker. Single speaker, clear speech, 2–30 seconds, up to 10 MB. It should match the transcript.
transcript (required) — exact transcript of the audio, up to 2000 characters.
name (required) — display name for the voice, up to 120 characters.
consent (required) — must be true to confirm you have the speaker’s permission to clone this voice.
language (optional) — ISO 639 language code for the reference audio.
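The limits above can be checked locally before uploading. The following is a minimal sketch, not an official validator: check_voice_request is a hypothetical helper, "10 MB" is read as 10 MiB, and the 2–30 second duration is not checked because that would need an audio tool.

```shell
# Hypothetical pre-flight check for POST /v1/voices, based on the documented limits.
# Usage: check_voice_request AUDIO_FILE TRANSCRIPT NAME CONSENT
# Prints the first problem found and returns 1; returns 0 when every check passes.
check_voice_request() {
  max_bytes=$((10 * 1024 * 1024))   # assumption: "10 MB" means 10 MiB
  [ -f "$1" ] || { echo "audio file not found"; return 1; }
  size=$(($(wc -c < "$1")))         # arithmetic expansion trims wc padding
  [ "$size" -le "$max_bytes" ] || { echo "audio is larger than 10 MB"; return 1; }
  [ -n "$2" ] || { echo "transcript is required"; return 1; }
  # ${#2} counts characters in bash under a UTF-8 locale; some shells count bytes.
  [ "${#2}" -le 2000 ] || { echo "transcript exceeds 2000 characters"; return 1; }
  [ -n "$3" ] || { echo "name is required"; return 1; }
  [ "${#3}" -le 120 ] || { echo "name exceeds 120 characters"; return 1; }
  [ "$4" = "true" ] || { echo "consent must be true"; return 1; }
  return 0
}
```

For example, check_voice_request reference.wav "Bawo ni, orúkọ mi ni Tunde." "Tunde" true returns 0 when reference.wav exists and is within the size limit.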

The response returns the new voice, including the voice_id you’ll use to generate speech:

{
  "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c",
  "kind": "custom",
  "status": "ready",
  "name": "Tunde",
  "language": "yo",
  "created_at": "2026-06-19T10:12:04.512000+00:00",
  "updated_at": "2026-06-19T10:12:04.512000+00:00"
}
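To reuse the new voice in a script, you need to pull voice_id out of that response. The sketch below does this with sed so it has no extra dependencies; json_string_field is a hypothetical helper that only handles simple string fields like the ones shown, and a real JSON parser is more robust. "ready" is the only status value this page shows, so the check treats anything else as not yet usable.

```shell
# Hypothetical helper: prints the first string value found for FIELD in JSON on stdin.
# Usage: json_string_field FIELD < response.json
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Stand-in for the body returned by POST /v1/voices (copied from the sample response).
response='{
  "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c",
  "kind": "custom",
  "status": "ready",
  "name": "Tunde"
}'

voice_id=$(printf '%s\n' "$response" | json_string_field voice_id)
status=$(printf '%s\n' "$response" | json_string_field status)

if [ "$status" = "ready" ]; then
  echo "voice $voice_id is ready"
else
  echo "voice $voice_id has status: $status"
fi
```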

Use that voice_id as the voice parameter exactly like a system voice:

cURL

curl -X POST https://api.spitch.app/v1/speech \
  -H "Authorization: Bearer $SPITCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "yo",
    "voice": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c",
    "text": "Mo ń kọ ohun tí mo fẹ́ sọ."
  }' --output output.wav
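Because --output writes whatever the server returns, an error body would also end up in output.wav. The sketch below is one way to sanity-check the file afterwards. It assumes the audio is a RIFF/WAVE file, as the .wav filename suggests; this page does not state the response format, and looks_like_wav is a hypothetical helper.

```shell
# Hypothetical check: succeeds if FILE starts with a RIFF/WAVE header.
# Usage: looks_like_wav FILE
looks_like_wav() {
  # Bytes 0-3 of a WAV file are "RIFF" and bytes 8-11 are "WAVE".
  [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "RIFF" ] &&
  [ "$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}

# Example use after the speech request:
#   looks_like_wav output.wav || echo "output.wav is not audio; inspect it for an error message"
```

Adding curl's --fail flag to the request is a complementary safeguard, since it makes curl exit with a non-zero status on HTTP error responses.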

Only register audio you have the right to use. The consent field must be true, and each account can hold up to 100 custom voices.

List voices

GET /v1/voices returns the system voices plus your account’s custom voices.

cURL

curl https://api.spitch.app/v1/voices \
  -H "Authorization: Bearer $SPITCH_API_KEY"

Fetch a single voice’s metadata with GET /v1/voices/{voice_id}.
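Since each account can hold up to 100 custom voices, it can help to count them before creating another. The sketch below assumes every voice in the list response carries the same "kind" field shown in the create response; the shape of the list response is not documented on this page, so treat count_custom_voices as a hypothetical starting point.

```shell
# Hypothetical helper: counts occurrences of "kind": "custom" in JSON on stdin.
# Usage: curl -s https://api.spitch.app/v1/voices \
#          -H "Authorization: Bearer $SPITCH_API_KEY" | count_custom_voices
count_custom_voices() {
  # gsub returns the number of matches on each line; "&" keeps the text unchanged.
  awk '{ n += gsub(/"kind"[ \t]*:[ \t]*"custom"/, "&") } END { print n + 0 }'
}

# Hypothetical helper: succeeds while COUNT is below the documented limit of 100.
# Usage: has_room_for_voice COUNT
has_room_for_voice() {
  [ "$1" -lt 100 ]
}
```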

Delete a voice

DELETE /v1/voices/{voice_id} removes one of your custom voices and deletes its stored reference audio. System voices cannot be deleted.

cURL

curl -X DELETE https://api.spitch.app/v1/voices/voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c \
  -H "Authorization: Bearer $SPITCH_API_KEY"

The response confirms the deletion:

{ "deleted": true, "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c" }
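A script can verify that response before treating the voice as gone. This is a minimal sketch using grep on the two fields shown above; delete_confirmed is a hypothetical helper, and it assumes the voice_id contains no regular-expression metacharacters, which holds for IDs in the format shown.

```shell
# Hypothetical check: succeeds if the JSON on stdin reports "deleted": true
# for the given voice.
# Usage: delete_confirmed VOICE_ID < response.json
delete_confirmed() {
  body=$(cat)
  printf '%s' "$body" | grep -q '"deleted"[[:space:]]*:[[:space:]]*true' &&
  printf '%s' "$body" | grep -q "\"voice_id\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Example with the sample response:
printf '%s\n' '{ "deleted": true, "voice_id": "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c" }' |
  delete_confirmed "voice_3f9c1a7b8e2d4f06a1c5d9b2e7f04a8c" && echo "deleted"
```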

System voices

System voices are built in and ready to use. Use the name on a voice’s card as its voice ID. The headings below group voices by the language each was originally designed around.

Amharic

Hana

Feminine

Haile

Masculine

Tesfaye

Masculine

Tena

Feminine

English

John

Masculine

Lucy

Feminine

Lina

Feminine

Jude

Henry

Kani

Kingsley

Remi

Hausa

Hasan

Amina

Zainab

Aliyu

Igbo

Obinna

Ngozi

Amara

Ebuka

Yoruba

Sade

Funmi

Segun

Femi

Nigerian Pidgin

Justice

Boma

Tega

Ufoma
