Also known as speech-to-text (STT), transcription is the process of converting spoken audio into text. The endpoint supports many languages, including English (en), Yoruba (yo), Hausa (ha), Igbo (ig), and Amharic (am).

Request

Use client.speech.transcribe() to transcribe audio. Pass the audio in the content parameter as file bytes, an audio URL that is accessible without authentication, or a Spitch file UUID.
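Because content accepts three kinds of input, it can help to check which one you are passing before calling transcribe(). The sketch below is one way to do that on the client side. classify_content is a hypothetical helper, not part of the Spitch SDK, and it assumes URLs use the http or https scheme.

```python
import uuid
from urllib.parse import urlparse


def classify_content(content):
    """Return 'bytes', 'url', or 'uuid' for a transcribe() content value.

    Hypothetical helper for illustration; the Spitch SDK does not provide it.
    """
    if isinstance(content, (bytes, bytearray)):
        return "bytes"
    if isinstance(content, str):
        parsed = urlparse(content)
        # Assumption: URLs are plain http(s) links with a host name.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "url"
        try:
            uuid.UUID(content)  # raises ValueError if not a valid UUID
            return "uuid"
        except ValueError:
            pass
    raise ValueError("content must be file bytes, an http(s) URL, or a file UUID")


print(classify_content(b"RIFF"))  # bytes
print(classify_content("https://myfilelocation.com/file.mp3"))  # url
print(classify_content("86095cea-77d5-45ba-a093-0f800ac2c7df"))  # uuid
```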

Parameters

  • content (required) - Audio file content, an audio file URL, or a Spitch file UUID
  • language (optional) - ISO 639 language code for the audio, such as en, yo, ha, ig, or am
  • model (deprecated) - Previous STT model selector. Accepted values are mansa_v1 and legacy, but new integrations should omit it.
  • special_words (optional) - Custom words to help with recognition accuracy
  • timestamp (optional) - Timestamp granularity: sentence or word
Use content for all input types. URLs must be directly accessible without authentication.
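One way to keep optional parameters out of the request unless they are set is to assemble the arguments in a single place. This is a minimal sketch using a hypothetical build_transcribe_kwargs helper (not part of the SDK). It enforces the documented timestamp values, and it warns about, but still forwards, the deprecated model parameter so that older integrations keep working.

```python
import warnings

VALID_TIMESTAMPS = {"sentence", "word"}


def build_transcribe_kwargs(content, language=None, special_words=None,
                            timestamp=None, model=None):
    """Assemble keyword arguments for client.speech.transcribe().

    Hypothetical helper: optional parameters are included only when set.
    """
    if content is None:
        raise ValueError("content is required")
    if timestamp is not None and timestamp not in VALID_TIMESTAMPS:
        raise ValueError(f"timestamp must be one of {sorted(VALID_TIMESTAMPS)}")
    kwargs = {"content": content}
    if language is not None:
        kwargs["language"] = language
    if special_words is not None:
        kwargs["special_words"] = special_words
    if timestamp is not None:
        kwargs["timestamp"] = timestamp
    if model is not None:
        # model is deprecated; forward it for legacy callers but flag it.
        warnings.warn("model is deprecated; omit it in new integrations",
                      DeprecationWarning, stacklevel=2)
        kwargs["model"] = model
    return kwargs


kwargs = build_transcribe_kwargs(b"...", language="yo", timestamp="word")
print(kwargs)  # {'content': b'...', 'language': 'yo', 'timestamp': 'word'}
# response = client.speech.transcribe(**kwargs)
```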

STT Models

The model parameter is deprecated in the current API. Omit it unless you are maintaining an older integration that still sends mansa_v1 or legacy.

Timestamp Options

The timestamp parameter controls the level of timing information returned:
  • sentence - Timestamps for each sentence
  • word - Timestamps for each individual word
See the examples below for how to pass these parameters.

Best Practices for Use

  • Send audio through the content field as file bytes, a file UUID, or a public URL.
  • The maximum file size is 25MB; support for larger files is planned.
  • Supported file formats are mp3, wav, m4a, and ogg.
  • If you provide a URL in content, ensure that access to the file is not blocked by authentication.
  • The language field is optional. If you set it, use the language code (e.g. en, yo, ig) rather than the full language name.
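The limits above can be checked locally before uploading. The sketch below is a pre-flight check under several assumptions: check_audio_file and looks_like_language_code are hypothetical helpers, the format check looks only at the file extension (it does not inspect the audio itself), and 25MB is treated conservatively as 25,000,000 bytes because the exact definition is not stated.

```python
import os

# Assumption: "25MB" read conservatively as 25 * 10**6 bytes.
MAX_BYTES = 25 * 1000 * 1000
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg"}


def check_audio_file(path):
    """Raise ValueError if the file breaks the documented limits; return its size."""
    # Extension-only check: a renamed file with the wrong encoding will still pass.
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format {ext!r}; use mp3, wav, m4a, or ogg")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is 25MB")
    return size


def looks_like_language_code(language):
    """Heuristic: ISO 639 codes are 2 or 3 lowercase ASCII letters (en, yo, ig)."""
    return (2 <= len(language) <= 3 and language.isascii()
            and language.isalpha() and language.islower())


print(looks_like_language_code("yo"))      # True
print(looks_like_language_code("Yoruba"))  # False
```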

Response

The transcription endpoint returns a JSON response.
  • The Content-Type is application/json
  • A request_id is returned for issue resolution with our support team.
Below is an example of a response from the transcription endpoint.
    {
      "request_id": "86095cea-77d5-45ba-a093-0f800ac2c7df",
      "text": "Báwo ni olólùfẹ́ mi?",
      "segments": null
    }
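If you handle the raw JSON body yourself (for example, when logging responses), a sketch like the following extracts the fields shown above. parse_transcription is a hypothetical helper. It treats a null segments value as an empty list, assuming segments is only populated when timestamps are requested; the sample above does not confirm the shape of a populated segments value, so this sketch does not parse it further.

```python
import json

SAMPLE = """{
  "request_id": "86095cea-77d5-45ba-a093-0f800ac2c7df",
  "text": "Báwo ni olólùfẹ́ mi?",
  "segments": null
}"""


def parse_transcription(raw):
    """Return (request_id, text, segments) from a transcription JSON body."""
    data = json.loads(raw)
    # Assumption: segments is null unless timestamps were requested,
    # so normalise null to an empty list.
    segments = data.get("segments") or []
    return data["request_id"], data["text"], segments


request_id, text, segments = parse_transcription(SAMPLE)
print(request_id)  # quote this ID when contacting support
print(text)
print(len(segments))  # 0
```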

Examples - file

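import os
from spitch import Spitch

# Spitch() reads the API key from the SPITCH_API_KEY environment variable.
os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

# Read the local audio file and pass its bytes through content.
with open("new.wav", "rb") as f:
    response = client.speech.transcribe(
        language="yo",
        content=f.read(),
        timestamp="sentence",  # request sentence-level timestamps
    )
print(f"Text: {response.text}")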

Examples - URL

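import os
from spitch import Spitch

# Spitch() reads the API key from the SPITCH_API_KEY environment variable.
os.environ["SPITCH_API_KEY"] = "YOUR_API_KEY"
client = Spitch()

# The URL must be reachable without authentication.
response = client.speech.transcribe(
    language="yo",
    content="https://myfilelocation.com/file.mp3",
    special_words="Spitch API",  # custom term to improve recognition
)
print(response.text)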
For error codes and retry guidance, see Troubleshooting.