Speech to Text

curl --request POST \
  --url https://api.spitch.app/v1/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'content=@example-file'

{
  "request_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "text": "<string>",
  "segments": [
    {
      "text": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "timestamps": [
    {
      "text": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "detected_language": "<string>"
}

Authorizations

Authorization

string

header

required

Authenticate with Authorization: Bearer <token>. The service accepts JWTs, API keys, and guest tokens through this bearer token header.

Body

multipart/form-data

Multipart form data. Provide exactly one audio source: an uploaded file or a public URL.

Option 1: uploaded file

content

file

required

Audio file to transcribe.

language

string

Optional ISO 639 language code for the spoken audio. Omit it to let the service use automatic language handling.

special_words

string

Optional comma-separated words to bias recognition toward domain-specific names or terms.
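The comma-separated format for special_words can be produced from a list of terms. The following is a minimal Python sketch, not an official client: the helper name join_special_words and the sample terms are hypothetical, and it assumes that terms are joined with a bare comma (no space) and that a term cannot itself contain a comma.

```python
def join_special_words(words):
    """Join terms into a comma-separated string for the special_words field."""
    cleaned = []
    for word in words:
        term = word.strip()
        if not term:
            continue  # skip empty entries
        if "," in term:
            # Assumption: a comma inside a term would be read as a separator.
            raise ValueError(f"term contains a comma: {term!r}")
        cleaned.append(term)
    return ",".join(cleaned)


# Hypothetical domain terms, for illustration only.
special_words = join_special_words(["Spitch", " Kubernetes ", "", "PostgreSQL"])
print(special_words)
```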

timestamp

enum<string>

default:none

Timestamp granularity for returned segments. Use none, sentence, or word.

Available options: sentence, word, none
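The body fields above can be assembled into a request without third-party libraries. The following Python sketch uses only the standard library and builds the request without sending it. The function names, the placeholder token, the file name example.wav, the dummy audio bytes, and the application/octet-stream part type are assumptions for illustration; only the URL, the header format, and the field names come from this page.

```python
import urllib.request
import uuid

API_URL = "https://api.spitch.app/v1/transcriptions"
TIMESTAMP_OPTIONS = {"none", "sentence", "word"}


def encode_multipart(fields, filename, audio_bytes):
    """Encode text fields plus one file part named 'content' as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="content"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(token, filename, audio_bytes, language=None,
                  special_words=None, timestamp="none"):
    """Build (but do not send) a POST request for the transcription endpoint."""
    if timestamp not in TIMESTAMP_OPTIONS:
        raise ValueError(f"timestamp must be one of {sorted(TIMESTAMP_OPTIONS)}")
    fields = {"timestamp": timestamp}
    if language:
        fields["language"] = language  # optional ISO 639 code
    if special_words:
        fields["special_words"] = special_words  # comma-separated terms
    body, content_type = encode_multipart(fields, filename, audio_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )


# Placeholder token and dummy bytes; replace with a real token and real audio data.
req = build_request("YOUR_TOKEN", "example.wav", b"\x00\x01", language="en", timestamp="word")
print(req.get_method(), req.full_url)
```

To send the request, pass req to urllib.request.urlopen and decode the JSON response body. Validating timestamp on the client side catches a misspelled option before any audio is uploaded.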

Response

Successful Response

request_id

string<uuid>

required

text

string

required

segments

Segment · object[] | null

timestamps

Segment · object[] | null

detected_language

string | null
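A client can map the response fields above onto typed objects and handle the nullable ones explicitly. The following Python sketch is an illustration, not an official client: the class and function names are hypothetical, the sample payload values are made up (only the request_id is taken from the example response), and the unit of start and end is not stated on this page, so the code stores them as plain numbers.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    text: str
    start: float  # unit not specified on this page
    end: float


@dataclass
class Transcription:
    request_id: str
    text: str
    segments: Optional[List[Segment]]
    timestamps: Optional[List[Segment]]
    detected_language: Optional[str]


def _parse_segments(items):
    """Convert a list of segment dicts, or pass through None for a null field."""
    if items is None:
        return None
    return [Segment(text=i["text"], start=i["start"], end=i["end"]) for i in items]


def parse_transcription(payload: str) -> Transcription:
    """Parse a JSON response body; request_id and text are required."""
    data = json.loads(payload)
    return Transcription(
        request_id=data["request_id"],
        text=data["text"],
        segments=_parse_segments(data.get("segments")),
        timestamps=_parse_segments(data.get("timestamps")),
        detected_language=data.get("detected_language"),
    )


# Made-up payload for illustration.
sample = json.dumps({
    "request_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "text": "hello world",
    "segments": None,
    "timestamps": [{"text": "hello", "start": 0, "end": 1}],
    "detected_language": None,
})
result = parse_transcription(sample)
print(result.text, result.timestamps)
```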