# Text to Speech

> Synthesizes text into speech and streams audio chunks as they are generated. The response body is binary audio, not a JSON envelope. Signed, silent provenance metadata is embedded in WAV, MP3, Ogg Opus, WebM Opus, and FLAC output. Raw PCM, mu-law, A-law, and HLS output do not carry provenance metadata.


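As a rough illustration of consuming the streamed binary response, the sketch below posts a JSON body and writes the audio to disk chunk by chunk using only the Python standard library. The URL combines the server and path from the OpenAPI definition on this page. The helper names (`build_request`, `synthesize_to_file`), the chunk size, and any voice ID you pass are illustrative assumptions and are not part of the API.

```python
import json
import urllib.request

# Server URL plus path, taken from the OpenAPI definition on this page.
API_URL = "https://api.spitch.app/v1/speech"


def build_request(token: str, text: str, voice: str, fmt: str = "mp3") -> urllib.request.Request:
    """Build the POST request. Nothing is sent until the request is opened."""
    body = json.dumps({"text": text, "voice": voice, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(token: str, text: str, voice: str, path: str,
                       fmt: str = "mp3", chunk_size: int = 8192) -> int:
    """Stream the audio response to `path` and return the number of bytes written.

    The body is raw audio, so it is copied as bytes and never parsed as JSON.
    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    written = 0
    request = build_request(token, text, voice, fmt)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        # Read in fixed-size pieces; chunked transfer decoding is handled by http.client.
        while chunk := response.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written
```

A call might look like `synthesize_to_file(token, "Hello", voice_id, "hello.mp3")`, where `token` and `voice_id` are values you supply; no voice IDs are listed on this page.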

## OpenAPI

````yaml https://api.spitch.app/openapi.json post /v1/speech
openapi: 3.1.0
info:
  title: api
  version: 0.1.1
servers:
  - url: https://api.spitch.app
    description: Production
security:
  - BearerAuth: []
tags:
  - name: speech
    description: Speech-to-text transcription and text-to-speech synthesis endpoints.
  - name: text
    description: Text translation and Yoruba diacritics endpoints.
  - name: voices
    description: Voice registration and voice metadata endpoints used by speech synthesis.
  - name: files
    description: File upload, download, listing, and storage-usage endpoints.
paths:
  /v1/speech:
    post:
      tags:
        - speech
      summary: Text to Speech
      description: >-
        Synthesizes text into speech and streams audio chunks as they are
        generated. The response body is binary audio, not a JSON envelope.
        Signed, silent provenance metadata is embedded in WAV, MP3, Ogg Opus,
        WebM Opus, and FLAC output. Raw PCM, mu-law, A-law, and HLS output do
        not carry provenance metadata.
      operationId: synthesize_v1_speech_post
      requestBody:
        description: >-
          JSON synthesis request. The response is streamed binary audio, not
          JSON.
        content:
          application/json:
            schema:
              properties:
                text:
                  type: string
                  description: Text to synthesize into speech.
                voice:
                  type: string
                  description: >-
                    ID of the voice to use for synthesis: a built-in system
                    voice or a registered custom voice. The value may also be
                    left blank.
                language:
                  type: string
                  description: >-
                    Optional ISO 639 language code. Yoruba input is diacritized
                    before synthesis.
                format:
                  type: string
                  enum:
                    - wav
                    - mp3
                    - ogg_opus
                    - webm_opus
                    - flac
                    - pcm_s16le
                    - mulaw
                    - alaw
                    - m3u8
                  description: >-
                    Output audio format. Silent signed provenance metadata is
                    available for WAV, MP3, Ogg Opus, WebM Opus, and FLAC, but
                    not raw PCM, mu-law, A-law, or HLS.
                  default: mp3
                bitrate:
                  type: string
                  enum:
                    - 32k
                    - 48k
                    - 64k
                    - 96k
                    - 128k
                    - 192k
                  description: Output bitrate for compressed formats.
                  default: 128k
                sample_rate:
                  type: integer
                  enum:
                    - 24000
                  description: Output sample rate in hertz. Only 24000 is accepted.
                  default: 24000
                speed:
                  type: number
                  maximum: 1.2
                  minimum: 0.7
                  description: Speech speed multiplier.
                  default: 1
              type: object
              required:
                - text
                - voice
        required: true
      responses:
        '200':
          description: >-
            Chunked streaming audio. The media type depends on the requested
            format; formats without a mapped media type are returned as
            application/octet-stream.
          headers:
            Content-Disposition:
              description: >-
                Inline attachment filename using the generated request ID and
                output format.
              schema:
                type: string
            Transfer-Encoding:
              description: >-
                The service streams the audio response with chunked transfer
                encoding.
              schema:
                type: string
                enum:
                  - chunked
          content:
            audio/wav:
              schema:
                type: string
                format: binary
            audio/mpeg:
              schema:
                type: string
                format: binary
            audio/flac:
              schema:
                type: string
                format: binary
            application/vnd.apple.mpegurl:
              schema:
                type: string
                format: binary
            application/octet-stream:
              schema:
                type: string
                format: binary
        '400':
          description: Error response.
          content:
            application/json:
              schema:
                properties:
                  detail:
                    description: >-
                      Error details. This is usually a string, but validation
                      errors can be structured.
                type: object
        '500':
          description: Error response.
          content:
            application/json:
              schema:
                properties:
                  detail:
                    description: >-
                      Error details. This is usually a string, but validation
                      errors can be structured.
                type: object
      security:
        - BearerAuth: []
components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT, API key, or guest token
      description: >-
        Authenticate with `Authorization: Bearer <token>`. The service accepts
        JWTs, API keys, and guest tokens through this bearer token header.

````
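The request schema above constrains several fields (enumerated formats and bitrates, a single accepted sample rate, and a bounded speed). A minimal sketch of checking those constraints on the client before sending a request might look like the following. The function names `build_payload` and `carries_provenance` are hypothetical helpers, not part of any official SDK, and the server remains the authority on validation.

```python
# Allowed values copied from the request schema above.
FORMATS = {"wav", "mp3", "ogg_opus", "webm_opus", "flac",
           "pcm_s16le", "mulaw", "alaw", "m3u8"}
BITRATES = {"32k", "48k", "64k", "96k", "128k", "192k"}
SAMPLE_RATES = {24000}
SPEED_MIN, SPEED_MAX = 0.7, 1.2

# Formats that the endpoint description says embed signed provenance metadata.
PROVENANCE_FORMATS = {"wav", "mp3", "ogg_opus", "webm_opus", "flac"}


def build_payload(text, voice, *, language=None, fmt="mp3",
                  bitrate="128k", sample_rate=24000, speed=1.0):
    """Return a JSON-serializable request body, or raise on an out-of-schema value."""
    if not isinstance(text, str) or not isinstance(voice, str):
        raise TypeError("text and voice are required strings")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if bitrate not in BITRATES:
        raise ValueError(f"unsupported bitrate: {bitrate!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError("only a sample rate of 24000 Hz is accepted")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")

    payload = {
        "text": text,
        "voice": voice,
        "format": fmt,
        "bitrate": bitrate,
        "sample_rate": sample_rate,
        "speed": speed,
    }
    # language is optional in the schema, so it is only sent when provided.
    if language is not None:
        payload["language"] = language
    return payload


def carries_provenance(fmt):
    """True if the description lists this format as carrying provenance metadata."""
    return fmt in PROVENANCE_FORMATS
```

Sending the defaults explicitly, as this sketch does, is a design choice that keeps the request self-describing; omitting them should be equivalent according to the schema defaults.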