# Kubeez REST API

> Public HTTP API for AI media generation: image, video, music, speech, captions, audio separation, and ad creatives. Authenticate with API keys (`sk_live_...`). Same models, capabilities, and billing as the Kubeez web app and the MCP server.
>
> If you are an AI agent that needs to automate against this API, read this whole file once. It is self-contained: every endpoint, every workflow, every error code an agent typically needs is below. The two most useful machine-readable surfaces are `/docs/models.json` (live catalog of every model) and `/openapi.json` (OpenAPI 3.1 schema).

## Public, no-auth discovery URLs (safe to scrape)
- https://api.kubeez.com/llms.txt — this file
- https://api.kubeez.com/docs/models.json — full live catalog: every enabled model with `model_id`, `model_type`, `generation_types`, `capabilities`, `cost_per_generation`, `cost_note`, `usage_notes`, `voice_allowlist` (TTS only)
- https://api.kubeez.com/docs/models/{model_id}.json — single model row by id (also accepts a family prefix, e.g. `seedance-2` returns all variants)
- https://api.kubeez.com/openapi.json — OpenAPI 3.1 spec (request/response schemas for every `/v1/*` endpoint)
- https://api.kubeez.com/docs — interactive HTML reference (SPA; fetches `/docs/models.json` client-side)
- https://api.kubeez.com/docs/openapi — full Scalar OpenAPI viewer
- https://api.kubeez.com/health — liveness probe (`{"status":"healthy"}`)

## Authentication

Send one of these headers on every request to `/v1/*`:
```
X-API-Key: sk_live_...
Authorization: Bearer sk_live_...
```
Create a key at https://kubeez.com/settings/api-keys. Keys never expire unless revoked. Each key has a scope set; a request that needs a scope the key lacks returns HTTP 403 (`missing_scope`).

Scopes: `generate:media`, `generate:music`, `generate:speech`, `generate:ads`, `read:balance`, `read:generations`.

NEVER put the API key in a URL query string — only in headers. The server logs request paths but redacts headers.
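
The header rule above can be sketched in Python as follows. This is a minimal illustration, not part of the API: the helper name `auth_headers` and the environment variable `KUBEEZ_API_KEY` are assumptions made for this example.

```python
import os


def auth_headers(api_key=None):
    """Return request headers carrying the API key (never a query string)."""
    # KUBEEZ_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("KUBEEZ_API_KEY", "")
    if not key.startswith("sk_live_"):
        raise ValueError("expected a key of the form sk_live_...")
    # Either X-API-Key or Authorization: Bearer is accepted; this sends one.
    return {"X-API-Key": key}
```

Pass the returned dict as the headers of every `/v1/*` request and keep the key out of the URL entirely.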

## Endpoints (every public route)

Generation (async — returns a job id, then poll):
- `POST /v1/generate/media`        — start an image or video generation. Required: `model`. Common: `prompt`, `aspect_ratio`, `duration`, `source_media_urls`, `sound`, `resolution`, `quality`. Per-model param shape lives in `/docs/models.json` → `capabilities`. Returns `{ generation_id, status }`.
- `POST /v1/generate/media/extend` — extend an existing Veo 3.1 video by feeding its last frame back in. Body: `{ source_task_id, prompt, extend_model? }` (`extend_model` ∈ `lite|fast|quality`, default `fast`). `source_task_id` comes from the source clip's `GET /v1/generate/media/{id}` response (only populated for completed Veo 3.1 generations). Source must be your own Veo 3.1 generation, not itself an extend, and not a 1080p output. Flat price: lite=45, fast=75, quality=300 credits. Returns `{ generation_id, status, source_task_id, estimated_cost_credits }`; poll `/v1/generate/media/{generation_id}` until `completed`.
- `POST /v1/generate/music`        — start a music generation. Body: `{ model, prompt }` (simple) OR `{ model, title, style, lyrics|song_description, vocal_gender }` (advanced). Returns `{ generation_id, status }`.
- `POST /v1/generate/dialogue`     — single-voice TTS. Body: `{ text, voice, stability?, similarity_boost?, style?, speed?, language_code? }`. `voice` must be one of the 26 names in `/docs/models.json` for `text-to-dialogue-v3`. Returns `{ generation_id, status }`.
- `POST /v1/generate/captions`     — transcribe a video to word-level timestamped JSON. Body: `{ media_url, quality, language?, code_switching? }`. Synchronous — no polling.
- `POST /v1/generate/separation`   — split an audio track into vocals + instrumental. Body: `{ media_url }`. Returns `{ separation_id, status }`. Poll `/v1/generate/separation/{id}`.
- `POST /v1/generate/ad-copy`      — generate ad creatives. Body: `{ reference_ad_url, product_image_url?, product_text?, variant_count?, aspect_ratio?, language? }`. Returns `{ project_id, generation_ids[] }`. Poll each id via `/v1/generate/media/{id}`.
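
For the extend endpoint, the body fields and flat prices listed above can be captured in a small helper. This is a sketch only: `extend_request` is a hypothetical name, and the prices are copied from this file, so re-check them against a live response (`estimated_cost_credits`) before relying on them.

```python
# Flat extend prices in credits, as listed in this file.
EXTEND_PRICES = {"lite": 45, "fast": 75, "quality": 300}


def extend_request(source_task_id, prompt, extend_model="fast"):
    """Return (body, credits) for POST /v1/generate/media/extend."""
    if extend_model not in EXTEND_PRICES:
        raise ValueError(f"extend_model must be one of {sorted(EXTEND_PRICES)}")
    body = {
        "source_task_id": source_task_id,
        "prompt": prompt,
        "extend_model": extend_model,
    }
    return body, EXTEND_PRICES[extend_model]
```

Knowing the price before submitting lets a caller compare it with `GET /v1/balance` first.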

Polling (call until `status === "completed"`):
- `GET /v1/generate/media/{id}`       — image/video/dialogue (yes — dialogue jobs poll here, not under /music)
- `GET /v1/generate/music/{id}`       — music
- `GET /v1/generate/separation/{id}`  — audio separation
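
Because dialogue and ad-copy jobs poll under `/media`, a lookup table helps avoid guessing the status route. The sketch below encodes only the routes listed above; `status_url` and the `kind` labels are hypothetical names for this example.

```python
# Status routes as listed above; captions are synchronous and have none.
STATUS_ROUTES = {
    "media": "/v1/generate/media/{id}",
    "dialogue": "/v1/generate/media/{id}",  # dialogue polls under /media
    "ad-copy": "/v1/generate/media/{id}",   # one poll per generation id
    "music": "/v1/generate/music/{id}",
    "separation": "/v1/generate/separation/{id}",
}


def status_url(kind, job_id, base="https://api.kubeez.com"):
    """Build the polling URL for a job started by the given endpoint kind."""
    if kind == "captions":
        raise ValueError("captions are synchronous; there is nothing to poll")
    try:
        route = STATUS_ROUTES[kind]
    except KeyError:
        raise ValueError(f"unknown job kind: {kind!r}") from None
    return base + route.format(id=job_id)
```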

Uploads (only needed when the user has a local file, not a public URL):
- `POST /v1/upload/media` — multipart/form-data, field `file`, max 500 MB per request. Returns `{ success, urls[], uploaded }`. The returned URLs are signed CDN links you can drop straight into `source_media_urls` or `media_url`.

Asset Library (persistent named media — saves an upload across sessions):
- `GET    /v1/assets`            — list user's library. Returns `{ assets[], count, quota_bytes, used_bytes }`. Each asset has `id`, `name`, `kind`, `mime_type`, `size_bytes`, `url` (1 h signed), `url_expires_at`.
- `POST   /v1/assets`            — server-side ingest of a public URL. Body: `{ name, url }`. `name` is a stable handle (lowercase letters/digits/`_`/`-`, max 64 chars, unique). 500 MB per file, 50 MB per-user default quota.
- `PATCH  /v1/assets/{id}`       — rename. Body: `{ name }`.
- `DELETE /v1/assets/{id}`       — permanent.
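
Checking an asset `name` before calling `POST /v1/assets` saves a round trip. The sketch below encodes the character set and the 64-character cap stated above; the minimum length of 1 is an assumption, and uniqueness can only be checked by the server (`409 name_taken`).

```python
import re

# Lowercase letters, digits, "_" and "-", at most 64 characters.
# The minimum length of 1 is an assumption of this sketch.
ASSET_NAME_RE = re.compile(r"[a-z0-9_-]{1,64}")


def is_valid_asset_name(name):
    """Return True if name fits the documented character set and length."""
    return ASSET_NAME_RE.fullmatch(name) is not None
```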

Account / catalog (read-only):
- `GET /v1/models`                  — same payload as `/docs/models.json` but auth-gated. Optional `?model_type=image|video|music|speech`.
- `GET /v1/balance`                 — `{ credits, message }`.
- `GET /v1/generations`             — recent jobs. Optional `?status=completed`, `?model=...`, `?generation_type=...`.

## Canonical workflows for an automating agent

### A. Pick a model, then generate, then poll

```bash
# 1. Find an enabled model.
curl -s https://api.kubeez.com/docs/models.json | jq '.models[] | select(.model_type=="image") | .model_id' | head

# 2. Start the job.
curl -s -X POST https://api.kubeez.com/v1/generate/media \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"nano-banana-2","prompt":"matte-black headphones on marble, soft studio light","aspect_ratio":"1:1"}'
# → {"generation_id":"abc-123-uuid","status":"queued"}

# 3. Poll until done. Wait ~5-10s before the first poll for images, ~30-60s for videos.
curl -s https://api.kubeez.com/v1/generate/media/abc-123-uuid \
  -H "X-API-Key: sk_live_..."
# → {"id":"...", "status":"completed", "outputs":[{"url":"https://...","media_type":"image"}]}
```

A correct polling loop: backoff `[5s, 5s, 5s, 10s, 10s, 15s, 15s, 30s, 30s, ...]` capped at 30s. Stop on `status` ∈ `completed`, `failed`, `cancelled`. Most images finish in 10-30s; videos in 30-180s; music in 60-180s; separation in 60-300s.
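
The schedule above could be implemented roughly as follows. This is a sketch under stated assumptions: `fetch_status` is a caller-supplied function (hypothetical here) that performs the GET and returns the parsed JSON body, and the `max_polls` limit of 60 is an arbitrary safety bound, not an API rule.

```python
import time

# Delays from the schedule above; once these run out, every wait is 30s.
POLL_SCHEDULE = [5, 5, 5, 10, 10, 15, 15]
POLL_CAP = 30
TERMINAL = {"completed", "failed", "cancelled"}


def poll_delays():
    """Yield the wait before each poll: 5, 5, 5, 10, 10, 15, 15, 30, 30, ..."""
    yield from POLL_SCHEDULE
    while True:
        yield POLL_CAP


def poll_until_done(fetch_status, max_polls=60, sleep=time.sleep):
    """Call fetch_status() on the schedule until a terminal status appears."""
    for _, delay in zip(range(max_polls), poll_delays()):
        sleep(delay)
        job = fetch_status()
        if job.get("status") in TERMINAL:
            return job
    raise TimeoutError(f"job still running after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real waiting; in production the default `time.sleep` applies.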

### B. User has a local file → upload, then use it as a reference

```bash
# 1. Upload (max 500 MB).
curl -s -X POST https://api.kubeez.com/v1/upload/media \
  -H "X-API-Key: sk_live_..." \
  -F "file=@./photo.jpg"
# → {"success":true,"urls":["https://storage.../photo.jpg"],"uploaded":1}

# 2. Use the returned URL as source_media_urls.
curl -s -X POST https://api.kubeez.com/v1/generate/media \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"nano-banana-2","prompt":"replace background with sunset","aspect_ratio":"1:1","source_media_urls":["https://storage.../photo.jpg"],"generation_type":"image-to-image"}'
```

### C. Reuse a brand asset across many generations (logo, recurring character, voiceover)

```bash
# 1. Save once.
curl -s -X POST https://api.kubeez.com/v1/assets \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"name":"acme-logo","url":"https://cdn.example.com/logo.png"}'
# → {"asset":{"id":"a1b2c3","name":"acme-logo","url":"https://cdn.kubeez.com/...signed..."}}

# 2. Reuse forever via /v1/assets — each fetch returns a freshly signed URL.
curl -s https://api.kubeez.com/v1/assets -H "X-API-Key: sk_live_..." | jq '.assets[] | select(.name=="acme-logo") | .url'
```

The signed `url` field is good for 1 hour. Re-list when it expires (`url_expires_at`); the bytes themselves are persistent.
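
One way to decide when to re-list is to compare `url_expires_at` with the current time, leaving a small safety margin. The sketch assumes `url_expires_at` is an ISO 8601 timestamp with a UTC offset or a trailing `Z`; this file does not specify the format, so confirm it against a real response. `url_is_stale` and `margin_seconds` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def url_is_stale(asset, now=None, margin_seconds=60):
    """Return True if the asset's signed URL has expired or is about to."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() rejects a trailing "Z" before Python 3.11.
    stamp = asset["url_expires_at"].replace("Z", "+00:00")
    expires = datetime.fromisoformat(stamp)
    return now + timedelta(seconds=margin_seconds) >= expires
```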

### D. Single-voice TTS (dialogue endpoint)

```bash
curl -s -X POST https://api.kubeez.com/v1/generate/dialogue \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text":"Welcome to Kubeez. Let me show you around.","voice":"Rachel","stability":0.5,"language_code":"en"}'
# → poll /v1/generate/media/{generation_id}  (yes — dialogue uses the /media/ status path)
```

The 26 supported voices: Rachel, Drew, Clyde, Paul, Aria, Domi, Dave, Roger, Fin, Sarah, James, Jane, Juniper, Arabella, Hope, Bradford, Reginald, Gaming, Austin, Kuon, Blondie, Priyanka, Alexandra, Monika, Mark, Grimblewood. Authoritative list: `/docs/models.json` → `text-to-dialogue-v3` → `voice_allowlist`.
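
Since the catalog is authoritative, a client can validate `voice` against the live `voice_allowlist` instead of a hard-coded list. The sketch assumes `voice_allowlist` is a JSON array of strings and that matching is case-sensitive; neither is stated in this file. `check_voice` is a hypothetical helper.

```python
def check_voice(voice, model_row):
    """Raise ValueError unless voice is in the model's voice_allowlist."""
    allowed = model_row.get("voice_allowlist") or []
    if voice not in allowed:
        raise ValueError(
            f"voice {voice!r} not in allowlist ({len(allowed)} voices)"
        )
    return voice
```

Here `model_row` is the `text-to-dialogue-v3` entry fetched from `/docs/models.json`.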

## Model-specific parameters

`/docs/models.json` is authoritative — read each model's `capabilities` before sending the body. Common fields:

- `aspect_ratio_options` — pick one. Some models reject `auto` / `1:1` at higher resolution tiers (gpt-image-2 at 2K/4K).
- `duration_options` — videos only. Some are flexible integer ranges (Kling 3.0: any 3-15s); most are presets.
- `resolution_options` — `1K|2K|4K` for image-tier models that have it; `480p|720p|1080p` for video.
- `quality_options` — `basic|high` (Seedream v4.5, 5-lite) or `fast|standard|ultra` (Imagen 4) or `720p|720p-draft|1080p|1080p-draft` (P-Video).
- `max_input_images`, `max_input_videos`, `max_input_audios` — limits on `source_media_urls` per type.
- `supports_sound` / `video_audio` — sound is `toggle_via_sound_param`, `included` (free), or `silent`.
- `supports_negative_prompt` — boolean.
- `prompt_max_chars` — cap on the `prompt` field.
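
A pre-flight check against `capabilities` can catch many `400` errors before a request is sent. The sketch below uses only key names from the list above, but the value shapes (option lists as arrays of strings, `prompt_max_chars` as an integer) are assumptions; `check_body` is a hypothetical helper.

```python
def check_body(capabilities, body):
    """Return a list of problems found by comparing body to capabilities."""
    problems = []
    # Each (body field, capability key) pair below is named in this file.
    pairs = (
        ("aspect_ratio", "aspect_ratio_options"),
        ("resolution", "resolution_options"),
        ("quality", "quality_options"),
    )
    for field, key in pairs:
        allowed = capabilities.get(key)
        value = body.get(field)
        if allowed and value is not None and value not in allowed:
            problems.append(f"{field}={value!r} not in {key}: {allowed}")
    max_chars = capabilities.get("prompt_max_chars")
    if max_chars is not None and len(body.get("prompt", "")) > max_chars:
        problems.append(f"prompt exceeds prompt_max_chars ({max_chars})")
    return problems
```

An empty list means the sketch found nothing wrong; the server remains the final judge.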

## Cost contract

Two billing shapes:
- **Flat per-generation** — most images, music, dialogue, ad-copy, separation, captions. `cost_per_generation` is the credits charged. Fixed at submit time.
- **Per-second** — most videos (Kling 3.0, Seedance 2, P-Video, etc.). `cost_note` describes the formula. Charged on the actual output duration. Reference video attachments may add a `(ref_seconds + output_seconds) × rate` surcharge — `cost_note` calls this out per-model.

NEVER promise a refund. Kubeez does not refund failed or unsatisfactory generations. If a request fails with a 5xx at the edge before any work starts, no credits are deducted; once the upstream provider has run, the credits are spent. Set caller expectations accordingly.

To preview cost before committing: read `cost_per_generation` + `cost_note` from `/docs/models.json`. The MCP server has an `estimate` tool that does this server-side; the REST API does not have a separate estimate endpoint — read the catalog yourself.
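
A client-side preview might look like the sketch below. It assumes `cost_per_generation` is a number of credits and evaluates the reference expression exactly as written above; the rate and the sample numbers are placeholders, since real per-second formulas live in each model's `cost_note`. All three helper names are hypothetical.

```python
def flat_cost(model_row, count=1):
    """Credits for `count` runs of a flat-priced model."""
    return model_row["cost_per_generation"] * count


def reference_surcharge(ref_seconds, output_seconds, rate):
    """Evaluate (ref_seconds + output_seconds) * rate as written above."""
    return (ref_seconds + output_seconds) * rate


def can_afford(balance, cost):
    """balance is the JSON body of GET /v1/balance: {credits, message}."""
    return balance["credits"] >= cost


# Placeholder numbers, chosen only to show the arithmetic.
print(flat_cost({"cost_per_generation": 12}, count=3))  # 12 * 3 = 36
print(reference_surcharge(5, 8, rate=10))               # (5 + 8) * 10 = 130
```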

## Errors

Validation and business errors return JSON with `error` and `message`. Common codes:

- `400 invalid_request`        — schema or value mismatch (read `message`)
- `400 unsupported_aspect`     — aspect_ratio not in `aspect_ratio_options` for this model
- `400 missing_*`              — required field missing
- `401`                         — bad / missing API key
- `402 insufficient_credits`   — top up at https://kubeez.com/billing
- `403 missing_scope`           — your key lacks the listed scope; create a key that has it
- `404 model_not_found`         — `model_id` is disabled or unknown
- `404 not_found`               — generation/asset id wrong
- `409 name_taken`              — POST /v1/assets name collision
- `413 file_too_large` / `quota_exceeded` — upload over 500 MB (`file_too_large`) or library over the 50 MB quota (`quota_exceeded`)
- `429 rate_limit_exceeded`    — back off; response includes `retry_after`
- `502 fetch_failed`           — POST /v1/assets couldn't fetch your URL
- `5xx`                         — server side; safe to retry idempotently after a delay

Retry strategy: 429 and 5xx are retryable with exponential backoff (jitter recommended); on 429, honor `retry_after`. Any other 4xx is NOT retryable: fix the request first.
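
The retry rule can be sketched as a status check plus a jittered delay. The "full jitter" variant used here (a random wait between 0 and the exponential ceiling) is one common choice, not something this API prescribes; `base`, `cap`, and the helper names are assumptions.

```python
import random


def is_retryable(status):
    """429 and 5xx are retryable; every other 4xx needs a fixed request."""
    return status == 429 or 500 <= status <= 599


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    A server-supplied retry_after wins. Otherwise the wait is random in
    [0, min(cap, base * 2**attempt)], which spreads out retrying clients.
    """
    if retry_after is not None:
        return float(retry_after)
    return rng() * min(cap, base * 2 ** attempt)
```

Passing `rng` explicitly makes the delay deterministic in tests while keeping real randomness by default.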

## Rate limits (default; per API key)

- Generate media: 30 req/min
- Generate music: 10 req/min
- Generate dialogue: 10 req/min
- Generate ad-copy: 5 req/min
- Upload media + add asset: 30 req/min
- Read and asset-management endpoints (status polls, /v1/models, /v1/balance, /v1/generations, list/rename/delete asset): 120 req/min

429 responses include `Retry-After` and a `retry_after` field in the body.
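
To stay under these limits without waiting for a 429, a client can space its requests evenly. The sketch converts each default limit into a minimum interval; the group labels are hypothetical, and even spacing is a conservative assumption because this file does not say how the server windows its counters.

```python
# Default per-key limits from the list above, in requests per minute.
LIMITS_PER_MIN = {
    "media": 30,
    "music": 10,
    "dialogue": 10,
    "ad-copy": 5,
    "upload": 30,  # upload media + add asset
    "read": 120,   # status polls and the other 120 req/min endpoints
}


def min_interval(group):
    """Smallest even spacing, in seconds, that stays within the limit."""
    return 60.0 / LIMITS_PER_MIN[group]
```

For example, `min_interval("ad-copy")` gives 12 seconds between ad-copy submissions.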

## Tips for AI agents

1. Always read `/docs/models.json` first. The model catalog changes — don't memorize fields from this file; treat it as the source of truth.
2. For one-off automation, fetch `/docs/models/{model_id}.json` instead of the full catalog — same data, ~99% smaller payload.
3. Status polling: media, dialogue, and ad-copy jobs all poll `/v1/generate/media/{id}` (only music + separation have their own status routes). Never poll faster than every 5s; after the first few polls, back off toward every 15-30s as in workflow A.
4. Prefer URL ingest (`POST /v1/assets` / passing public URLs in `source_media_urls`) over multipart uploads when the file is already on the public web.
5. Cross-origin: every public route sends permissive CORS headers; you can call from a browser.
6. The interactive `/docs` page is a SPA — its content is JS-rendered and not visible to a plain `curl`. Use `/docs/models.json` and `/openapi.json` for machine ingestion; use this `/llms.txt` for narrative guidance.

## Related
- [MCP server](https://mcp.kubeez.com/docs): same models exposed as MCP tools for Claude Desktop, Cursor, and any MCP-compatible client. Same auth scopes.
- [Kubeez web app](https://kubeez.com): human UI for the same generations, plus billing and key management.