# Kubeez REST API

> Public HTTP API for AI media generation: image, video, music, speech, captions, audio separation, and ad creatives. Authenticate with API keys (`sk_live_...`). Same models, capabilities, and billing as the Kubeez web app and the MCP server.
>
> If you are an AI agent that needs to automate against this API, read this whole file once. It is self-contained: every endpoint, every workflow, every error code an agent typically needs is below. The two most useful machine-readable surfaces are `/docs/models.json` (live catalog of every model) and `/openapi.json` (OpenAPI 3.1 schema).

## Public, no-auth discovery URLs (safe to scrape)

- https://api.kubeez.com/llms.txt — this file
- https://api.kubeez.com/docs/models.json — full live catalog: every enabled model with `model_id`, `model_type`, `generation_types`, `capabilities`, `cost_per_generation`, `cost_note`, `usage_notes`, `voice_allowlist` (TTS only)
- https://api.kubeez.com/docs/models/{model_id}.json — single model row by id (also accepts a family prefix, e.g. `seedance-2` returns all variants)
- https://api.kubeez.com/openapi.json — OpenAPI 3.1 spec (request/response schemas for every `/v1/*` endpoint)
- https://api.kubeez.com/docs — interactive HTML reference (SPA; fetches `/docs/models.json` client-side)
- https://api.kubeez.com/docs/openapi — full Scalar OpenAPI viewer
- https://api.kubeez.com/health — liveness probe (`{"status":"healthy"}`)

## Authentication

Send one of these headers on every request to `/v1/*`:

```
X-API-Key: sk_live_...
Authorization: Bearer sk_live_...
```

Create a key at https://kubeez.com/settings/api-keys. Keys never expire unless revoked. Each key has a scope set; missing scope returns HTTP 403. Scopes: `generate:media`, `generate:music`, `generate:speech`, `generate:ads`, `read:balance`, `read:generations`.

NEVER put the API key in a URL query string — only in headers. The server logs request paths but redacts headers.

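As a minimal sketch of the header-only rule above, a client can refuse to send any URL that embeds a key and attach the key as a header instead. The helper names `url_is_safe` and `kubeez_get` and the environment variable `KUBEEZ_API_KEY` are assumptions made for this illustration; they are not part of the API.

```bash
# Hypothetical helpers (names and KUBEEZ_API_KEY are assumptions, not API surface).

# Succeeds only when the URL does not embed an API key.
url_is_safe() {
  case "$1" in
    *sk_live_*) return 1 ;;
    *) return 0 ;;
  esac
}

# GET a URL with the key sent in the X-API-Key header, never in the query string.
kubeez_get() {
  url_is_safe "$1" || { echo "refusing to send: API key found in URL" >&2; return 2; }
  [ -n "${KUBEEZ_API_KEY:-}" ] || { echo "KUBEEZ_API_KEY is not set" >&2; return 3; }
  curl -s "$1" -H "X-API-Key: $KUBEEZ_API_KEY"
}
```

With the variable exported, `kubeez_get https://api.kubeez.com/v1/balance` would send the key as a header only; that call needs the `read:balance` scope.
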
## Endpoints (every public route)

Generation (async — returns a job id, then poll):

- `POST /v1/generate/media` — start an image or video generation. Required: `model`. Common: `prompt`, `aspect_ratio`, `duration`, `source_media_urls`, `sound`, `resolution`, `quality`. Per-model param shape lives in `/docs/models.json` → `capabilities`. Returns `{ generation_id, status }`.
- `POST /v1/generate/media/extend` — extend an existing Veo 3.1 video by feeding its last frame back in. Body: `{ source_task_id, prompt, extend_model? }` (`extend_model` ∈ `lite|fast|quality`, default `fast`). `source_task_id` comes from the source clip's `GET /v1/generate/media/{id}` response (only populated for completed Veo 3.1 generations). Source must be your own Veo 3.1 generation, not itself an extend, and not a 1080p output. Flat price: lite=45, fast=75, quality=300 credits. Returns `{ generation_id, status, source_task_id, estimated_cost_credits }`; poll `/v1/generate/media/{generation_id}` until `completed`.
- `POST /v1/generate/music` — start a music generation. Body: `{ model, prompt }` (simple) OR `{ model, title, style, lyrics|song_description, vocal_gender }` (advanced). Returns `{ generation_id, status }`.
- `POST /v1/generate/dialogue` — single-voice TTS. Body: `{ text, voice, stability?, similarity_boost?, style?, speed?, language_code? }`. `voice` must be one of the 26 names in `/docs/models.json` for `text-to-dialogue-v3`. Returns `{ generation_id, status }`.
- `POST /v1/generate/captions` — transcribe a video to word-level timestamped JSON. Body: `{ media_url, quality, language?, code_switching? }`. Synchronous — no polling.
- `POST /v1/generate/separation` — split an audio track into vocals + instrumental. Body: `{ media_url }`. Returns `{ separation_id, status }`. Poll `/v1/generate/separation/{id}`.
- `POST /v1/generate/ad-copy` — generate ad creatives. Body: `{ reference_ad_url, product_image_url?, product_text?, variant_count?, aspect_ratio?, language? }`.
  Returns `{ project_id, generation_ids[] }`. Poll each id via `/v1/generate/media/{id}`.

Polling (call until `status` is `completed`, `failed`, or `cancelled`):

- `GET /v1/generate/media/{id}` — image/video/dialogue (yes — dialogue jobs poll here, not under /music)
- `GET /v1/generate/music/{id}` — music
- `GET /v1/generate/separation/{id}` — audio separation

Uploads (only needed when the user has a local file, not a public URL):

- `POST /v1/upload/media` — multipart/form-data, field `file`, max 500 MB per request. Returns `{ success, urls[], uploaded }`. The returned URLs are signed CDN links you can drop straight into `source_media_urls` or `media_url`.

Asset Library (persistent named media — saves an upload across sessions):

- `GET /v1/assets` — list user's library. Returns `{ assets[], count, quota_bytes, used_bytes }`. Each asset has `id`, `name`, `kind`, `mime_type`, `size_bytes`, `url` (1 h signed), `url_expires_at`.
- `POST /v1/assets` — server-side ingest of a public URL. Body: `{ name, url }`. `name` is a stable handle (lowercase letters/digits/`_`/`-`, max 64 chars, unique). 500 MB per file, 50 MB per-user default quota.
- `PATCH /v1/assets/{id}` — rename. Body: `{ name }`.
- `DELETE /v1/assets/{id}` — permanent.

Account / catalog (read-only):

- `GET /v1/models` — same payload as `/docs/models.json` but auth-gated. Optional `?model_type=image|video|music|speech`.
- `GET /v1/balance` — `{ credits, message }`.
- `GET /v1/generations` — recent jobs. Optional `?status=completed`, `?model=...`, `?generation_type=...`.

## Canonical workflows for an automating agent

### A. Pick a model, then generate, then poll

```bash
# 1. Find an enabled model.
curl -s https://api.kubeez.com/docs/models.json | jq '.models[] | select(.model_type=="image") | .model_id' | head

# 2. Start the job.
curl -s -X POST https://api.kubeez.com/v1/generate/media \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"nano-banana-2","prompt":"matte-black headphones on marble, soft studio light","aspect_ratio":"1:1"}'
# → {"generation_id":"abc-123-uuid","status":"queued"}

# 3. Poll until done. First poll: ~5-10s for images, ~30-60s for videos.
curl -s https://api.kubeez.com/v1/generate/media/abc-123-uuid \
  -H "X-API-Key: sk_live_..."
# → {"id":"...", "status":"completed", "outputs":[{"url":"https://...","media_type":"image"}]}
```

A correct polling loop: backoff `[5s, 5s, 5s, 10s, 10s, 15s, 15s, 30s, 30s, ...]` capped at 30s. Stop on `status` ∈ `completed`, `failed`, `cancelled`. Most images finish in 10-30s; videos in 30-180s; music in 60-180s; separation in 60-300s.

### B. User has a local file → upload, then use it as a reference

```bash
# 1. Upload (max 500 MB).
curl -s -X POST https://api.kubeez.com/v1/upload/media \
  -H "X-API-Key: sk_live_..." \
  -F "file=@./photo.jpg"
# → {"success":true,"urls":["https://storage.../photo.jpg"],"uploaded":1}

# 2. Use the returned URL as source_media_urls.
curl -s -X POST https://api.kubeez.com/v1/generate/media \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"nano-banana-2","prompt":"replace background with sunset","aspect_ratio":"1:1","source_media_urls":["https://storage.../photo.jpg"],"generation_type":"image-to-image"}'
```

### C. Reuse a brand asset across many generations (logo, recurring character, voiceover)

```bash
# 1. Save once.
curl -s -X POST https://api.kubeez.com/v1/assets \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"name":"acme-logo","url":"https://cdn.example.com/logo.png"}'
# → {"asset":{"id":"a1b2c3","name":"acme-logo","url":"https://cdn.kubeez.com/...signed..."}}

# 2. Reuse forever via /v1/assets — each fetch returns a freshly signed URL.
curl -s https://api.kubeez.com/v1/assets -H "X-API-Key: sk_live_..." | jq '.assets[] | select(.name=="acme-logo") | .url'
```

The signed `url` field is good for 1 hour.
Re-list when it expires (`url_expires_at`); the bytes themselves are persistent.

### D. Single-voice TTS (dialogue)

```bash
curl -s -X POST https://api.kubeez.com/v1/generate/dialogue \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"text":"Welcome to Kubeez. Let me show you around.","voice":"Rachel","stability":0.5,"language_code":"en"}'
# → poll /v1/generate/media/{generation_id} (yes — dialogue uses the /media/ status path)
```

The 26 supported voices: Rachel, Drew, Clyde, Paul, Aria, Domi, Dave, Roger, Fin, Sarah, James, Jane, Juniper, Arabella, Hope, Bradford, Reginald, Gaming, Austin, Kuon, Blondie, Priyanka, Alexandra, Monika, Mark, Grimblewood. Authoritative list: `/docs/models.json` → `text-to-dialogue-v3` → `voice_allowlist`.

## Model-specific parameters

`/docs/models.json` is authoritative — read each model's `capabilities` before sending the body. Common fields:

- `aspect_ratio_options` — pick one. Some models reject `auto` / `1:1` at higher resolution tiers (gpt-image-2 at 2K/4K).
- `duration_options` — videos only. Some are flexible integer ranges (Kling 3.0: any 3-15s); most are presets.
- `resolution_options` — `1K|2K|4K` for image-tier models that have it; `480p|720p|1080p` for video.
- `quality_options` — `basic|high` (Seedream v4.5, 5-lite) or `fast|standard|ultra` (Imagen 4) or `720p|720p-draft|1080p|1080p-draft` (P-Video).
- `max_input_images`, `max_input_videos`, `max_input_audios` — limits on `source_media_urls` per type.
- `supports_sound` / `video_audio` — sound is `toggle_via_sound_param`, `included` (free), or `silent`.
- `supports_negative_prompt` — boolean.
- `prompt_max_chars` — cap on the `prompt` field.

## Cost contract

Two billing shapes:

- **Flat per-generation** — most images, music, dialogue, ad-copy, separation, captions. `cost_per_generation` is the credits charged. Fixed at submit time.
- **Per-second** — most videos (Kling 3.0, Seedance 2, P-Video, etc.). `cost_note` describes the formula.
  Charged on the actual output duration. Reference video attachments may add a `(ref_seconds + output_seconds) × rate` surcharge — `cost_note` calls this out per-model.

NEVER promise a refund. Kubeez does not refund failed or unsatisfactory generations. If a request fails with a 5xx at the edge before any work happens, no credits are deducted; once the upstream provider runs, the credits are spent. Set caller expectations accordingly.

To preview cost before committing: read `cost_per_generation` + `cost_note` from `/docs/models.json`. The MCP server has an `estimate` tool that does this server-side; the REST API does not have a separate estimate endpoint — read the catalog yourself.

## Errors

Validation and business errors return JSON with `error` and `message`. Common codes:

- `400 invalid_request` — schema or value mismatch (read `message`)
- `400 unsupported_aspect` — aspect_ratio not in `aspect_ratio_options` for this model
- `400 missing_*` — required field missing
- `401` — bad / missing API key
- `402 insufficient_credits` — top up at https://kubeez.com/billing
- `403 missing_scope` — your key lacks the listed scope; create a key that has it, or rotate to one that does
- `404 model_not_found` — `model_id` is disabled or unknown
- `404 not_found` — generation/asset id wrong
- `409 name_taken` — POST /v1/assets name collision
- `413 file_too_large` / `quota_exceeded` — upload over 500 MB / library over 50 MB
- `429 rate_limit_exceeded` — back off; response includes `retry_after`
- `502 fetch_failed` — POST /v1/assets couldn't fetch your URL
- `5xx` — server side; safe to retry idempotently after a delay

Retry strategy: 5xx and 429 are retryable with exponential backoff (jitter recommended). Every other 4xx is NOT retryable — fix the body first.

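The retry rule above can be sketched as two small decisions: is the status retryable, and how long to wait. This is an illustration under stated assumptions, not a documented client: the 1-second base, the 30-second cap, and the function names `is_retryable`, `backoff_ceiling`, and `retry_delay` are all chosen for the example.

```bash
# Sketch only: base delay (1s) and cap (30s) are assumptions, not documented values.

# Succeeds for the statuses this document calls retryable: 429 and 5xx.
is_retryable() {
  [ "$1" -eq 429 ] || { [ "$1" -ge 500 ] && [ "$1" -le 599 ]; }
}

# Upper bound in seconds for attempt N (0-based): 1, 2, 4, 8, 16, then capped at 30.
backoff_ceiling() (
  delay=1
  i=0
  while [ "$i" -lt "$1" ] && [ "$delay" -lt 30 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 30 ]; then delay=30; fi
  echo "$delay"
)

# Seconds to wait before the next try.
#   $1 = attempt number (0-based), $2 = optional retry_after value from a 429.
# A server-provided retry_after wins; otherwise pick a random whole second
# in [0, ceiling], a scheme often called "full jitter".
retry_delay() {
  if [ -n "${2:-}" ]; then
    echo "$2"
  else
    awk -v max="$(backoff_ceiling "$1")" 'BEGIN { srand(); print int(rand() * (max + 1)) }'
  fi
}
```

A caller would check `is_retryable "$http_status"` first and only then `sleep "$(retry_delay "$attempt" "$retry_after")"`; any other 4xx should surface to the user instead of being retried.
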
## Rate limits (default; per API key)

- Generate media: 30 req/min
- Generate music: 10 req/min
- Generate dialogue: 10 req/min
- Generate ad-copy: 5 req/min
- Upload media + add asset: 30 req/min
- All read endpoints (status polls, /v1/models, /v1/balance, /v1/generations, list/rename/delete asset): 120 req/min

429 responses include `Retry-After` and a `retry_after` field in the body.

## Tips for AI agents

1. Always read `/docs/models.json` first. The model catalog changes — don't memorize fields from this file; treat it as the source of truth.
2. For one-off automation, fetch `/docs/models/{model_id}.json` instead of the full catalog — same data, ~99% smaller payload.
3. Status polling: media + dialogue both poll `/v1/generate/media/{id}` (only music + separation have their own status routes). Don't poll faster than every 5s for the first 30s, then every 15-30s.
4. Prefer URL ingest (`POST /v1/assets` / passing public URLs in `source_media_urls`) over multipart uploads when the file is already on the public web.
5. Cross-origin: every public route sends permissive CORS headers; you can call from a browser.
6. The interactive `/docs` page is a SPA — its content is JS-rendered and not visible to a plain `curl`. Use `/docs/models.json` and `/openapi.json` for machine ingestion; use this `/llms.txt` for narrative guidance.

## Related

- [MCP server](https://mcp.kubeez.com/docs): same models exposed as MCP tools for Claude Desktop, Cursor, and any MCP-compatible client. Same auth scopes.
- [Kubeez web app](https://kubeez.com): human UI for the same generations, plus billing and key management.
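
As a closing sketch, the polling guidance in workflow A and tip 3 can be combined into one loop: pick the status route by job kind, wait according to the documented schedule, and stop on a terminal status. The function names, the `KUBEEZ_API_KEY` variable, and the `max_polls` safety limit are assumptions made for this example; the routes, the schedule, and the terminal statuses come from this file.

```bash
# Delay in seconds before poll N (0-based), following the documented schedule
# 5, 5, 5, 10, 10, 15, 15, then 30 for every later poll.
poll_delay() {
  case "$1" in
    0|1|2) echo 5 ;;
    3|4) echo 10 ;;
    5|6) echo 15 ;;
    *) echo 30 ;;
  esac
}

# Status route per job kind; dialogue shares the media route.
status_path() {
  case "$1" in
    media|dialogue) echo "/v1/generate/media/$2" ;;
    music) echo "/v1/generate/music/$2" ;;
    separation) echo "/v1/generate/separation/$2" ;;
    *) echo "unknown job kind: $1" >&2; return 1 ;;
  esac
}

# Succeeds when polling should stop.
is_terminal() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll until a terminal status, or give up after max_polls (assumed limit, default 40).
# Usage: poll_job <media|dialogue|music|separation> <id> [max_polls]
poll_job() (
  kind=$1 id=$2 max_polls=${3:-40} n=0
  path=$(status_path "$kind" "$id") || exit 1
  while [ "$n" -lt "$max_polls" ]; do
    sleep "$(poll_delay "$n")"
    status=$(curl -s "https://api.kubeez.com$path" \
      -H "X-API-Key: ${KUBEEZ_API_KEY:-}" | jq -r '.status')
    if is_terminal "$status"; then echo "$status"; exit 0; fi
    n=$((n + 1))
  done
  echo "gave up after $max_polls polls" >&2
  exit 1
)
```

For example, `poll_job dialogue abc-123-uuid` would poll `/v1/generate/media/abc-123-uuid` and print the final status. The schedule stays well under the 120 req/min read limit for a single job.
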