API REFERENCE

SACTL Gateway API

Every endpoint on this page is registered in gateway-sidecar and verified by 10/10 smoke tests. In the curl examples, replace YOUR_VK with your sk-xa-* key and the calls work directly.

Base URL https://api.sactl.ai

Authentication

Every protected endpoint requires Authorization: Bearer sk-xa-{env}-{base62}. Keys are generated in the dashboard or sent directly by Telegram support; the database stores an HMAC digest (never plaintext). Each key carries four properties — allowed-model list, IP allowlist, monthly budget cap, owning tenant — checked one by one before each call. Mismatches are rejected on the spot.

Common request headers

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | yes | Bearer sk-xa-{env}-{base62}. env is one of dev/stg/prod. |
| Content-Type | string | yes | POST must be application/json; /v1/files uploads use multipart/form-data. |
| X-Request-Id | string | no | Client-supplied trace id. If absent, SACTL generates a UUIDv7 and writes it back in the response header. |
| anthropic-beta | string | no | Applies only to /v1/messages / Files / Batches; passed verbatim to Anthropic. |
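The header rules above can be sketched client-side as follows. This is a minimal illustration, not part of any SACTL SDK: build_headers and VK_PATTERN are hypothetical names, and because the length of the base62 part is not documented here, the pattern accepts any non-empty alphanumeric run.

```python
import re
from typing import Dict, Optional

# Key shape as documented on this page: sk-xa-{env}-{base62}, env in dev/stg/prod.
# Assumption: the base62 length is unspecified, so any non-empty run is accepted.
VK_PATTERN = re.compile(r"^sk-xa-(dev|stg|prod)-[0-9A-Za-z]+$")


def build_headers(vk: str, request_id: Optional[str] = None) -> Dict[str, str]:
    """Return the common headers for a JSON POST (hypothetical helper)."""
    if not VK_PATTERN.match(vk):
        raise ValueError("key does not look like sk-xa-{env}-{base62}")
    headers = {
        "Authorization": f"Bearer {vk}",
        "Content-Type": "application/json",
    }
    # X-Request-Id is optional; the gateway generates its own id when it is absent.
    if request_id is not None:
        headers["X-Request-Id"] = request_id
    return headers
```

Rejecting a malformed key locally only saves a round trip; the gateway remains the authority on whether a key is valid.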

Response headers

Every 2xx response carries SACTL accounting headers. Front-ends can read X-SACTL-Usage-Cost-USD directly to build a client-side ledger — no extra /v1/usage call required.

| Header | Type | When | Description |
| --- | --- | --- | --- |
| X-SACTL-Usage-Prompt-Tokens | int | 2xx | Prompt tokens for this request (as reported by upstream). |
| X-SACTL-Usage-Completion-Tokens | int | 2xx | Completion tokens for this request. |
| X-SACTL-Usage-Cost-USD | decimal | 2xx | USD cost for this request, settled against SACTL's discount table (Claude × 0.30; GPT supported at as low as 1/30 of the official price; Gemini coming soon), to 6 decimal places. |
| X-SACTL-Budget-Remaining-USD | decimal | when capped | Present when the VK has a monthly cap; reports remaining budget for the month. |
| Retry-After | int | 429 | Returned when GCRA rate limiting fires. In seconds, computed from the bucket recovery rate. |
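A client-side ledger can be built from these headers alone. The sketch below is one possible shape: UsageRecord and parse_usage_headers are hypothetical names, and Decimal is used so the 6-decimal cost values are not distorted by float rounding.

```python
from decimal import Decimal
from typing import Mapping, NamedTuple, Optional


class UsageRecord(NamedTuple):
    prompt_tokens: int
    completion_tokens: int
    cost_usd: Decimal
    budget_remaining_usd: Optional[Decimal]  # None when the VK has no monthly cap


def parse_usage_headers(headers: Mapping[str, str]) -> UsageRecord:
    """Extract SACTL accounting headers from a 2xx response (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    remaining = h.get("x-sactl-budget-remaining-usd")
    return UsageRecord(
        prompt_tokens=int(h["x-sactl-usage-prompt-tokens"]),
        completion_tokens=int(h["x-sactl-usage-completion-tokens"]),
        cost_usd=Decimal(h["x-sactl-usage-cost-usd"]),
        budget_remaining_usd=Decimal(remaining) if remaining is not None else None,
    )
```

Summing cost_usd across records gives a running total without calling /v1/usage.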

Error envelope

All 4xx/5xx responses use the envelope below. We never proxy the upstream raw error body — upstream credentials, internal URLs, and trace details never leak to clients. Use trace_id for support follow-up.

{
  "error": {
    "code": "key_invalid",
    "message": "virtual key is invalid or revoked",
    "trace_id": "01JX7W3Z5A8M9E2Q1P4K6R8TVB"
  }
}

Registered error codes

| HTTP | Code | Trigger |
| --- | --- | --- |
| 401 | key_invalid | VK missing, malformed, or revoked. |
| 403 | model_forbidden | The requested model is not on this VK's allowed_models list. |
| 403 | ip_forbidden | Request source IP is not on this VK's IP allowlist. |
| 403 | signature_invalid | VK HMAC verification failed (e.g. pepper mismatch). |
| 402 | budget_exhausted | Pre-debit determined this call would exceed the VK's monthly USD cap. |
| 402 | context_too_long | Prompt tokens exceed the model's context window registered in the pricing registry. |
| 429 | rate_limited | GCRA rate-limit fired (any of tenant / vk / vk×model / vk×ip). |
| 400 | ssrf_forbidden | Multimodal image_url points at private / internal address; blocked by the SSRF filter. |
| 400 | model_unknown | Model ID not found in the pricing registry. |
| 400 | bad_request | Malformed JSON or missing required fields. |
| 413 | payload_too_large | Request body exceeds SIDECAR_BODY_MAX_MB (default 32MB). |
| 415 | unsupported_media | Multipart upload MIME not on the allowlist. |
| 502 | upstream_error | Anthropic or OpenAI returned a 5xx. (Gemini upstream coming soon.) |
| 504 | upstream_timeout | Upstream request timed out. |
| 500 | internal_error | Gateway-internal exception. |
| 503 | service_unavailable | All keys in the pool are in cooldown (circuit breaker tripped). (preview) |
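A client might decode the envelope and decide whether to retry roughly as sketched below. GatewayError, parse_error, and the RETRYABLE_CODES set are hypothetical; which codes are worth retrying is a client policy choice that this reference does not prescribe.

```python
import json
from typing import NamedTuple, Optional

# Assumption: transient conditions are retried, everything else is surfaced to the caller.
RETRYABLE_CODES = {"rate_limited", "upstream_error", "upstream_timeout", "service_unavailable"}


class GatewayError(NamedTuple):
    status: int
    code: str
    message: str
    trace_id: str
    retry_after: Optional[int]  # seconds, only present on 429 responses

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: str, retry_after_header: Optional[str] = None) -> GatewayError:
    """Decode the error envelope from a 4xx/5xx response body (hypothetical helper)."""
    err = json.loads(body)["error"]
    retry_after = int(retry_after_header) if retry_after_header else None
    return GatewayError(status, err["code"], err["message"], err["trace_id"], retry_after)
```

Keeping trace_id on the error object makes it easy to include in logs and support requests.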

Inference

Core inference endpoints — Anthropic native plus OpenAI-compatible (so OpenAI SDKs reach Claude or GPT with zero client code changes). Claude + GPT upstream live; Gemini coming soon.

POST /v1/messages stable

Create message (Anthropic native)

Anthropic native Messages endpoint. Request and response bodies are compatible with the official Anthropic Messages API — Claude Code talks to this endpoint. We only add auth, usage caps, rate limiting, and billing logs in front; the request body semantics are unchanged.

Headers

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | yes | Bearer sk-xa-... |
| Content-Type | string | yes | application/json |
| anthropic-beta | string | no | Comma-separated list of beta flags. Passed verbatim to Anthropic. |
| X-Request-Id | string | no | Client-side trace id. |

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Model ID. Must be on the VK's allowed_models list; see GET /v1/models. |
| max_tokens | int | yes | Cap is the model's token window per the pricing registry. |
| messages | array | yes | Anthropic-style message array. Role is user or assistant; content is a string or a block array (text / image / tool_use / tool_result). |
| system | string or array | no | System prompt. Array form supports cache_control. |
| temperature | number | no | 0.0 – 1.0. |
| top_p | number | no | Nucleus sampling. |
| top_k | int | no | Top-k sampling. |
| stop_sequences | string[] | no | Custom stop sequences. |
| tools | array | no | Anthropic native tools format. |
| tool_choice | object | no | {type: "auto" / "any" / "tool", name?: "..."} |
| stream | bool | no | Default false. When true, response is SSE (text/event-stream) with Anthropic native event names (message_start / content_block_delta / …). |
| thinking | object | no | {type: "enabled", budget_tokens: 32000} for extended thinking. SACTL auto-injects anthropic-beta if you didn't set it. |
| cache_control | object | no | Embedded in content block / tools / system. SACTL can auto-inject {type: "ephemeral"} on system and tools blocks (controlled by the VK's auto_prompt_cache flag). |
| metadata | object | no | {user_id: "..."} passed verbatim to upstream and logged to audit. |
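When stream is true, a client has to reassemble the text from SSE events. The sketch below assumes the event payloads follow Anthropic's public streaming format (a content_block_delta event whose delta has type text_delta and a text field); collect_stream_text is a hypothetical helper, and tool-use and thinking deltas are ignored.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas from an Anthropic-style SSE stream."""
    parts = []
    for line in lines:
        line = line.strip()
        # Only "data:" lines carry JSON; "event:" lines and blank separators are skipped.
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):])
        if payload.get("type") != "content_block_delta":
            continue
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta.get("text", ""))
    return "".join(parts)


sample = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_x"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
]
print(collect_stream_text(sample))  # Hello
```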

Request example

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [
    { "role": "user", "content": "Hello, Claude." }
  ]
}

Response

HTTP/1.1 200 OK
X-SACTL-Usage-Prompt-Tokens: 12
X-SACTL-Usage-Completion-Tokens: 28
X-SACTL-Usage-Cost-USD: 0.000336
X-SACTL-Budget-Remaining-USD: 49.832104

{
  "id": "msg_01AbCdEfGhIjKlMnOpQrStUv",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    { "type": "text", "text": "Hello! How can I help you today?" }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 28
  }
}

curl

curl https://api.sactl.ai/v1/messages \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, Claude." }
    ]
  }'

Errors

| HTTP | Code | Scenario |
| --- | --- | --- |
| 401 | key_invalid | VK missing or revoked. |
| 403 | model_forbidden | Model not on the allowlist. |
| 402 | budget_exhausted | Pre-debit exceeds the cap. |
| 402 | context_too_long | Prompt exceeds the token window. |
| 429 | rate_limited | GCRA fired; carries Retry-After. |
| 502 | upstream_error | Anthropic returned 5xx. |
| 504 | upstream_timeout | Upstream timed out. |

POST /v1/chat/completions stable

Create chat completion (OpenAI-compat)

OpenAI Chat Completions-compatible endpoint. Clients using openai-python / openai-node / LangChain or any OpenAI SDK can keep their code unchanged — just flip the base URL. SACTL translates OpenAI shape to Anthropic shape upstream, then translates the Anthropic response back to OpenAI shape. Bidirectional support for tool_calls, image_url, stream, and reasoning_effort (maps to Anthropic thinking).

Translation switch: set SIDECAR_TRANSLATE_OPENAI_MODE=openai-to-anthropic (default on). Set to passthrough to forward OpenAI-shape requests directly to the OpenAI upstream.

Request body (key fields)

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Accepts OpenAI model IDs and Claude model IDs (in translation mode). |
| messages | array | yes | OpenAI message array. role is system/user/assistant/tool. |
| max_tokens | int | no | Aliased to max_completion_tokens. |
| temperature | number | no | 0.0 – 2.0 (OpenAI semantics). |
| top_p | number | no | |
| tools | array | no | OpenAI tools format; translated to Anthropic tools. |
| tool_choice | string or object | no | "auto" / "none" / "required" / {type:"function", function:{name}} |
| stream | bool | no | OpenAI SSE format (data: {"choices":[...]}); SACTL rewrites Anthropic events into OpenAI delta events. |
| reasoning_effort | string | no | "low" / "medium" / "high" → Anthropic thinking.budget_tokens: 2048 / 8000 / 32000 |
| response_format | object | no | Only {type: "json_object"} is honored; on Claude this is mapped to a system-prompt injection. |

Translation cheat sheet

| OpenAI | Anthropic | Notes |
| --- | --- | --- |
| messages[].role="system" | system | Merged into the top-level system, preserving order. |
| tools | tools | Field names match; parameter schema copied as-is. |
| tool_calls | tool_use | Extracted from the response's block array. |
| role="tool" | tool_result block | Merged with the prior assistant message into the user turn. |
| image_url | image block | URL passes through SACTL's SSRF filter; data URLs convert directly to base64. |
| finish_reason | stop_reason | stop/length/tool_calls/content_filter mapped pairwise. |
| reasoning_effort | thinking.budget_tokens | low=2048, medium=8000, high=32000. |
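Two rows of the cheat sheet, the system-message merge and the reasoning_effort mapping, can be illustrated in a few lines. This is not SACTL's translator: translate_request is a hypothetical function, it handles string content only, it skips tools, tool results, and images, and the blank-line separator between merged system messages is an assumption.

```python
from typing import Any, Dict, List

# Budget values as listed in the cheat sheet above.
EFFORT_TO_BUDGET = {"low": 2048, "medium": 8000, "high": 32000}


def translate_request(openai_body: Dict[str, Any]) -> Dict[str, Any]:
    """Map a simple OpenAI chat request to Anthropic Messages shape (illustrative only)."""
    system_parts: List[str] = []
    messages: List[Dict[str, Any]] = []
    for msg in openai_body["messages"]:
        if msg["role"] == "system":
            # System messages leave the array and are merged, in order, at the top level.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    out: Dict[str, Any] = {"model": openai_body["model"], "messages": messages}
    if system_parts:
        out["system"] = "\n\n".join(system_parts)  # separator is an assumption
    if "max_tokens" in openai_body:
        out["max_tokens"] = openai_body["max_tokens"]
    effort = openai_body.get("reasoning_effort")
    if effort is not None:
        out["thinking"] = {"type": "enabled", "budget_tokens": EFFORT_TO_BUDGET[effort]}
    return out
```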

Request example

{
  "model": "claude-sonnet-4-6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Write a haiku about Go channels." }
  ],
  "max_tokens": 256,
  "temperature": 0.7
}

Response

{
  "id": "chatcmpl-0A1B2C3D4E5F",
  "object": "chat.completion",
  "created": 1745145600,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Silent channels hum,\nGoroutines pass gifts in dark,\nSelect waits for dawn."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 31,
    "total_tokens": 53
  }
}

curl

curl https://api.sactl.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Write a haiku about Go channels." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'

Errors

| HTTP | Code | Scenario |
| --- | --- | --- |
| 401 | key_invalid | VK authentication failed. |
| 403 | model_forbidden | Model not on the VK allowlist. |
| 400 | ssrf_forbidden | image_url points at a private network. |
| 400 | model_unknown | Model ID not in the pricing registry. |
| 429 | rate_limited | Rate-limit fired. |
| 502 | upstream_error | Upstream 5xx. |

POST /v1/completions coming soon legacy

Create completion (legacy) — available once GPT upstream is wired

Legacy OpenAI completions (non-chat) pass-through endpoint. The route is registered in the gateway, but at this stage SACTL only fronts the Claude upstream — Claude doesn't have non-chat completions, so this endpoint is not exposed today and calls return 503 service_unavailable. It will auto-enable when GPT upstream is wired up. For new integrations, use /v1/chat/completions directly.

Request body

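| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | OpenAI completions model ID. |
| prompt | string or array | yes | String or array of strings. |
| max_tokens | int | no | |
| temperature | number | no | |
| stream | bool | no | OpenAI native SSE. |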

curl

curl https://api.sactl.ai/v1/completions \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "Say hello in three languages.",
    "max_tokens": 64
  }'

POST /v1/embeddings stable

Create embeddings

Embeddings (OpenAI-compatible schema). Anthropic does not offer embeddings, so this endpoint routes to GPT (text-embedding-3-small / text-embedding-3-large) upstream. Gemini (text-embedding-004) upstream coming soon. Per-token billing, with the same usage caps and rate limiting as inference endpoints.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Embedding model ID. |
| input | string or array | yes | Single string or array of strings. |
| encoding_format | string | no | "float" (default) / "base64". |
| dimensions | int | no | Only valid on text-embedding-3-*; truncates embedding dimensions. |
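With encoding_format set to "base64", the embedding field arrives as a string rather than a float array. The sketch below assumes the payload is a packed array of little-endian 32-bit floats, which is how OpenAI's client libraries decode it; this page does not state the byte layout, so treat that as an assumption. decode_base64_embedding is a hypothetical helper.

```python
import base64
import struct
from typing import List


def decode_base64_embedding(b64: str) -> List[float]:
    """Decode a base64 embedding, assuming packed little-endian float32 values."""
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" forces little-endian; "f" is a 4-byte IEEE 754 float.
    return list(struct.unpack(f"<{count}f", raw))
```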

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0412, /* ... 1536 floats */]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 7,
    "total_tokens": 7
  }
}

curl

curl https://api.sactl.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

Errors

| HTTP | Code | Scenario |
| --- | --- | --- |
| 401 | key_invalid | VK invalid. |
| 400 | model_unknown | Model ID not registered. |
| 413 | payload_too_large | Input array too large. |
| 429 | rate_limited | Rate-limited. |

GET /v1/models stable

List models

Lists every model the VK is authorized to call. Returns Anthropic native shape {data: [{id, type, ...}]}. The list is computed dynamically from the intersection of the VK's allowed_models array and the pricing registry — even if a model is in the pricing registry, it won't appear here unless the VK has it enabled.

Response

{
  "data": [
    { "type": "model", "id": "claude-opus-4-7",     "display_name": "Claude Opus 4.7",   "created_at": "2025-09-29T00:00:00Z" },
    { "type": "model", "id": "claude-opus-4-6",     "display_name": "Claude Opus 4.6",   "created_at": "2025-08-05T00:00:00Z" },
    { "type": "model", "id": "claude-sonnet-4-6",   "display_name": "Claude Sonnet 4.6", "created_at": "2025-07-10T00:00:00Z" },
    { "type": "model", "id": "claude-haiku-4-5",    "display_name": "Claude Haiku 4.5",  "created_at": "2025-05-03T00:00:00Z" }
  ],
  "has_more": false,
  "first_id": "claude-opus-4-7",
  "last_id": "claude-haiku-4-5"
}
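# GPT models are also included in this list (intersected with the VK's allowed_models). Gemini models will appear automatically once the Gemini upstream is wired up.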
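The intersection rule described for this endpoint amounts to a simple filter. In the sketch below, visible_models is a hypothetical function, and keeping the VK's own ordering is an assumption; the reference does not say how the gateway orders the result.

```python
from typing import Iterable, List


def visible_models(allowed_models: Iterable[str], pricing_registry: Iterable[str]) -> List[str]:
    """Return the models a VK would see: allowed by the VK and known to the registry."""
    registry = set(pricing_registry)
    # A model must appear in both lists; ordering follows allowed_models (assumption).
    return [model for model in allowed_models if model in registry]
```

A model present only in the pricing registry, or only in allowed_models, never appears in the output.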

curl

curl https://api.sactl.ai/v1/models \
  -H "Authorization: Bearer YOUR_VK"

Files API

Anthropic Files API pass-through. Upload documents / images and reference them in subsequent /v1/messages calls. Max file size is governed by SIDECAR_FILES_MAX_MB (default 32MB); the MIME allowlist is configurable. Every Files operation writes a billing log entry (file.uploaded / file.deleted) for reconciliation.

POST /v1/files stable

Upload file

Upload a file via multipart/form-data. The returned file id can be referenced in /v1/messages content blocks as {type: "document", source: {type: "file", file_id: "..."}}.

Form fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | yes | File binary. MIME allowlist: application/pdf, image/png, image/jpeg, image/gif, image/webp, text/plain, text/csv, text/markdown. |
| purpose | string | no | Optional tag, written to audit log. |

Response

{
  "id": "file_01AbCdEfGhIjKlMnOpQrStUv",
  "type": "file",
  "filename": "contract.pdf",
  "mime_type": "application/pdf",
  "size_bytes": 183204,
  "created_at": "2026-04-20T08:12:31.441Z",
  "downloadable": false
}

curl

curl https://api.sactl.ai/v1/files \
  -H "Authorization: Bearer YOUR_VK" \
  -F "file=@contract.pdf;type=application/pdf" \
  -F "purpose=context"

Errors

| HTTP | Code | Scenario |
| --- | --- | --- |
| 401 | key_invalid | VK invalid. |
| 413 | payload_too_large | File exceeds SIDECAR_FILES_MAX_MB. |
| 415 | unsupported_media | MIME not on the allowlist. |

GET /v1/files stable

List files

List files uploaded by the current VK (files from other VKs are invisible). Cursor-based pagination.

Query params

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| limit | int | no | 1 – 1000, default 20. |
| before_id | string | no | Cursor; returns files before this id. |
| after_id | string | no | Cursor; returns files after this id. |

Response

{
  "data": [
    { "id": "file_01AbCd...", "filename": "contract.pdf", "size_bytes": 183204, "mime_type": "application/pdf", "created_at": "2026-04-20T08:12:31.441Z" }
  ],
  "has_more": false,
  "first_id": "file_01AbCd...",
  "last_id": "file_01AbCd..."
}
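Walking every page with the cursor fields might look like the sketch below. iter_files is a hypothetical helper, the HTTP call is injected as fetch_page so the loop stays transport-agnostic, and feeding last_id back as after_id to advance is an assumption based on the field descriptions above.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_files(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every file across pages; fetch_page(after_id) performs GET /v1/files."""
    after_id: Optional[str] = None
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        if not page.get("has_more"):
            return
        # Assumption: the next page starts after the last id of the current one.
        after_id = page["last_id"]
```

In real use, fetch_page would issue the request with limit and after_id as query parameters and return the decoded JSON body.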

curl

curl "https://api.sactl.ai/v1/files?limit=20" \
  -H "Authorization: Bearer YOUR_VK"

GET /v1/files/{id} stable

Get file metadata

Read metadata for a single file. SACTL does not store the file binary itself; this endpoint returns Anthropic upstream metadata pass-through.

curl

curl https://api.sactl.ai/v1/files/file_01AbCdEfGhIjKlMnOpQrStUv \
  -H "Authorization: Bearer YOUR_VK"

Errors

| HTTP | Code | Scenario |
| --- | --- | --- |
| 401 | key_invalid | VK invalid. |
| 404 | bad_request | file id does not exist or does not belong to this VK. |

DELETE /v1/files/{id} stable

Delete file

Delete a file. After deletion, /v1/messages calls referencing the file_id return bad_request. The audit log emits file.deleted.

Response

{
  "id": "file_01AbCdEfGhIjKlMnOpQrStUv",
  "type": "file_deleted"
}

curl

curl -X DELETE https://api.sactl.ai/v1/files/file_01AbCdEfGhIjKlMnOpQrStUv \
  -H "Authorization: Bearer YOUR_VK"

Batch API

Submit large numbers of Messages requests as a single batch. State machine: validating → in_progress → ended | canceled | failed. Results are returned as JSONL and are downloadable only in the ended state.

Billing: batch calls take Anthropic's 50% batch discount stacked with our × 0.30 — Claude batch price = official × 0.15 (see the pricing page). Billing settlement happens only at batch completion or cancellation — canceled / failed requests are not billed.
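The stacking works out as 0.50 × 0.30 = 0.15 of the official price. A small sketch of that arithmetic follows; claude_cost is a hypothetical function, and the half-up rounding to 6 decimal places is an assumption (the reference states the precision but not the rounding mode).

```python
from decimal import Decimal, ROUND_HALF_UP

SACTL_MULTIPLIER = Decimal("0.30")  # Claude × 0.30, per the accounting headers section
BATCH_DISCOUNT = Decimal("0.50")    # Anthropic's 50% batch discount


def claude_cost(official_usd: Decimal, batch: bool = False) -> Decimal:
    """Apply the SACTL multiplier, plus the batch discount when batch is True."""
    cost = official_usd * SACTL_MULTIPLIER
    if batch:
        cost *= BATCH_DISCOUNT
    # Rounding mode is an assumption; only the 6-decimal precision is documented.
    return cost.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
```

For example, a request that would cost 1.00 USD at official prices comes to 0.30 USD as a normal call and 0.15 USD inside a batch.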

POST /v1/messages/batches stable

Create message batch

Submit a batch of Messages requests. Each request body is equivalent to a single /v1/messages call, wrapped with a custom_id for reconciliation.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| requests | array | yes | 1 – 10,000 {custom_id, params} items. |
| requests[].custom_id | string | yes | Unique within the batch; used to map results back to your business id. |
| requests[].params | object | yes | Same shape as the /v1/messages request body (model / max_tokens / messages / …). |

Request example

{
  "requests": [
    {
      "custom_id": "job-001",
      "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 256,
        "messages": [ { "role": "user", "content": "Summarize Go channels." } ]
      }
    },
    {
      "custom_id": "job-002",
      "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 256,
        "messages": [ { "role": "user", "content": "What is GCRA?" } ]
      }
    }
  ]
}
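Catching constraint violations before submission avoids a rejected batch. The sketch below checks the two rules stated in the request body table, unique custom_id values and 1 to 10,000 items; build_batch is a hypothetical helper and does not validate the params themselves.

```python
from typing import Any, Dict, Iterable, Tuple

MAX_BATCH_REQUESTS = 10_000  # upper bound from the request body table


def build_batch(items: Iterable[Tuple[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Build a batch payload from (custom_id, params) pairs, enforcing documented limits."""
    requests = []
    seen = set()
    for custom_id, params in items:
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        seen.add(custom_id)
        requests.append({"custom_id": custom_id, "params": params})
    if not 1 <= len(requests) <= MAX_BATCH_REQUESTS:
        raise ValueError("a batch must contain between 1 and 10,000 requests")
    return {"requests": requests}
```

The returned dict can be serialized with json.dumps and written to the batch.json file used by the curl example.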

Response

{
  "id": "msgbatch_01Wx9...",
  "type": "message_batch",
  "processing_status": "in_progress",
  "request_counts": { "processing": 2, "succeeded": 0, "errored": 0, "canceled": 0, "expired": 0 },
  "ended_at": null,
  "created_at": "2026-04-20T08:12:31.441Z",
  "expires_at": "2026-04-21T08:12:31.441Z",
  "cancel_initiated_at": null,
  "results_url": null
}

curl

curl https://api.sactl.ai/v1/messages/batches \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d @batch.json

GET /v1/messages/batches stable

List message batches

List all batches for the current VK. Supports cursor pagination via limit / before_id / after_id.

curl

curl "https://api.sactl.ai/v1/messages/batches?limit=20" \
  -H "Authorization: Bearer YOUR_VK"

GET /v1/messages/batches/{id} stable

Retrieve message batch

Read the status of a single batch. processing_status transitions through validating → in_progress → ended / canceled / failed.

Response

{
  "id": "msgbatch_01Wx9...",
  "processing_status": "ended",
  "request_counts": { "processing": 0, "succeeded": 2, "errored": 0, "canceled": 0, "expired": 0 },
  "ended_at": "2026-04-20T08:14:02.112Z",
  "results_url": "https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9.../results"
}
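Since results are only downloadable once the batch has ended, clients typically poll this endpoint. A minimal polling loop is sketched below; wait_for_batch, the 5-second interval, and the poll limit are all assumptions, and the HTTP call is injected as fetch_batch.

```python
import time
from typing import Any, Callable, Dict

# Terminal states from the state machine described in the Batch API overview.
TERMINAL_STATES = {"ended", "canceled", "failed"}


def wait_for_batch(
    fetch_batch: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until processing_status is terminal, then return the batch object."""
    for _ in range(max_polls):
        batch = fetch_batch()
        if batch["processing_status"] in TERMINAL_STATES:
            return batch
        sleep(interval_s)
    raise TimeoutError("batch did not reach a terminal state")
```

Only a batch returned with processing_status equal to "ended" has results to fetch; canceled and failed batches should be handled separately.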

curl

curl https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9... \
  -H "Authorization: Bearer YOUR_VK"

GET /v1/messages/batches/{id}/results stable

Retrieve message batch results

Returns per-request results as JSONL (application/x-ndjson). Only available when processing_status=ended; other states return bad_request.

Response (one line per request)

{"custom_id":"job-001","result":{"type":"succeeded","message":{"id":"msg_...","content":[...],"usage":{...}}}}
{"custom_id":"job-002","result":{"type":"succeeded","message":{"id":"msg_...","content":[...],"usage":{...}}}}
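Because each line is an independent JSON document, the response can be processed line by line and keyed by custom_id. index_results below is a hypothetical helper; it assumes every line carries the custom_id and result fields shown in the sample above.

```python
import json
from typing import Any, Dict, Iterable


def index_results(lines: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Map custom_id to its result object from a JSONL results body."""
    results: Dict[str, Dict[str, Any]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        results[record["custom_id"]] = record["result"]
    return results
```

Checking result["type"] per entry then separates succeeded requests from the rest.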

curl

curl https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9.../results \
  -H "Authorization: Bearer YOUR_VK"

POST /v1/messages/batches/{id}/cancel stable

Cancel message batch

Move an in_progress / validating batch to canceling. In-flight requests may still complete and get billed normally; canceled requests are not billed.

curl

curl -X POST https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9.../cancel \
  -H "Authorization: Bearer YOUR_VK"

System

Health check and metrics. No VK required — but in production these should be restricted to ops-network CIDRs at the ingress layer.

GET /health stable

Health check

Health check. Returns 200 + JSON; fields report connectivity to each dependency. Suitable as a Kubernetes liveness / readiness probe target. No auth required.

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ok",
  "redis": "ok",
  "vault": "ok",
  "version": "0.9.3",
  "commit": "a1b2c3d",
  "uptime_seconds": 84221
}

curl

curl https://api.sactl.ai/health

GET /metrics stable

Prometheus metrics

Prometheus metrics endpoint. Returns all business metrics in text/plain Prometheus format: rl_* (rate limiting), forward_* (forwarding), upstream_* (upstream latency / errors), breaker_* (circuit breakers), audit_* (audit pipeline). Add it to your Prometheus scrape config.

Production guidance: this endpoint should be restricted to the ops network / Prometheus scraper in production (enforce a CIDR allowlist at upstream ingress / nginx). Metrics themselves carry no credentials but reveal traffic patterns.

Response (excerpt)

# HELP rl_reject_total Number of requests rejected by rate limiter
# TYPE rl_reject_total counter
rl_reject_total{dim="tenant"} 12
rl_reject_total{dim="vk"} 3
rl_reject_total{dim="vk_model"} 1
rl_reject_total{dim="vk_ip"} 0

# HELP forward_request_duration_seconds Request duration histogram
# TYPE forward_request_duration_seconds histogram
forward_request_duration_seconds_bucket{route="/v1/messages",le="0.5"} 4821
forward_request_duration_seconds_bucket{route="/v1/messages",le="1"} 5910
forward_request_duration_seconds_bucket{route="/v1/messages",le="+Inf"} 6024

# HELP upstream_5xx_total Upstream 5xx responses
# TYPE upstream_5xx_total counter
upstream_5xx_total{upstream="anthropic"} 2

# HELP audit_worm_append_total WORM audit log appends
# TYPE audit_worm_append_total counter
audit_worm_append_total{event="message.ok"} 5932
audit_worm_append_total{event="file.uploaded"} 18

curl

curl https://api.sactl.ai/metrics

Preview / Coming soon

The capabilities below are code-complete but not wired to the hot path, or are still in canary. Production customers should not rely on their behavior.

Multi-key pool PickMiddleware preview

Multiple API keys are pooled under a single upstream provider; key selection is based on health / cost / usage. When a key gets a 429 / 401 from upstream, it enters a cooldown window. When all keys are in cooldown, the provider's circuit breaker trips and the gateway returns 503 service_unavailable.

  • Current status: middleware code is implemented but not on the forward path by default.
  • Expected GA: 2026 Q2. Wiring it in does not change the API contract — it just turns 503 service_unavailable from "theoretically possible" into actually observed.

Markup multiplier settle preview

For multi-tier reseller scenarios, VKs can carry a markup_multiplier (e.g. 1.20 = add a 20% channel margin); at settlement the margin is routed to the channel account automatically.

  • Current status: the schema field is in the database; the settlement step does not yet apply the multiplier, so actual billing equals the underlying price.
  • Expected GA: same batch as Multi-key pool, 2026 Q2.
  • Note: this does not expose any new endpoint to the client; it only changes what the X-SACTL-Usage-Cost-USD header amount means.