Authentication
Every protected endpoint requires Authorization: Bearer sk-xa-{env}-{base62}. Keys are generated in the dashboard or sent directly by Telegram support; the database stores an HMAC digest (never plaintext). Each key carries four properties — allowed-model list, IP allowlist, monthly budget cap, owning tenant — checked one by one before each call. Mismatches are rejected on the spot.
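The key format and digest storage described above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the choice of SHA-256, the `pepper` parameter, and the function names are all assumptions.

```python
import hashlib
import hmac
import re

# Shape from the docs: sk-xa-{env}-{base62}, env in dev/stg/prod.
# The length of the base62 part is not documented, so any non-empty run is accepted here.
KEY_PATTERN = re.compile(r"sk-xa-(dev|stg|prod)-([0-9A-Za-z]+)")


def parse_key(authorization: str):
    """Return (env, base62_body) for a well-formed header value, else None."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer":
        return None
    match = KEY_PATTERN.fullmatch(token)
    return (match.group(1), match.group(2)) if match else None


def key_digest(token: str, pepper: bytes) -> str:
    """HMAC digest of the key; SHA-256 is an assumption of this sketch."""
    return hmac.new(pepper, token.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_key(token: str, stored_digest: str, pepper: bytes) -> bool:
    """Compare against the stored digest in constant time."""
    return hmac.compare_digest(key_digest(token, pepper), stored_digest)
```

Only the digest is persisted, so a database leak alone does not reveal usable keys.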
Common request headers
| Field | Type | Required | Description |
|---|---|---|---|
| Authorization | string | required | Bearer sk-xa-{env}-{base62}. env is one of dev/stg/prod. |
| Content-Type | string | required | POST must be application/json; /v1/files uploads use multipart/form-data. |
| X-Request-Id | string | optional | Client-supplied trace id. If absent, SACTL generates a UUIDv7 and writes it back in the response header. |
| anthropic-beta | string | optional | Applies only to /v1/messages / Files / Batches; passed verbatim to Anthropic. |
Response headers
Every 2xx response carries SACTL accounting headers. Front-ends can read X-SACTL-Usage-Cost-USD directly to build a client-side ledger — no extra /v1/usage call required.
| Header | Type | When | Description |
|---|---|---|---|
| X-SACTL-Usage-Prompt-Tokens | int | 2xx | Prompt tokens for this request (as reported by upstream). |
| X-SACTL-Usage-Completion-Tokens | int | 2xx | Completion tokens for this request. |
| X-SACTL-Usage-Cost-USD | decimal | 2xx | USD cost for this request, to 6 decimal places, settled against SACTL's discount table (Claude × 0.30; GPT supported at as low as 1/30 of the official price; Gemini coming soon). |
| X-SACTL-Budget-Remaining-USD | decimal | when capped | Present when the VK has a monthly cap; reports remaining budget for the month. |
| Retry-After | int | 429 | Returned when GCRA rate-limiting fires. In seconds, computed from bucket recovery rate. |
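A client-side ledger entry can be built from these headers roughly as follows. This is a sketch: the `UsageEntry` type and function name are hypothetical, and `Decimal` is used so the 6-decimal amounts are not rounded by floating point.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class UsageEntry:
    prompt_tokens: int
    completion_tokens: int
    cost_usd: Decimal
    budget_remaining_usd: Optional[Decimal]  # None when the VK has no monthly cap


def usage_from_headers(headers: Mapping[str, str]) -> UsageEntry:
    """Build a ledger entry from the accounting headers of a 2xx response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    remaining = h.get("x-sactl-budget-remaining-usd")
    return UsageEntry(
        prompt_tokens=int(h["x-sactl-usage-prompt-tokens"]),
        completion_tokens=int(h["x-sactl-usage-completion-tokens"]),
        cost_usd=Decimal(h["x-sactl-usage-cost-usd"]),
        budget_remaining_usd=Decimal(remaining) if remaining is not None else None,
    )
```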
Error envelope
All 4xx/5xx responses use the envelope below. We never proxy the upstream raw error body — upstream credentials, internal URLs, and trace details never leak to clients. Use trace_id for support follow-up.
{
  "error": {
    "code": "key_invalid",
    "message": "virtual key is invalid or revoked",
    "trace_id": "01JX7W3Z5A8M9E2Q1P4K6R8TVB"
  }
}
Registered error codes
| HTTP | code | Trigger |
|---|---|---|
| 401 | key_invalid | VK missing, malformed, or revoked. |
| 403 | model_forbidden | The requested model is not on this VK's allowed_models list. |
| 403 | ip_forbidden | Request source IP is not on this VK's IP allowlist. |
| 403 | signature_invalid | VK HMAC verification failed (e.g. pepper mismatch). |
| 402 | budget_exhausted | Pre-debit determined this call would exceed the VK's monthly USD cap. |
| 402 | context_too_long | Prompt tokens exceed the model's context window registered in the pricing registry. |
| 429 | rate_limited | GCRA rate-limit fired (any of tenant / vk / vk×model / vk×ip). |
| 400 | ssrf_forbidden | Multimodal image_url points at private / internal address; blocked by the SSRF filter. |
| 400 | model_unknown | Model ID not found in the pricing registry. |
| 400 | bad_request | Malformed JSON or missing required fields. |
| 413 | payload_too_large | Request body exceeds SIDECAR_BODY_MAX_MB (default 32MB). |
| 415 | unsupported_media | Multipart upload MIME not on the allowlist. |
| 502 | upstream_error | Anthropic or OpenAI returned a 5xx. (Gemini upstream coming soon.) |
| 504 | upstream_timeout | Upstream request timed out. |
| 500 | internal_error | Gateway-internal exception. |
| 503 | service_unavailable | All keys in the pool are in cooldown (circuit breaker tripped). Preview feature. |
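A client might decode the envelope and decide whether to retry along these lines. The `GatewayError` class and the grouping of codes into a retryable set are assumptions of this sketch, not behavior defined by SACTL.

```python
import json
from typing import Mapping, Optional

# Which registered codes are worth retrying is a client-side judgment call (assumption).
RETRYABLE_CODES = {"rate_limited", "upstream_error", "upstream_timeout", "service_unavailable"}


class GatewayError(Exception):
    """A 4xx/5xx response decoded from the error envelope."""

    def __init__(self, status: int, code: str, message: str, trace_id: str,
                 retry_after: Optional[int] = None):
        super().__init__(f"{status} {code}: {message} (trace_id={trace_id})")
        self.status = status
        self.code = code
        self.trace_id = trace_id
        self.retry_after = retry_after  # seconds; only present when Retry-After was sent

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: str, headers: Mapping[str, str]) -> GatewayError:
    """Turn an error response into a GatewayError without raising it."""
    err = json.loads(body)["error"]
    h = {name.lower(): value for name, value in headers.items()}
    retry_after = int(h["retry-after"]) if "retry-after" in h else None
    return GatewayError(status, err["code"], err["message"], err["trace_id"], retry_after)
```

Logging `trace_id` alongside the code gives support the identifier they need to locate the request.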
Inference
Core inference endpoints — Anthropic native plus OpenAI-compatible (so OpenAI SDKs reach Claude or GPT with zero client code changes). Claude + GPT upstream live; Gemini coming soon.
Create message (Anthropic native)
Anthropic native Messages endpoint. Request and response bodies are compatible with the official Anthropic Messages API — Claude Code talks to this endpoint. We only add auth, usage caps, rate limiting, and billing logs in front; the request body semantics are unchanged.
Headers
| Field | Type | Required | Description |
|---|---|---|---|
| Authorization | string | required | Bearer sk-xa-... |
| Content-Type | string | required | application/json |
| anthropic-beta | string | optional | Comma-separated list of beta flags. Passed verbatim to Anthropic. |
| X-Request-Id | string | optional | Client-side trace id. |
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | Model ID. Must be on the VK's allowed_models list; see GET /v1/models. |
| max_tokens | int | required | Cap is the model's token window per the pricing registry. |
| messages | array | required | Anthropic-style message array. Role is user or assistant; content is a string or a block array (text / image / tool_use / tool_result). |
| system | string \| array | optional | System prompt. Array form supports cache_control. |
| temperature | number | optional | 0.0 – 1.0. |
| top_p | number | optional | Nucleus sampling. |
| top_k | int | optional | Top-k sampling. |
| stop_sequences | string[] | optional | Custom stop sequences. |
| tools | array | optional | Anthropic native tools format. |
| tool_choice | object | optional | {type: "auto" \| "any" \| "tool", name?: "..."} |
| stream | bool | optional | Default false. When true, response is SSE (text/event-stream) with Anthropic native event names (message_start / content_block_delta / …). |
| thinking | object | optional | {type: "enabled", budget_tokens: 32000} for extended thinking. SACTL auto-injects anthropic-beta if you didn't set it. |
| cache_control | object | optional | Embedded in content block / tools / system. SACTL can auto-inject {type: "ephemeral"} on system and tools blocks (controlled by the VK's auto_prompt_cache flag). |
| metadata | object | optional | {user_id: "..."} passed verbatim to upstream and logged to audit. |
Request example
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [
    { "role": "user", "content": "Hello, Claude." }
  ]
}
Response
HTTP/1.1 200 OK
X-SACTL-Usage-Prompt-Tokens: 12
X-SACTL-Usage-Completion-Tokens: 28
X-SACTL-Usage-Cost-USD: 0.000336
X-SACTL-Budget-Remaining-USD: 49.832104

{
  "id": "msg_01AbCdEfGhIjKlMnOpQrStUv",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    { "type": "text", "text": "Hello! How can I help you today?" }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 28
  }
}
curl
curl https://api.sactl.ai/v1/messages \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, Claude." }
    ]
  }'
Errors
| HTTP | code | Scenario |
|---|---|---|
| 401 | key_invalid | VK missing or revoked. |
| 403 | model_forbidden | Model not on the allowlist. |
| 402 | budget_exhausted | Pre-debit exceeds the cap. |
| 402 | context_too_long | Prompt exceeds the token window. |
| 429 | rate_limited | GCRA fired; carries Retry-After. |
| 502 | upstream_error | Anthropic returned 5xx. |
| 504 | upstream_timeout | Upstream timed out. |
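For clients that do not use an SDK, the same call can be made with the Python standard library. This is a minimal sketch: the helper names and the 60-second timeout are assumptions, and the host is taken from the curl example.

```python
import json
import urllib.request

BASE_URL = "https://api.sactl.ai"  # host as used in the curl examples


def build_messages_request(vk: str, model: str, prompt: str,
                           max_tokens: int = 1024) -> urllib.request.Request:
    """Prepare a POST /v1/messages request without sending it."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {vk}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request; returns (parsed JSON body, response headers as a dict)."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx; its body holds the error envelope.
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read()), dict(resp.headers)
```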
Create chat completion (OpenAI-compat)
OpenAI Chat Completions-compatible endpoint. Clients using openai-python / openai-node / LangChain or any OpenAI SDK can keep their code unchanged — just flip the base URL. SACTL translates OpenAI shape to Anthropic shape upstream, then translates the Anthropic response back to OpenAI shape. Bidirectional support for tool_calls, image_url, stream, and reasoning_effort (maps to Anthropic thinking).
SIDECAR_TRANSLATE_OPENAI_MODE=openai-to-anthropic (default on). Set to passthrough to forward OpenAI-shape requests directly to the OpenAI upstream.
Request body (key fields)
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | Accepts OpenAI model IDs and Claude model IDs (in translation mode). |
| messages | array | required | OpenAI message array. role is system/user/assistant/tool. |
| max_tokens | int | optional | Aliased to max_completion_tokens. |
| temperature | number | optional | 0.0 – 2.0 (OpenAI semantics). |
| top_p | number | optional | |
| tools | array | optional | OpenAI tools format; translated to Anthropic tools. |
| tool_choice | string \| object | optional | "auto" / "none" / "required" / {type:"function", function:{name}}. |
| stream | bool | optional | OpenAI SSE format (data: {"choices":[...]}); SACTL rewrites Anthropic events into OpenAI delta events. |
| reasoning_effort | string | optional | "low" / "medium" / "high" → Anthropic thinking.budget_tokens: 2048 / 8000 / 32000. |
| response_format | object | optional | Only {type: "json_object"} is honored; on Claude this is mapped to a system-prompt injection. |
Translation cheat sheet
| OpenAI | Anthropic | Notes |
|---|---|---|
| messages[].role="system" | system | Merged into the top-level system, preserving order. |
| tools | tools | Field names match; parameter schema copied as-is. |
| tool_calls | tool_use | Extracted from the response's block array. |
| role="tool" | tool_result block | Merged with the prior assistant message into the user turn. |
| image_url | image block | URL passes through SACTL's SSRF filter; data URLs convert directly to base64. |
| finish_reason | stop_reason | stop/length/tool_calls/content_filter mapped pairwise. |
| reasoning_effort | thinking.budget_tokens | low=2048, medium=8000, high=32000. |
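Two rows of the cheat sheet, the system-message merge and the reasoning_effort mapping, can be illustrated with a small request translator. This is a simplified sketch rather than SACTL's translator: it handles only string content, rejects tool messages instead of converting them, ignores temperature and tools, and the blank-line separator used when joining system messages is an assumption.

```python
from typing import Any, Dict, List

EFFORT_TO_BUDGET = {"low": 2048, "medium": 8000, "high": 32000}


def translate_request(openai_body: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a simple OpenAI chat request into Anthropic Messages shape."""
    system_parts: List[str] = []
    messages: List[Dict[str, Any]] = []
    for msg in openai_body["messages"]:
        role = msg["role"]
        if role == "system":
            # System messages are lifted out of the array, preserving order.
            system_parts.append(msg["content"])
        elif role in ("user", "assistant"):
            messages.append({"role": role, "content": msg["content"]})
        else:
            raise NotImplementedError(f"role {role!r} is outside this sketch")

    out: Dict[str, Any] = {"model": openai_body["model"], "messages": messages}
    if system_parts:
        out["system"] = "\n\n".join(system_parts)  # separator is an assumption
    if "max_tokens" in openai_body:
        out["max_tokens"] = openai_body["max_tokens"]
    effort = openai_body.get("reasoning_effort")
    if effort is not None:
        out["thinking"] = {"type": "enabled", "budget_tokens": EFFORT_TO_BUDGET[effort]}
    return out
```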
Request example
{
  "model": "claude-sonnet-4-6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Write a haiku about Go channels." }
  ],
  "max_tokens": 256,
  "temperature": 0.7
}
Response
{
  "id": "chatcmpl-0A1B2C3D4E5F",
  "object": "chat.completion",
  "created": 1745145600,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Silent channels hum,\nGoroutines pass gifts in dark,\nSelect waits for dawn."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 31,
    "total_tokens": 53
  }
}
curl
curl https://api.sactl.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Write a haiku about Go channels." }
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
Errors
| HTTP | code | Scenario |
|---|---|---|
| 401 | key_invalid | VK authentication failed. |
| 403 | model_forbidden | Model not on the VK allowlist. |
| 400 | ssrf_forbidden | image_url points at a private network. |
| 400 | model_unknown | Model ID not in the pricing registry. |
| 429 | rate_limited | Rate-limit fired. |
| 502 | upstream_error | Upstream 5xx. |
Create completion (legacy) — available once GPT upstream is wired
Legacy OpenAI completions (non-chat) pass-through endpoint. The route is registered in the gateway, but at this stage SACTL only fronts the Claude upstream — Claude doesn't have non-chat completions, so this endpoint is not exposed today and calls return 503 service_unavailable. It will auto-enable when GPT upstream is wired up. For new integrations, use /v1/chat/completions directly.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | OpenAI completions model ID. |
| prompt | string \| array | required | String or array of strings. |
| max_tokens | int | optional | |
| temperature | number | optional | |
| stream | bool | optional | OpenAI native SSE. |
curl
curl https://api.sactl.ai/v1/completions \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "Say hello in three languages.",
    "max_tokens": 64
  }'
Create embeddings
Embeddings (OpenAI-compatible schema). Anthropic does not offer embeddings, so this endpoint routes to GPT (text-embedding-3-small / text-embedding-3-large) upstream. Gemini (text-embedding-004) upstream coming soon. Per-token billing, with the same usage caps and rate limiting as inference endpoints.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | Embedding model ID. |
| input | string \| array | required | Single string or array of strings. |
| encoding_format | string | optional | "float" (default) / "base64". |
| dimensions | int | optional | Only valid on text-embedding-3-*; truncates embedding dimensions. |
Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0412, /* ... 1536 floats */]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 7,
    "total_tokens": 7
  }
}
curl
curl https://api.sactl.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
Errors
| HTTP | code | Scenario |
|---|---|---|
| 401 | key_invalid | VK invalid. |
| 400 | model_unknown | Model ID not registered. |
| 413 | payload_too_large | Input array too large. |
| 429 | rate_limited | Rate-limited. |
List models
Lists every model the VK is authorized to call. Returns Anthropic native shape {data: [{id, type, ...}]}. The list is computed dynamically from the intersection of the VK's allowed_models array and the pricing registry — even if a model is in the pricing registry, it won't appear here unless the VK has it enabled.
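The intersection rule amounts to a simple filter. In this sketch the function name is hypothetical, and preserving the order of allowed_models is an assumption; the real ordering of the response is not documented here.

```python
from typing import Iterable, List, Set


def visible_models(allowed_models: Iterable[str], pricing_registry: Set[str]) -> List[str]:
    """Models that are both enabled on the VK and present in the pricing registry."""
    return [model for model in allowed_models if model in pricing_registry]
```

A model that is priced but not enabled on the VK, or enabled but missing from the registry, is filtered out either way.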
Response
{
  "data": [
    { "type": "model", "id": "claude-opus-4-7", "display_name": "Claude Opus 4.7", "created_at": "2025-09-29T00:00:00Z" },
    { "type": "model", "id": "claude-opus-4-6", "display_name": "Claude Opus 4.6", "created_at": "2025-08-05T00:00:00Z" },
    { "type": "model", "id": "claude-sonnet-4-6", "display_name": "Claude Sonnet 4.6", "created_at": "2025-07-10T00:00:00Z" },
    { "type": "model", "id": "claude-haiku-4-5", "display_name": "Claude Haiku 4.5", "created_at": "2025-05-03T00:00:00Z" }
  ],
  "has_more": false,
  "first_id": "claude-opus-4-7",
  "last_id": "claude-haiku-4-5"
}
# GPT models are included in this list (intersected with the VK's allowed_models). Gemini models will appear automatically once the Gemini upstream is connected.
curl
curl https://api.sactl.ai/v1/models \
  -H "Authorization: Bearer YOUR_VK"
Files API
Anthropic Files API pass-through. Upload documents / images and reference them in subsequent /v1/messages calls. Max file size is governed by SIDECAR_FILES_MAX_MB (default 32MB); the MIME allowlist is configurable. Every Files operation writes a billing log entry (file.uploaded / file.deleted) for reconciliation.
Upload file
Upload a file via multipart/form-data. The returned file id can be referenced in /v1/messages content blocks as {type: "document", source: {type: "file", file_id: "..."}}.
Form fields
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | required | File binary. MIME allowlist: application/pdf, image/png, image/jpeg, image/gif, image/webp, text/plain, text/csv, text/markdown. |
| purpose | string | optional | Optional tag, written to audit log. |
Response
{
  "id": "file_01AbCdEfGhIjKlMnOpQrStUv",
  "type": "file",
  "filename": "contract.pdf",
  "mime_type": "application/pdf",
  "size_bytes": 183204,
  "created_at": "2026-04-20T08:12:31.441Z",
  "downloadable": false
}
curl
curl https://api.sactl.ai/v1/files \
  -H "Authorization: Bearer YOUR_VK" \
  -F "file=@contract.pdf;type=application/pdf" \
  -F "purpose=context"
Errors
| HTTP | code | Scenario |
|---|---|---|
| 401 | key_invalid | VK invalid. |
| 413 | payload_too_large | File exceeds SIDECAR_FILES_MAX_MB. |
| 415 | unsupported_media | MIME not on the allowlist. |
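Once the upload succeeds, the returned id can be referenced from a /v1/messages call. The helper below is a hypothetical sketch that builds one user message combining the document block shape given above with a text block.

```python
from typing import Any, Dict


def document_message(file_id: str, question: str) -> Dict[str, Any]:
    """Build a user message that attaches an uploaded file and asks about it."""
    return {
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "file", "file_id": file_id}},
            {"type": "text", "text": question},
        ],
    }
```

The result goes into the `messages` array of an ordinary /v1/messages request body.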
List files
List files uploaded by the current VK (files from other VKs are invisible). Cursor-based pagination.
Query params
| Field | Type | Required | Description |
|---|---|---|---|
| limit | int | optional | 1 – 1000, default 20. |
| before_id | string | optional | Cursor; returns files before this id. |
| after_id | string | optional | Cursor; returns files after this id. |
Response
{
  "data": [
    { "id": "file_01AbCd...", "filename": "contract.pdf", "size_bytes": 183204, "mime_type": "application/pdf", "created_at": "2026-04-20T08:12:31.441Z" }
  ],
  "has_more": false,
  "first_id": "file_01AbCd...",
  "last_id": "file_01AbCd..."
}
curl
curl "https://api.sactl.ai/v1/files?limit=20" \
  -H "Authorization: Bearer YOUR_VK"
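Walking the full listing means following the cursor until has_more is false. The sketch below assumes that passing a page's last_id as after_id yields the next page; `fetch_page` is a hypothetical callable that performs the HTTP request and returns the parsed JSON.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_files(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every file, requesting pages until the server reports no more."""
    after_id: Optional[str] = None  # None means "first page"
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        # Stop on the last page; the empty-data check guards against a stuck cursor.
        if not page.get("has_more") or not page["data"]:
            return
        after_id = page["last_id"]
```

Injecting `fetch_page` keeps the loop independent of the HTTP client, which also makes it easy to test with canned pages.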
Get file metadata
Read metadata for a single file. SACTL does not store the file binary itself; this endpoint passes through the metadata returned by the Anthropic upstream.
curl
curl https://api.sactl.ai/v1/files/file_01AbCdEfGhIjKlMnOpQrStUv \
  -H "Authorization: Bearer YOUR_VK"
Errors
| HTTP | code | Scenario |
|---|---|---|
| 401 | key_invalid | VK invalid. |
| 404 | bad_request | file id does not exist or does not belong to this VK. |
Delete file
Delete a file. After deletion, /v1/messages calls referencing the file_id return bad_request. The audit log emits file.deleted.
Response
{
  "id": "file_01AbCdEfGhIjKlMnOpQrStUv",
  "type": "file_deleted"
}
curl
curl -X DELETE https://api.sactl.ai/v1/files/file_01AbCdEfGhIjKlMnOpQrStUv \
  -H "Authorization: Bearer YOUR_VK"
Batch API
Submit large numbers of Messages requests as a single batch. State machine: validating → in_progress → ended | canceled | failed. Results are returned as JSONL and are downloadable only in the ended state.
Billing: batch calls take Anthropic's 50% batch discount stacked with our × 0.30 — Claude batch price = official × 0.15 (see the pricing page). Billing settlement happens only at batch completion or cancellation — canceled / failed requests are not billed.
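The stacked discount is plain multiplication: 0.50 × 0.30 = 0.15. The sketch below works that through with `Decimal`; the function names are hypothetical, and rounding to 6 decimal places with Python's default mode is an assumption.

```python
from decimal import Decimal

SACTL_CLAUDE_MULTIPLIER = Decimal("0.30")   # SACTL's Claude price factor
ANTHROPIC_BATCH_DISCOUNT = Decimal("0.50")  # Anthropic's 50% batch discount


def batch_multiplier() -> Decimal:
    """Effective factor applied to the official price for Claude batch calls."""
    return SACTL_CLAUDE_MULTIPLIER * ANTHROPIC_BATCH_DISCOUNT


def batch_cost(official_cost_usd: Decimal) -> Decimal:
    """Batch price for a request whose official (non-batch) price is given."""
    return (official_cost_usd * batch_multiplier()).quantize(Decimal("0.000001"))
```

For example, a request that would cost 10 USD at the official non-batch price settles at 1.50 USD through the batch endpoint.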
Create message batch
Submit a batch of Messages requests. Each request body is equivalent to a single /v1/messages call, wrapped with a custom_id for reconciliation.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| requests | array | required | 1 – 10,000 {custom_id, params} items. |
| requests[].custom_id | string | required | Unique within the batch; used to map results back to your business id. |
| requests[].params | object | required | Same shape as the /v1/messages request body (model / max_tokens / messages / …). |
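A batch body can be assembled from a mapping of business ids to prompts, which makes custom_id uniqueness automatic because dictionary keys cannot repeat. The helper name and its defaults are assumptions of this sketch.

```python
from typing import Any, Dict

MAX_BATCH_SIZE = 10_000  # upper bound from the table above


def build_batch(jobs: Dict[str, str], model: str, max_tokens: int = 256) -> Dict[str, Any]:
    """Wrap each prompt in a {custom_id, params} item, keyed by the caller's id."""
    if not 1 <= len(jobs) <= MAX_BATCH_SIZE:
        raise ValueError(f"a batch needs 1 to {MAX_BATCH_SIZE} requests, got {len(jobs)}")
    return {
        "requests": [
            {
                "custom_id": custom_id,
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            for custom_id, prompt in jobs.items()
        ]
    }
```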
Request example
{
  "requests": [
    {
      "custom_id": "job-001",
      "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 256,
        "messages": [ { "role": "user", "content": "Summarize Go channels." } ]
      }
    },
    {
      "custom_id": "job-002",
      "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 256,
        "messages": [ { "role": "user", "content": "What is GCRA?" } ]
      }
    }
  ]
}
Response
{
  "id": "msgbatch_01Wx9...",
  "type": "message_batch",
  "processing_status": "in_progress",
  "request_counts": { "processing": 2, "succeeded": 0, "errored": 0, "canceled": 0, "expired": 0 },
  "ended_at": null,
  "created_at": "2026-04-20T08:12:31.441Z",
  "expires_at": "2026-04-21T08:12:31.441Z",
  "cancel_initiated_at": null,
  "results_url": null
}
curl
curl https://api.sactl.ai/v1/messages/batches \
  -H "Authorization: Bearer YOUR_VK" \
  -H "Content-Type: application/json" \
  -d @batch.json
List message batches
List all batches for the current VK. Supports cursor pagination via limit / before_id / after_id.
curl
curl "https://api.sactl.ai/v1/messages/batches?limit=20" \
  -H "Authorization: Bearer YOUR_VK"
Retrieve message batch
Read the status of a single batch. processing_status transitions through validating → in_progress → ended / canceled / failed.
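A client typically polls this endpoint until the batch reaches a terminal state. The loop below is a sketch: `fetch_status` is a hypothetical callable returning the parsed response, and the polling interval and attempt limit are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"ended", "canceled", "failed"}


def wait_for_batch(fetch_status: Callable[[], Dict[str, Any]],
                   interval_s: float = 5.0,
                   max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until processing_status is terminal, then return the last response."""
    for _ in range(max_polls):
        batch = fetch_status()
        if batch["processing_status"] in TERMINAL_STATES:
            return batch
        sleep(interval_s)
    raise TimeoutError("batch did not reach a terminal state in time")
```

Only a batch that comes back as ended has results to download.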
Response
{
  "id": "msgbatch_01Wx9...",
  "processing_status": "ended",
  "request_counts": { "processing": 0, "succeeded": 2, "errored": 0, "canceled": 0, "expired": 0 },
  "ended_at": "2026-04-20T08:14:02.112Z",
  "results_url": "https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9.../results"
}
curl
curl https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9... \
  -H "Authorization: Bearer YOUR_VK"
Retrieve message batch results
Returns per-request results as JSONL (application/x-ndjson). Only available when processing_status=ended; other states return bad_request.
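Because each line is an independent JSON object, the body can be decoded line by line and indexed by custom_id. A minimal sketch, with a hypothetical function name:

```python
import json
from typing import Any, Dict


def index_results(jsonl_text: str) -> Dict[str, Dict[str, Any]]:
    """Map each custom_id to its result object."""
    results: Dict[str, Dict[str, Any]] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        row = json.loads(line)
        results[row["custom_id"]] = row["result"]
    return results
```

Results are keyed by custom_id rather than by position, so the lookup does not depend on the order of lines in the file.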
Response (one line per request)
{"custom_id":"job-001","result":{"type":"succeeded","message":{"id":"msg_...","content":[...],"usage":{...}}}}
{"custom_id":"job-002","result":{"type":"succeeded","message":{"id":"msg_...","content":[...],"usage":{...}}}}
curl
curl https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9.../results \
  -H "Authorization: Bearer YOUR_VK"
Cancel message batch
Move an in_progress / validating batch to canceling. In-flight requests may still complete and get billed normally; canceled requests are not billed.
curl
curl -X POST https://api.sactl.ai/v1/messages/batches/msgbatch_01Wx9.../cancel \
  -H "Authorization: Bearer YOUR_VK"
System
Health check and metrics. No VK required — but in production these should be restricted to ops-network CIDRs at the ingress layer.
Health check
Health check. Returns 200 + JSON; fields report connectivity to each dependency. Suitable as a Kubernetes liveness / readiness probe target. No auth required.
Response
HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ok",
  "redis": "ok",
  "vault": "ok",
  "version": "0.9.3",
  "commit": "a1b2c3d",
  "uptime_seconds": 84221
}
curl
curl https://api.sactl.ai/health
Prometheus metrics
Prometheus metrics endpoint. Returns all business metrics in text/plain Prometheus format: rl_* (rate limiting), forward_* (forwarding), upstream_* (upstream latency / errors), breaker_* (circuit breakers), audit_* (audit pipeline). Add it to your Prometheus scrape config.
Response (excerpt)
# HELP rl_reject_total Number of requests rejected by rate limiter
# TYPE rl_reject_total counter
rl_reject_total{dim="tenant"} 12
rl_reject_total{dim="vk"} 3
rl_reject_total{dim="vk_model"} 1
rl_reject_total{dim="vk_ip"} 0
# HELP forward_request_duration_seconds Request duration histogram
# TYPE forward_request_duration_seconds histogram
forward_request_duration_seconds_bucket{route="/v1/messages",le="0.5"} 4821
forward_request_duration_seconds_bucket{route="/v1/messages",le="1"} 5910
forward_request_duration_seconds_bucket{route="/v1/messages",le="+Inf"} 6024
# HELP upstream_5xx_total Upstream 5xx responses
# TYPE upstream_5xx_total counter
upstream_5xx_total{upstream="anthropic"} 2
# HELP audit_worm_append_total WORM audit log appends
# TYPE audit_worm_append_total counter
audit_worm_append_total{event="message.ok"} 5932
audit_worm_append_total{event="file.uploaded"} 18
curl
curl https://api.sactl.ai/metrics
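For ad-hoc inspection outside Prometheus, the text output can be read with a small parser. This is a deliberately simplified sketch: it skips comment lines and does not handle timestamps or escaped quotes in label values, both of which the full exposition format allows.

```python
import re
from typing import Dict, List, Tuple

SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Return (metric name, labels, value) for each sample line."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # HELP / TYPE comments and blank lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # anything this simplified grammar does not cover
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Summing the values of rl_reject_total across its dim labels, for instance, gives the total number of rate-limit rejections.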
Preview / Coming soon
The capabilities below are code-complete but not wired to the hot path, or are still in canary. Production customers should not rely on their behavior.
Multi-key pool PickMiddleware preview
Multiple API keys hang under a single upstream provider; key selection is based on health / cost / usage. When a key gets a 429 / 401 from upstream it enters a cooldown window. When all keys cool down, the provider itself trips and returns 503 service_unavailable.
- Current status: middleware code is implemented but not on the forward path by default.
- Expected GA: 2026 Q2. Wiring it in does not change the API contract; it just turns 503 service_unavailable from "theoretically possible" into actually observed.
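The cooldown behavior can be sketched as below. This is not the PickMiddleware implementation: selection by health / cost / usage is reduced to "first key not in cooldown", and the class name, the 30-second cooldown, and the injectable clock are all assumptions.

```python
import time
from typing import Callable, Dict, List, Optional


class KeyPool:
    """Upstream keys with a per-key cooldown after a 429 / 401."""

    def __init__(self, keys: List[str], cooldown_s: float = 30.0,
                 now: Callable[[], float] = time.monotonic):
        self._cooldown_s = cooldown_s
        self._now = now
        # Time at which each key becomes usable again; 0.0 means usable immediately.
        self._ready_at: Dict[str, float] = {key: 0.0 for key in keys}

    def pick(self) -> Optional[str]:
        """Return the first usable key, or None when every key is cooling down."""
        current = self._now()
        for key, ready_at in self._ready_at.items():
            if ready_at <= current:
                return key
        return None  # the caller would answer 503 service_unavailable

    def report(self, key: str, upstream_status: int) -> None:
        """Put a key into cooldown when upstream answered 429 or 401."""
        if upstream_status in (401, 429):
            self._ready_at[key] = self._now() + self._cooldown_s
```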
Markup multiplier settle preview
For multi-tier reseller scenarios, VKs can carry a markup_multiplier (e.g. 1.20 = add a 20% channel margin); at settlement the margin is routed to the channel account automatically.
- Current status: the schema field is in the database; the settlement step does not yet apply the multiplier, so actual billing equals the underlying price.
- Expected GA: same batch as Multi-key pool, 2026 Q2.
- Note: this does not expose any new endpoint to the client; it only changes what the X-SACTL-Usage-Cost-USD header amount means.
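As a worked example of the intended arithmetic once the multiplier is applied (it is not applied today), a markup_multiplier of 1.20 on a 1.00 USD underlying cost bills 1.20 USD and routes 0.20 USD to the channel account. The function name and the 6-decimal rounding are assumptions of this sketch.

```python
from decimal import Decimal
from typing import Tuple


def settle(base_cost_usd: Decimal, markup_multiplier: Decimal) -> Tuple[Decimal, Decimal]:
    """Return (billed amount, channel margin) for one request."""
    billed = (base_cost_usd * markup_multiplier).quantize(Decimal("0.000001"))
    return billed, billed - base_cost_usd
```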