5-minute walkthrough

Zero to first successful request in three steps.

Top up via Telegram support (WeChat / Alipay / wire — pick any). Once funded, you can get a key.
Support sends an sk-xa-prod-*** directly, or you generate one in the dashboard. Set a monthly budget cap so a leaked key can't drain your balance.
Point your client's base_url (Claude Code / Codex CLI / Cursor / OpenAI SDK / curl) at https://api.sactl.ai and swap the API key for your sk-xa-* key. Your code stays the same. OpenAI-style clients need the /v1 suffix: Codex CLI, for example, uses an OpenAI provider with base_url=https://api.sactl.ai/v1 and wire_api=responses.

Your first curl request

curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-xxxxxxxxxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, SACTL."}
    ]
  }'

Shown once: The Virtual Key plaintext is shown exactly once, at creation. If you lose it, the only path is rotation. The server stores only an HMAC-SHA256 digest; plaintext never lands in the database.

Authentication

SACTL accepts two auth headers — pick whichever matches your client framework:

Anthropic native: x-api-key: sk-xa-prod-...
OpenAI-compatible: Authorization: Bearer sk-xa-prod-...

Both are equivalent inside SACTL — they look up the same VK table.

Virtual Key structure

A full VK looks like:

sk-xa-{env}-{base62}

# Example
sk-xa-prod-7xK9mQ2vL4pN8wRb
└────┬────┘ └──────┬───────┘
  prefix        secret

sk-xa-prod-: fixed prefix, identifies env (dev / stage / prod).
The first 12 chars act as the Redis lookup prefix for O(1) VK metadata access.
The remainder is a base62 random secret. Server side stores an HMAC-SHA256 digest; the pepper lives in Vault.
Matched in constant time to defeat timing attacks.
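
As a rough illustration of the storage scheme described above, the following Python sketch splits a key into its lookup prefix, computes an HMAC-SHA256 digest, and compares digests in constant time. The pepper value, the choice to digest the full key string, and the names `parse_vk`, `digest_vk`, and `verify_vk` are assumptions made for the example; this is not SACTL's server code.

```python
import hashlib
import hmac
import re

# Hypothetical values for illustration only; the real pepper lives in Vault.
PEPPER = b"example-pepper-not-a-real-secret"
LOOKUP_PREFIX_LEN = 12  # "the first 12 chars" from the description above
VK_PATTERN = re.compile(r"^sk-xa-(dev|stage|prod)-([0-9A-Za-z]+)$")


def parse_vk(vk: str) -> tuple[str, str]:
    """Return (env, lookup_prefix), or raise ValueError for a malformed key."""
    match = VK_PATTERN.match(vk)
    if match is None:
        raise ValueError("malformed virtual key")
    return match.group(1), vk[:LOOKUP_PREFIX_LEN]


def digest_vk(vk: str, pepper: bytes = PEPPER) -> str:
    """HMAC-SHA256 over the key, keyed with the pepper; only this is stored."""
    return hmac.new(pepper, vk.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_vk(presented: str, stored_digest: str, pepper: bytes = PEPPER) -> bool:
    """Compare digests with hmac.compare_digest so timing does not leak a partial match."""
    return hmac.compare_digest(digest_vk(presented, pepper), stored_digest)
```

Because only the digest is kept, a lost key cannot be recovered from storage, which is why rotation is the only remedy.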

See API Reference · Authentication for the full spec.

Your first call

Send a full request, then read the response shape.

Request

curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-7xK9mQ2vL4pN8wRb" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Explain HMAC in one sentence."}
    ]
  }'

Response

HTTP/2 200
content-type: application/json
x-sactl-trace-id: 01JQX3A8P9M7K2WB3Z
x-sactl-usage-prompt-tokens: 18
x-sactl-usage-completion-tokens: 52
x-sactl-usage-cost-usd: 0.00041
x-sactl-budget-remaining-usd: 299.9996
x-ratelimit-remaining-requests: 4999

{
  "id": "msg_01ABc...",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    {"type": "text", "text": "HMAC is a keyed hash-based message authentication code used to detect tampering."}
  ],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 18, "output_tokens": 52}
}

Headers SACTL returns

Header	Meaning
`x-sactl-trace-id`	Cross-request ULID; quote it when filing a complaint and we locate the request in 30 seconds.
`x-sactl-usage-prompt-tokens`	Actual input tokens (including system / tools).
`x-sactl-usage-completion-tokens`	Output tokens (including thinking).
`x-sactl-usage-cost-usd`	USD spent on this call (Claude × 0.30, GPT as low as 1/30 of official; live for Claude + GPT, Gemini coming soon).
`x-sactl-budget-remaining-usd`	This VK's remaining monthly budget.
`x-ratelimit-remaining-requests`	Remaining GCRA capacity in the current window.
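
A client that wants to track spend per call could read these headers into a small record. The sketch below is one possible shape; the `SactlUsage` dataclass and `parse_usage_headers` function are hypothetical names, and only the header names come from the table above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class SactlUsage:
    trace_id: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    budget_remaining_usd: float
    ratelimit_remaining: int


def parse_usage_headers(headers: Mapping[str, str]) -> SactlUsage:
    """Extract the SACTL accounting headers from a response.

    Keys are lower-cased first so the lookup works regardless of header casing.
    """
    h = {k.lower(): v for k, v in headers.items()}
    return SactlUsage(
        trace_id=h["x-sactl-trace-id"],
        prompt_tokens=int(h["x-sactl-usage-prompt-tokens"]),
        completion_tokens=int(h["x-sactl-usage-completion-tokens"]),
        cost_usd=float(h["x-sactl-usage-cost-usd"]),
        budget_remaining_usd=float(h["x-sactl-budget-remaining-usd"]),
        ratelimit_remaining=int(h["x-ratelimit-remaining-requests"]),
    )
```

With the `requests` library, `parse_usage_headers(response.headers)` would work directly, since its header mapping is a regular `Mapping`.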

Virtual Key vs Provider Key

SACTL's architectural premise: your customers never touch your Anthropic or OpenAI provider key. (Gemini upstream coming; the same model applies.)

Provider Key

The paid key you have from Anthropic or OpenAI. Gemini / AWS Bedrock upstream support is coming.
Entered in the admin console, encrypted via Vault transit before persisting.
The runtime requests one Vault decrypt per request — plaintext never lands in logs or disk.
Group keys into a Key Pool: route by Tier (RPM/TPM); when one key trips, fail over to the next chain.

Virtual Key (VK)

Issued by SACTL with the sk-xa-* prefix.
All limits attach to the VK: allowed_models, ip_whitelist, monthly_budget_usd, rps / burst, expires_at.
Billing, audit, and rate-limiting all key off the VK.
Customers only ever see the VK. A rotation is a re-keying — other VKs aren't affected.

Data isolation: Even if a VK leaks, the worst case is that the attacker burns through that VK's budget cap. The Provider Key never appears in outbound request bodies, error responses, or Inspector metadata; it lives only in the sidecar process memory for a few milliseconds.

Tenant model

SACTL's multi-tenant model is organised in three layers:

Tenant (the customer company you have a contract with)
  └── Users (the customer's portal login users)
      └── Virtual Keys (sk-xa-* used by your applications)

Budget aggregation

VK-level budget: independent monthly USD cap per VK.
Tenant-level budget: optional shared cap across all VKs in a tenant. Either dimension hitting the cap returns 402.
Both use the same atomic Redis SUB/ADD primitive — concurrency-safe.

Isolation guarantees

Portal users see only data scoped to their tenant after login.
Try to access another tenant's VK ID via URL? IDOR protection returns 404 — it doesn't leak existence.
Only Admin users (Console side) can see aggregate data across tenants.

Protocol adapters

SACTL does bidirectional translation between OpenAI Chat Completions and Anthropic Messages.

Typical scenarios

Your customer already has an OpenAI SDK codebase and just wants to flip the base URL to use Claude (the primary use case at launch).
Reverse direction (Anthropic SDK → GPT upstream) is coming once the GPT pool is wired up.

How to enable

# gateway-sidecar env
SIDECAR_TRANSLATE_OPENAI_MODE=openai-to-anthropic

Clients call POST /v1/chat/completions; SACTL automatically:

Maps OpenAI roles in messages (system/user/assistant/tool) to Anthropic's system block + message structure.
Converts tools: [{type:"function", function:{...}}] → Anthropic tools: [{name, input_schema}].
Maps tool_calls[] ↔ tool_use; all IDs are bidirectionally mapped and persisted.
Converts image_url blocks → Anthropic image blocks.
Maps reasoning_effort → Anthropic thinking.budget_tokens.

The response is translated back to the OpenAI shape; clients see nothing unusual. Send X-SACTL-Translated: 1 if you want the response to be explicitly tagged as translated (helps with log triage).
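
To make the request-side mapping concrete, here is a minimal Python sketch of the role and tool conversions listed above. It is a simplification under stated assumptions: it handles only string content for system / user / assistant messages, leaves out tool messages, images, and ID mapping, and the `default_max_tokens` fallback is an invented parameter (Anthropic requires max_tokens, OpenAI does not). The function name `openai_to_anthropic` is hypothetical.

```python
from typing import Any


def openai_to_anthropic(body: dict[str, Any], default_max_tokens: int = 1024) -> dict[str, Any]:
    """Convert a simple Chat Completions body into an Anthropic Messages body."""
    system_parts = []
    messages = []
    for msg in body.get("messages", []):
        if msg["role"] == "system":
            # Anthropic carries system text outside the messages array.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    out: dict[str, Any] = {
        "model": body["model"],
        "max_tokens": body.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)

    tools = []
    for tool in body.get("tools", []):
        if tool.get("type") != "function":
            continue
        fn = tool["function"]
        converted = {
            "name": fn["name"],
            # OpenAI "parameters" and Anthropic "input_schema" are both JSON Schema.
            "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
        }
        if "description" in fn:
            converted["description"] = fn["description"]
        tools.append(converted)
    if tools:
        out["tools"] = tools
    return out
```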

Usage cap protection

Every sk-xa-* key can carry a monthly budget cap. We pre-debit before forwarding the request to Claude — if the call would exceed the cap, you get 402 right away and we never spend on the upstream. Even if a key is leaked and someone hammers it with huge max_tokens, the loss is capped to your number.

Pre-debit mechanism

The request enters the sidecar, which parses out model + max_tokens.
The pricing registry is consulted for that model's output_per_mtok.
Compute the upper-bound estimate: estimated_cost = max_tokens × price_per_token × safety_multiplier, where price_per_token = output_per_mtok / 1,000,000.
Atomic Redis Lua SUB on both tenant:{id}:budget and vk:{id}:budget at the same time.
If either dimension is short → 402 budget_exhausted; the request is never forwarded upstream.
Once the upstream responds, compute real cost from usage and atomic ADD the difference back.
If the upstream returns a 429 or another 4xx (the call never happened or produced no usage), refund the entire pre-debit.

About max_tokens: Set max_tokens to the value you actually need; don't pad to 32k as insurance. Padding holds budget in reserve and can cause concurrent requests on the same VK to fail with 402. The refund happens at response time, not request time.
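
The pre-debit steps above can be sketched with an in-memory stand-in for the two Redis counters. Everything here is illustrative: `BudgetStore`, `estimate_cost`, and `settle` are hypothetical names, the `SAFETY_MULTIPLIER` value is made up, and a plain dict does not provide the atomicity that the Lua script gives in production.

```python
from dataclasses import dataclass, field

SAFETY_MULTIPLIER = 1.2  # hypothetical value; the real multiplier is not documented here


@dataclass
class BudgetStore:
    """In-memory stand-in for the tenant and VK budget counters."""
    balances: dict[str, float] = field(default_factory=dict)

    def pre_debit(self, keys: list[str], amount: float) -> bool:
        """Debit every key or none of them, mirroring the all-or-nothing SUB."""
        if any(self.balances.get(k, 0.0) < amount for k in keys):
            return False  # caller answers 402 budget_exhausted
        for k in keys:
            self.balances[k] -= amount
        return True

    def refund(self, keys: list[str], amount: float) -> None:
        """Add an amount back to every key (a negative amount debits more)."""
        for k in keys:
            self.balances[k] += amount


def estimate_cost(max_tokens: int, output_per_mtok: float) -> float:
    """Upper-bound estimate: max_tokens x per-token output price x safety multiplier."""
    return max_tokens * (output_per_mtok / 1_000_000) * SAFETY_MULTIPLIER


def settle(store: BudgetStore, keys: list[str], estimated: float, actual: float) -> None:
    """After the upstream responds, return the unused part of the pre-debit."""
    store.refund(keys, estimated - actual)
```

The sketch also shows why padding max_tokens hurts: the full estimate stays debited until `settle` runs, so concurrent requests see a smaller balance in the meantime.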

Streaming (SSE)

Add stream: true to the request body to enable streaming. Responses come back as Content-Type: text/event-stream.

Anthropic native endpoint

Event structure follows the Anthropic official spec:

event: message_start
data: {"type":"message_start","message":{"id":"msg_...",...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}

OpenAI-compatible endpoint

SACTL rewrites the upstream Anthropic SSE into OpenAI chunk format:

data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop","index":0}]}

data: [DONE]

Python example

import requests

with requests.post(
    "https://api.sactl.ai/v1/messages",
    headers={"x-api-key": "sk-xa-prod-...", "anthropic-version": "2023-06-01"},
    json={"model": "claude-sonnet-4-6", "max_tokens": 512, "stream": True,
          "messages": [{"role": "user", "content": "Write a five-character quatrain"}]},
    stream=True,
) as r:
    for line in r.iter_lines():
        if line.startswith(b"data: "):
            print(line[6:].decode())
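
If you want the assembled text rather than raw event payloads, a small parser can handle both stream dialects shown above. This is a sketch, not an official client: `collect_text` is a hypothetical helper, it looks only at text deltas, and it ignores tool-use and thinking events.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[bytes]) -> str:
    """Concatenate text deltas from either the Anthropic or the OpenAI SSE format."""
    parts = []
    for raw in lines:
        if not raw.startswith(b"data: "):
            continue  # skip "event:" lines and blank separators
        payload = raw[len(b"data: "):].decode("utf-8")
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        event = json.loads(payload)
        if event.get("type") == "content_block_delta":
            # Anthropic native: text arrives in delta.text of text_delta events.
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif "choices" in event:
            # OpenAI-compatible: text arrives in choices[].delta.content.
            for choice in event["choices"]:
                content = choice.get("delta", {}).get("content")
                if content:
                    parts.append(content)
    return "".join(parts)
```

With the streaming request above, `collect_text(r.iter_lines())` would return the full reply once the stream ends.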

Tool Use / Function Calling

Both protocols supported. SACTL internally maps tool_use ↔ tool_calls IDs so client SDKs need zero changes.

Anthropic native

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Look up the weather for a city",
      "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  ],
  "messages": [{"role": "user", "content": "What's the weather in Beijing today?"}]
}

The response content array contains a {"type":"tool_use","id":"toolu_...","name":"get_weather","input":{"city":"Beijing"}} block.

OpenAI-compatible

{
  "model": "claude-sonnet-4-6",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city",
        "parameters": {"type":"object","properties":{"city":{"type":"string"}}}
      }
    }
  ],
  "messages": [{"role": "user", "content": "What's the weather in Beijing today?"}]
}

The response shows choices[0].message.tool_calls[0] in OpenAI shape with id like call_xxx. SACTL maintains a bidirectional call_xxx ↔ toolu_xxx mapping internally.

Complete OpenAI SDK example

from openai import OpenAI

client = OpenAI(
    api_key="sk-xa-prod-...",
    base_url="https://api.sactl.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in Beijing today?"}],
)
print(resp.choices[0].message.tool_calls)

Multimodal (images)

Both formats are supported. SACTL translates each to the Anthropic standard image block before forwarding upstream.

Anthropic format

{
  "role": "user",
  "content": [
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo..."
      }
    },
    {"type": "text", "text": "Describe this image"}
  ]
}

Also supports source.type: "url" to reference a remote URL directly.

OpenAI format

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {"type": "image_url", "image_url": {"url": "https://example.com/pic.jpg"}}
  ]
}

SSRF filtering (mandatory): Every external image URL is validated at parse time. Private networks (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), link-local (169.254.0.0/16, including AWS/GCP instance metadata), loopback (127.0.0.0/8, ::1), and multicast ranges are all blocked with 400 ssrf_forbidden. The resolved IP after DNS is re-checked. HTTP 3xx redirects are not followed.
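
The address check at the heart of this filter can be approximated with Python's standard `ipaddress` module. The sketch below covers only the ranges named above; the two multicast blocks (224.0.0.0/4 and ff00::/8) are the standard IPv4 and IPv6 multicast ranges, chosen here as an assumption about what "multicast ranges" means, and `is_blocked_ip` is a hypothetical name.

```python
import ipaddress

# Ranges from the description above, plus the standard multicast blocks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",      # private
        "172.16.0.0/12",   # private
        "192.168.0.0/16",  # private
        "169.254.0.0/16",  # link-local, includes cloud metadata at 169.254.169.254
        "127.0.0.0/8",     # IPv4 loopback
        "::1/128",         # IPv6 loopback
        "224.0.0.0/4",     # IPv4 multicast
        "ff00::/8",        # IPv6 multicast
    )
]


def is_blocked_ip(ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_NETWORKS
    )
```

To match the behaviour described, a caller would run this on every address the hostname resolves to, not on the hostname string, so that a public-looking name pointing at a private IP is still rejected.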

Extended Thinking

Claude 4.x reasoning traces are passed through verbatim by SACTL and counted in billing.

Anthropic native

{
  "model": "claude-opus-4-6",
  "max_tokens": 8000,
  "thinking": {"type": "enabled", "budget_tokens": 4000},
  "messages": [{"role": "user", "content": "Prove that √2 is irrational"}]
}

OpenAI-compatible

For OpenAI requests, use reasoning_effort; SACTL maps it automatically:

reasoning_effort	maps to budget_tokens
`"low"`	2048
`"medium"`	8000
`"high"`	32000
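
Expressed as code, the table above amounts to a simple lookup. This sketch assumes unknown values are rejected rather than defaulted, which the source does not specify; `thinking_from_effort` is a hypothetical name.

```python
EFFORT_TO_BUDGET = {"low": 2048, "medium": 8000, "high": 32000}


def thinking_from_effort(reasoning_effort: str) -> dict:
    """Build an Anthropic-style thinking block from an OpenAI reasoning_effort value."""
    try:
        budget = EFFORT_TO_BUDGET[reasoning_effort]
    except KeyError:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}") from None
    return {"type": "enabled", "budget_tokens": budget}
```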

Beta header

SACTL auto-injects anthropic-beta: 2025-02-07 if the client doesn't send it. No action required.

Billing

Thinking tokens are billed at the output rate and counted in usage.completion_tokens; only the token count is logged, never the thinking content. The Inspector exposes a thinking_tokens field for reconciliation.

Prompt Caching

Mark Claude's system / tools / large-context blocks with cache markers. Cache reads cost 10% of SACTL's input rate, which is itself 30% of the official rate, so roughly 3% of the official input price and well below Anthropic's official cache pricing.

Client-side explicit declaration

{
  "model": "claude-sonnet-4-6",
  "system": [
    {
      "type": "text",
      "text": "You are a product support assistant. Answer every user question according to the FAQ below...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [...]
}

Auto-injection (A1 batch)

Set the env vars below and SACTL will auto-add cache markers to high-repetition blocks like system / tools arrays:

SIDECAR_AUTO_CACHE_ENABLED=true
SIDECAR_AUTO_CACHE_MIN_BYTES=1024
SIDECAR_AUTO_CACHE_TTL=300

Opt-out

Set Cache-Control: no-transform on the request and SACTL skips auto-injection.

See Pricing · Model table for rates (the Cache Write/Read columns already reflect × 0.30). Claude cache write / read is billed at × 0.30 across the board on SACTL.

Messages Batches

Mirrors Anthropic's Messages Batches API for async batch jobs (up to 10,000 requests).

Submit

curl -X POST https://api.sactl.ai/v1/messages/batches \
  -H "x-api-key: sk-xa-prod-..." \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      {"custom_id": "r-001", "params": {"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"Hi 1"}]}},
      {"custom_id": "r-002", "params": {"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"Hi 2"}]}}
    ]
  }'

State machine

validating → in_progress → ended
ended.result: completed | canceled | expired | failed

Polling / result download

# Poll status
curl -H "x-api-key: sk-xa-prod-..." \
  https://api.sactl.ai/v1/messages/batches/msgbatch_abc

# Download the results JSONL
curl -H "x-api-key: sk-xa-prod-..." \
  https://api.sactl.ai/v1/messages/batches/msgbatch_abc/results

Pricing

Anthropic's official batch price = standard × 0.5. SACTL stacks our × 0.30 on top:

SACTL batch price = official standard × 0.30 × 0.5 = official × 0.15
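
The multiplication above is easy to check in a few lines. In this sketch the constants come from the formula, while the function name `sactl_price` and the sample official price used in the usage note are placeholders, not real rates.

```python
SACTL_CLAUDE_MULTIPLIER = 0.30  # SACTL's discount on the official Claude price
BATCH_MULTIPLIER = 0.5          # Anthropic's official batch discount


def sactl_price(official_per_mtok: float, *, batch: bool = False) -> float:
    """Price per million tokens on SACTL, given the official price."""
    price = official_per_mtok * SACTL_CLAUDE_MULTIPLIER
    if batch:
        price *= BATCH_MULTIPLIER
    return price
```

For a made-up official price of 10 USD per million tokens, `sactl_price(10.0)` comes to about 3 USD and `sactl_price(10.0, batch=True)` to about 1.5 USD, that is, 15% of the official figure.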

Files API

Upload documents / images and reference them by file_id in subsequent /v1/messages requests.

Upload

curl -X POST https://api.sactl.ai/v1/files \
  -H "x-api-key: sk-xa-prod-..." \
  -F "file=@/path/to/document.pdf" \
  -F "purpose=messages"

Limits

Per-file size cap: SIDECAR_FILES_MAX_MB, default 32 MB.
Allowed MIME types: application/pdf, image/png|jpeg|webp, text/plain, text/markdown.
Upload bodies are magic-byte scanned; mismatched content-type vs. actual file is rejected.

Audit

Every upload / delete / read writes a billing log (file.uploaded, file.deleted, file.accessed) for reconciliation. File bodies live in a dedicated bucket and are auto-purged at end of lifecycle.

References

{
  "role": "user",
  "content": [
    {"type": "document", "source": {"type": "file", "file_id": "file_abc123"}},
    {"type": "text", "text": "Write a 200-word summary of this report"}
  ]
}

Model list

Returns the models the current VK has access to (allowed_models ∩ pricing registry), dynamically.

curl -H "x-api-key: sk-xa-prod-..." \
  https://api.sactl.ai/v1/models

Response (Anthropic shape)

{
  "data": [
    {"id": "claude-opus-4-6", "type": "model", "display_name": "Claude Opus 4.6"},
    {"id": "claude-sonnet-4-6", "type": "model", "display_name": "Claude Sonnet 4.6"},
    {"id": "claude-haiku-4-5", "type": "model", "display_name": "Claude Haiku 4.5"}
  ],
  "has_more": false
}

OpenAI clients hitting /v1/models receive the OpenAI shape ({object:"list", data:[...]}).

Rate-limit dimensions

Four-dimensional GCRA (Generic Cell Rate Algorithm, atomic Redis Lua implementation):

Tenant aggregate rl:tenant:{tenant_id} — capacity shared by all VKs
Single VK rl:vk:{vk_id} — primary client-facing limit
VK × model rl:vkm:{vk_id}:{model} — prevents hot models from monopolising capacity
VK × IP rl:vki:{vk_id}:{ip} — credential stuffing / bot defence

Any dimension rejecting → short-circuit 429 with Retry-After: {s}. We never hit the upstream and never pre-debit budget.
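
For readers unfamiliar with GCRA, the following single-process Python sketch shows the core idea: each limiter tracks a theoretical arrival time (TAT) and rejects a request that arrives too far ahead of it. This is a teaching sketch, not SACTL's Redis Lua script; the `Gcra` class, the `check_dimensions` helper, and the burst interpretation (burst = requests allowed back to back) are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gcra:
    rps: float        # sustained rate in requests per second
    burst: int        # requests allowed back to back
    tat: float = 0.0  # theoretical arrival time of the next conforming request

    def allow(self, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for a request arriving at `now`."""
        interval = 1.0 / self.rps
        tolerance = interval * (self.burst - 1)
        earliest = self.tat - tolerance
        if now < earliest:
            return False, math.ceil(earliest - now)
        self.tat = max(self.tat, now) + interval
        return True, 0


def check_dimensions(limiters: dict[str, Gcra], now: float) -> tuple[bool, Optional[str], int]:
    """Check each dimension in order and stop at the first rejection.

    Unlike an atomic script, this sketch has already consumed capacity on
    earlier dimensions by the time a later one rejects.
    """
    for dim, limiter in limiters.items():
        ok, retry_after = limiter.allow(now)
        if not ok:
            return False, dim, retry_after
    return True, None, 0
```

The rejecting dimension returned by `check_dimensions` plays the same role as the x-sactl-rl-dim header, and the integer as Retry-After.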

Parameter sources

VK dimension: virtual_keys.rps / burst (editable at portal creation time)
Tenant dimension: tenants.policy.rps (admin console)
Model dimension: model_policy.{model}.max_rps (pricing registry, per-model)
IP dimension: default 5 RPS / burst 20, adjustable in global config

Response headers

HTTP/2 429
retry-after: 3
x-ratelimit-limit-requests: 5000
x-ratelimit-remaining-requests: 0
x-ratelimit-reset-requests: 1713600000
x-sactl-rl-dim: vk

{"error":{"code":"rate_limited","message":"rate limit exceeded","trace_id":"01JQX..."}}

Error code list

All errors share the same shape: {error:{code, message, trace_id}}. Full response spec at API Reference · Errors.

HTTP	code	Description
400	`invalid_request`	Malformed body / wrong field; message describes the schema violation
400	`ssrf_forbidden`	Multimodal URL hit the SSRF blocklist (private / link-local / loopback)
400	`content_blocked_by_policy`	PII policy with `action: block` matched
401	`key_invalid`	VK does not exist, was revoked, or expired
402	`budget_exhausted`	Tenant or VK budget too low for pre-debit
403	`model_not_allowed`	The VK's `allowed_models` doesn't include this model
403	`ip_not_allowed`	VK has an ip_whitelist and the source IP isn't on it
413	`payload_too_large`	body > `SIDECAR_MAX_BODY_MB` or file > `SIDECAR_FILES_MAX_MB`
415	`unsupported_media_type`	Files API upload MIME not in allowlist
429	`rate_limited`	GCRA tripped on at least one dimension; carries Retry-After
499	`client_closed_request`	Client disconnected early (upstream already started, usage still billed)
500	`internal`	Internal sidecar error, already paging
502	`upstream_error`	Upstream returned an unexpected response, sanitised
503	`upstream_unavailable`	All available provider keys are circuit-broken
504	`upstream_timeout`	Upstream exceeded `SIDECAR_UPSTREAM_TIMEOUT` (default 120s)

Retry strategy

Recommended client implementation:

Error code	Retryable	Strategy
`upstream_error` (502)	Yes	Exponential backoff 2s / 4s / 8s, up to 3 attempts
`upstream_timeout` (504)	Yes	Same — consider switching model on the second attempt
`upstream_unavailable` (503)	Yes	All providers circuit-broken; wait 30s+
`rate_limited` (429)	Yes	Strictly back off per `Retry-After`
`budget_exhausted` (402)	No	Top up first; then retry
`key_invalid` (401)	No	Issue a fresh VK to the user
`model_not_allowed` (403)	No	Change the model or update VK config
`ssrf_forbidden` (400)	No	Use a publicly reachable URL

Internal auto-retry: The sidecar retries upstream transients (network blips, a momentary 429 on a single key) once internally, failing over to another provider key chain. Any 5xx your client sees has therefore already failed a second attempt, so client-side retries don't need to be aggressive.

Data handling

SACTL is a pass-through service. We don't store content. Specifically:

Things we don't store

Raw request prompts: what you send to Claude is never persisted. Logs only carry trace_id, token counts, latency, and the model name.
Raw response content: whatever Claude returns is forwarded to you without making a copy. We don't cache (unless you explicitly enable prompt cache) and we don't train on your data.
Upstream API keys: how we authenticate to Claude is a backend concern. Those keys never appear in responses and never flow back to you.

Things we do store (for billing and rate-limiting)

Per-request trace_id, model, token counts, latency, and amount spent
Your top-up and spend balance ledger
HMAC digest of your sk-xa-* key (never plaintext — even with the database, the key can't be reversed)

Usage cap protection

Every sk-xa-* key can carry a monthly budget cap. Hitting it returns 402 immediately so a leaked key can't drain the balance.

# Ask support on Telegram to set the cap, or set it yourself in the dashboard
key: sk-xa-prod-***
monthly_budget_usd: 100   # this key can spend at most $100 per month
allowed_models: ["claude-sonnet-4-6", "claude-haiku-4-5"]

Multimodal URL filtering

When you send an image_url to Claude, we reject URLs pointing at private networks / loopback / metadata services (10.0.0.0/8, 127.0.0.0/8, 169.254.169.254, etc.) to prevent SSRF abuse. Backend policy — public-facing image hosts aren't affected.

Refunds

Balance is refundable on request at any time: message Telegram support. We deduct what you've used and return the rest to the original payment channel (WeChat / Alipay / wire). If we ever shut down, we give 30 days' notice and refund all remaining balance pro rata.

OAuth / SSO login

The user management portal supports OAuth login — no separate signup. Currently available:

Google
GitHub
OIDC (Azure AD / Okta / Auth0 / Keycloak) — for enterprise SSO integration

Flow

Browser → GET /api/v1/auth/oauth/{provider}/start
Redirect to the Provider's authorization page
After consent, Provider calls /callback; we exchange for an internal JWT and set an httponly cookie
Subsequent backend requests authenticate via that cookie

OAuth client_secret is stored encrypted — not in env vars, not in plaintext DB. Enterprise SSO is provisioned via Telegram support; you'll need to provide your OIDC metadata URL.

OpenAI SDK

No client code changes — just swap api_key and base_url.

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
    apiKey: "sk-xa-prod-xxxxx",
    baseURL: "https://api.sactl.ai/v1",
});

const resp = await client.chat.completions.create({
    model: "claude-sonnet-4-6",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

LangChain (Python)

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai/v1",
)

resp = llm.invoke([HumanMessage(content="Hello")])
print(resp.content)

Or via env vars: set OPENAI_API_KEY=sk-xa-prod-... and OPENAI_BASE_URL=https://api.sactl.ai/v1; the SDK picks them up automatically.

Anthropic SDK

Python

import anthropic

client = anthropic.Anthropic(
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai",
)

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.content[0].text)

Node.js

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
    apiKey: "sk-xa-prod-xxxxx",
    baseURL: "https://api.sactl.ai",
});

const resp = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.content[0].text);

The Anthropic SDK does not require a /v1 suffix — it appends one itself. The OpenAI SDK requires it. That's a historical difference between the two SDKs, not a SACTL issue.

curl

Minimum-viable example. When debugging an SDK issue, hit it with curl first to validate the network path.

curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-xxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 512,
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }' \
  -w "\n---\nHTTP %{http_code}  time %{time_total}s\n"

OpenAI-compatible endpoint:

curl -X POST https://api.sactl.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-xa-prod-xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role":"user","content":"Hello"}]
  }'

Dashboard / Inspector

Console UI

Admin Console at https://console.sactl.ai — built for your SRE and commercial teams:

QPS / cost / latency p50 p95 p99 charts (Recharts)
Slice by model / tenant / VK / time range
Provider key pool health (circuit-break counts, RPM remaining)
Per-tenant monthly billing CSV export

Inspector

Replay request metadata by trace_id:

Timestamp, tenant_id, VK prefix (sk-xa-prod-7xK9..., last 6 chars masked)
model, prompt_tokens / completion_tokens / thinking_tokens, cost_usd, duration_ms
Rate-limit dimensions hit / pre-debit / refund details
Upstream provider name + matched key ID prefix

No raw content stored: The Inspector shows only metadata such as latency and cost, never the actual prompt or response content. The backend simply doesn't store it. This is a privacy-by-design decision, not a missing feature and not regulatory theatre.

Prometheus Metrics

Every Go service exposes /metrics; labels are constrained by an allowlist — no high-cardinality labels like user_id.

Important series

Metric	Purpose
`sidecar_forward_duration_seconds{quantile}`	Hot-path latency, SLO p99 ≤ 2ms
`sidecar_rl_rejects_total{dim=tenant|vk|vkm|vki}`	Rate-limit rejections bucketed by dimension
`sidecar_upstream_errors_total{provider,code}`	Attribution for upstream 5xx / timeouts
`breaker_state{provider}`	Circuit-breaker state (0 closed / 1 half / 2 open)
`budget_preauth_total{result}`	Pre-debit success / failure count
`ssrf_block_total`	SSRF filter blocks

Grafana dashboards

The repo's deploy/grafana/dashboards/ ships five ready-to-use boards:

SACTL Hot Path SLO — latency p50/p95/p99 + error-budget burn rate
Rate Limit Heatmap — per-dimension rejections bucketed by hour
Upstream Health — per-provider 5xx rate + circuit-break event stream
Cost Tracking — per-tenant/model running monthly total
Audit Event Stream — real-time security event stream

Recommended Prometheus scrape interval: 10s; retain 30 days raw + 90 days at 5-minute down-sampling.

Log field allowlist

Structured logs may only persist the following fields:

timestamp, level, trace_id, tenant_id, key_id_prefix,
model, status, duration_ms,
tokens_prompt, tokens_completion, cost_usd
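
The allowlist approach can be illustrated with a short filter that keeps permitted keys and silently drops everything else. This is a Python sketch of the idea only; the production wrapper described on this page is Go code, and `redact_record` is a hypothetical name.

```python
ALLOWED_LOG_FIELDS = frozenset({
    "timestamp", "level", "trace_id", "tenant_id", "key_id_prefix",
    "model", "status", "duration_ms",
    "tokens_prompt", "tokens_completion", "cost_usd",
})


def redact_record(record: dict) -> dict:
    """Keep only allowlisted fields; unknown keys never reach the log sink."""
    return {k: v for k, v in record.items() if k in ALLOWED_LOG_FIELDS}
```

Filtering by allowlist rather than blocklist means a newly added field is excluded by default until someone deliberately permits it.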

Forbidden fields

Raw prompt / response content
Plaintext api_key (VK or provider)
Raw upstream error.message (may carry provider-internal stacks)
Raw Authorization headers
Any PII (email / national ID / credit card / phone)

Defense in depth: Every logger must go through the internal/redact/logging.go wrapper. Forbidden fields are dropped at the code level, and dev mode panics on violation. CI runs gosec + staticcheck rules that scan log.Print, fmt.Print, and direct slog keys against the allowlist. Any violation fails CI, so nothing reaches main.

Production logs flow stdout → container runtime → FluentBit → OpenSearch and are retained for 90 days. Billing-related data is stored separately on a different pipeline so reconciliation queries don't tangle with general logs.
