DOCUMENTATION

SACTL Docs

From sk-xa-* key to Claude Code / Codex CLI / Cursor / OpenAI SDK in under 5 minutes. Every code sample is copy-paste ready.

v1.0 · 2026-04

5-minute walkthrough

Zero to first successful request in three steps.

  1. Top up via Telegram support (WeChat / Alipay / USDT / wire transfer, any of them works). Once your balance is funded, you can get a key.
  2. Support sends you an sk-xa-prod-*** key directly, or you generate one in the dashboard. Set a monthly budget cap so a leaked key can't drain your balance.
  3. Point your client (Claude Code / Codex CLI / Cursor / OpenAI SDK / curl) at https://api.sactl.ai and swap the API key for your sk-xa-* key. No other code changes are needed. OpenAI-style clients use base_url https://api.sactl.ai/v1; for Codex CLI, configure an OpenAI provider with base_url=https://api.sactl.ai/v1 and wire_api=responses.

Your first curl request

curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-xxxxxxxxxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, SACTL."}
    ]
  }'
Shown once: The Virtual Key plaintext is shown exactly once, at creation. If you lose it, the only option is to rotate the key. The server stores only an HMAC-SHA256 digest, so the plaintext never lands in the database.

Authentication

SACTL accepts two auth headers — pick whichever matches your client framework:

  • Anthropic native: x-api-key: sk-xa-prod-...
  • OpenAI-compatible: Authorization: Bearer sk-xa-prod-...

Both are equivalent inside SACTL — they look up the same VK table.
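To illustrate that equivalence, here is a minimal sketch of how a gateway could normalise the two headers into one key before the VK lookup. The helper name extract_vk and the precedence rule (x-api-key wins when both headers are present) are assumptions for illustration, not SACTL's actual code.

```python
from typing import Mapping, Optional


def extract_vk(headers: Mapping[str, str]) -> Optional[str]:
    """Return the sk-xa-* key from either auth header (hypothetical helper).

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    # Anthropic-native clients send the raw key in x-api-key.
    key = lowered.get("x-api-key")
    if not key:
        # OpenAI-compatible clients send "Authorization: Bearer <key>".
        scheme, _, token = lowered.get("authorization", "").partition(" ")
        key = token.strip() if scheme.lower() == "bearer" else None
    if key and key.startswith("sk-xa-"):
        return key
    return None
```

Either header yields the same string, so everything downstream (lookup, rate limits, billing) only ever sees one key format.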

Virtual Key structure

A full VK looks like:

sk-xa-{env}-{base62}

# Example
sk-xa-prod-7xK9mQ2vL4pN8wRb
└────┬────┘ └──────┬───────┘
  prefix        secret
  • sk-xa-prod-: fixed prefix; the env segment identifies the environment (dev / stage / prod).
  • The first 12 characters serve as the Redis lookup prefix, giving O(1) access to VK metadata.
  • The remainder is a base62 random secret. The server stores only an HMAC-SHA256 digest of the key; the pepper lives in Vault.
  • Digests are compared in constant time to defeat timing attacks.
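The points above can be sketched with the Python standard library: generate a base62 secret, persist only an HMAC-SHA256 digest keyed by a server-side pepper, and compare digests in constant time. The 16-character secret length (matching the example key), the hard-coded pepper, and all helper names are assumptions; in SACTL the pepper is held in Vault, not in code.

```python
import hashlib
import hmac
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 0-9, A-Z, a-z
PEPPER = b"example-pepper"  # assumption: fetched from Vault in production


def generate_vk(env: str = "prod", length: int = 16) -> str:
    """Create a plaintext VK like sk-xa-prod-7xK9mQ2vL4pN8wRb (shown once)."""
    secret = "".join(secrets.choice(BASE62) for _ in range(length))
    return f"sk-xa-{env}-{secret}"


def lookup_prefix(vk: str) -> str:
    """First 12 characters, used as the metadata lookup key."""
    return vk[:12]


def digest(vk: str) -> str:
    """HMAC-SHA256 of the full key; only this value is persisted."""
    return hmac.new(PEPPER, vk.encode(), hashlib.sha256).hexdigest()


def verify(presented: str, stored_digest: str) -> bool:
    """Constant-time comparison, so response timing leaks nothing."""
    return hmac.compare_digest(digest(presented), stored_digest)
```

Because only the digest is stored, a database dump can't be turned back into working keys without the pepper.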

See API Reference · Authentication for the full spec.

Your first call

Send a full request, then read the response shape.

Request

curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-7xK9mQ2vL4pN8wRb" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Explain HMAC in one sentence."}
    ]
  }'

Response

HTTP/2 200
content-type: application/json
x-sactl-trace-id: 01JQX3A8P9M7K2WB3Z
x-sactl-usage-prompt-tokens: 18
x-sactl-usage-completion-tokens: 52
x-sactl-usage-cost-usd: 0.00041
x-sactl-budget-remaining-usd: 299.9996
x-ratelimit-remaining-requests: 4999

{
  "id": "msg_01ABc...",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    {"type": "text", "text": "HMAC is a keyed-hash message authentication code used to detect tampering."}
  ],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 18, "output_tokens": 52}
}

Headers SACTL returns

Header | Meaning
x-sactl-trace-id | Unique ULID for this request. Include it when you contact support and we can locate the request within about 30 seconds.
x-sactl-usage-prompt-tokens | Actual input tokens (including system / tools).
x-sactl-usage-completion-tokens | Output tokens (including thinking).
x-sactl-usage-cost-usd | USD charged for this call (Claude at 0.30× official pricing, GPT as low as 1/30 of official; Claude and GPT are live, Gemini is coming soon).
x-sactl-budget-remaining-usd | This VK's remaining monthly budget, in USD.
x-ratelimit-remaining-requests | Remaining GCRA capacity in the current window.
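A minimal sketch of reading these headers on the client side, for example to log per-call cost next to the trace ID. It accepts any header mapping (such as requests' Response.headers); the CallUsage type and parse_usage function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class CallUsage:
    trace_id: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    budget_remaining_usd: float


def parse_usage(headers: Mapping[str, str]) -> CallUsage:
    """Build a CallUsage from SACTL response headers (names as documented above)."""
    h = {k.lower(): v for k, v in headers.items()}
    return CallUsage(
        trace_id=h["x-sactl-trace-id"],
        prompt_tokens=int(h["x-sactl-usage-prompt-tokens"]),
        completion_tokens=int(h["x-sactl-usage-completion-tokens"]),
        cost_usd=float(h["x-sactl-usage-cost-usd"]),
        budget_remaining_usd=float(h["x-sactl-budget-remaining-usd"]),
    )
```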

Virtual Key vs Provider Key

SACTL's architectural premise: your customers never touch your Anthropic or OpenAI provider key. (Gemini upstream coming; the same model applies.)

Provider Key

  • The paid key you have from Anthropic or OpenAI. Gemini / AWS Bedrock upstream support is coming.
  • Entered in the admin console, encrypted via Vault transit before persisting.
  • At runtime, each request triggers one Vault decrypt; the plaintext never lands in logs or on disk.
  • Group keys into a Key Pool and route by Tier (RPM/TPM); when one key trips its circuit breaker, traffic fails over to the next key in the chain.
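The failover behaviour can be sketched as an ordered pool in which each provider key has a simple consecutive-failure breaker. The threshold, cool-down, and class names are assumptions for illustration; SACTL's actual breaker (including its half-open state) and Tier routing are not specified on this page.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderKey:
    key_id: str
    failures: int = 0
    open_until: float = 0.0  # breaker is open (key skipped) until this time


@dataclass
class KeyPool:
    keys: List[ProviderKey]
    failure_threshold: int = 3
    cooldown_s: float = 30.0

    def pick(self, now: Optional[float] = None) -> Optional[ProviderKey]:
        """Return the first key whose breaker is closed, in chain order."""
        now = time.monotonic() if now is None else now
        for k in self.keys:
            if k.open_until <= now:
                return k
        return None  # every key tripped: surface 503 upstream_unavailable

    def record(self, key: ProviderKey, ok: bool, now: Optional[float] = None) -> None:
        """Reset on success; trip the breaker after too many consecutive failures."""
        now = time.monotonic() if now is None else now
        if ok:
            key.failures = 0
            return
        key.failures += 1
        if key.failures >= self.failure_threshold:
            key.open_until = now + self.cooldown_s
            key.failures = 0
```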

Virtual Key (VK)

  • Issued by SACTL with the sk-xa-* prefix.
  • All limits attach to the VK: allowed_models, ip_whitelist, monthly_budget_usd, rps / burst, expires_at.
  • Billing, audit, and rate-limiting all key off the VK.
  • Customers only ever see the VK. Rotating a VK re-keys only that VK; other VKs are unaffected.
Data isolation: Even if a VK leaks, the worst case is that an attacker burns through that VK's budget cap. The Provider Key never appears in outbound request bodies, error responses, or Inspector metadata; it lives only in the sidecar's process memory, for a few milliseconds.

Tenant model

SACTL's multi-tenant model is organised in three layers:

Tenant (the customer company you have a contract with)
  └── Users (the customer's portal login users)
      └── Virtual Keys (sk-xa-* used by your applications)

Budget aggregation

  • VK-level budget: independent monthly USD cap per VK.
  • Tenant-level budget: optional shared cap across all VKs in a tenant. Either dimension hitting the cap returns 402.
  • Both are enforced by the same atomic Redis Lua subtract/add script, so concurrent requests can't overspend.

Isolation guarantees

  • Portal users see only data scoped to their tenant after login.
  • Requesting another tenant's VK ID via URL returns 404 (IDOR protection), so the response doesn't reveal whether that VK exists.
  • Only Admin users (Console side) can see aggregate data across tenants.
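The IDOR rule boils down to scoping every lookup by the caller's tenant and returning the same 404 whether the VK is missing or belongs to someone else. A minimal sketch with an in-memory store; the NotFound exception, the VKS table, and get_vk_for_tenant are hypothetical names.

```python
from typing import Dict


class NotFound(Exception):
    """Maps to HTTP 404; identical for 'missing' and 'other tenant'."""


# vk_id -> record; in practice this is a database query.
VKS: Dict[str, dict] = {
    "vk_1": {"tenant_id": "t_acme", "prefix": "sk-xa-prod-7"},
    "vk_2": {"tenant_id": "t_globex", "prefix": "sk-xa-prod-Q"},
}


def get_vk_for_tenant(vk_id: str, caller_tenant_id: str) -> dict:
    """Return the VK only if it belongs to the caller's tenant."""
    record = VKS.get(vk_id)
    # Same error for both cases, so probing IDs reveals nothing about existence.
    if record is None or record["tenant_id"] != caller_tenant_id:
        raise NotFound(vk_id)
    return record
```

In SQL terms this is filtering on both columns (WHERE id = ? AND tenant_id = ?) rather than loading by ID and checking ownership afterwards.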

Protocol adapters

SACTL does bidirectional translation between OpenAI Chat Completions and Anthropic Messages.

Typical scenarios

  1. Your customer already has an OpenAI SDK codebase and just wants to change the base URL to use Claude (the main use case today).
  2. Reverse direction (Anthropic SDK → GPT upstream) is coming once the GPT pool is wired up.

How to enable

# gateway-sidecar env
SIDECAR_TRANSLATE_OPENAI_MODE=openai-to-anthropic

Clients call POST /v1/chat/completions, and SACTL automatically:

  • Maps OpenAI message roles (system/user/assistant/tool) to Anthropic's top-level system block plus the messages structure.
  • Converts tools: [{type:"function", function:{...}}] → Anthropic tools: [{name, input_schema}]
  • Maps tool_calls[] ↔ tool_use; all IDs are mapped in both directions and persisted.
  • Converts image_url blocks → Anthropic image blocks.
  • Maps reasoning_effort → Anthropic thinking.budget_tokens
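The first two mappings can be sketched as a pure function over request bodies. This is a simplified illustration, not SACTL's translator: it handles string-content messages and function tools only, ignores tool results, images, and reasoning_effort, and the 1024 default for max_tokens is an assumption (Anthropic's Messages API requires the field, OpenAI's does not).

```python
from typing import Any, Dict, List


def openai_to_anthropic(body: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a minimal Chat Completions body into a Messages body."""
    system_parts: List[str] = []
    messages: List[Dict[str, Any]] = []
    for m in body["messages"]:
        if m["role"] == "system":
            # Anthropic takes system text as a top-level field, not a message.
            system_parts.append(m["content"])
        else:
            messages.append({"role": m["role"], "content": m["content"]})

    out: Dict[str, Any] = {
        "model": body["model"],
        "max_tokens": body.get("max_tokens", 1024),  # assumed default
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    if body.get("tools"):
        # {type:"function", function:{name, parameters}} -> {name, input_schema}
        out["tools"] = [
            {
                "name": t["function"]["name"],
                "description": t["function"].get("description", ""),
                "input_schema": t["function"].get("parameters", {"type": "object"}),
            }
            for t in body["tools"]
            if t.get("type") == "function"
        ]
    return out
```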

The response is translated back to the OpenAI shape; clients see nothing unusual. Send X-SACTL-Translated: 1 if you want the response to be explicitly tagged as translated (helps with log triage).

Usage cap protection

Every sk-xa-* key can carry a monthly budget cap. We pre-debit before forwarding the request to Claude: if the call would exceed the cap, you get a 402 immediately and nothing is spent upstream. Even if a key leaks and someone hammers it with huge max_tokens values, the loss is capped at the limit you set.

Pre-debit mechanism

  1. The request enters the sidecar, which parses out model and max_tokens.
  2. The sidecar looks up that model's output_per_mtok in the pricing registry.
  3. It computes an upper-bound estimate: estimated_cost = max_tokens × price_per_token × safety_multiplier, where price_per_token = output_per_mtok / 1,000,000.
  4. One atomic Redis Lua script subtracts the estimate from both tenant:{id}:budget and vk:{id}:budget.
  5. If either balance is too low, the sidecar returns 402 budget_exhausted and never forwards the request upstream.
  6. Once the upstream responds, the sidecar computes the real cost from usage and atomically adds the difference back.
  7. If the upstream returns 429 or another 4xx (the call never ran or produced no usage), the entire pre-debit is refunded.
About max_tokens: Set max_tokens to the value you actually need; don't pad it to 32k as insurance. A padded value reserves budget and can make concurrent requests on the same VK fail with 402, because the refund happens at response time, not request time.
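The pre-debit flow can be sketched with an in-memory ledger in which a lock stands in for the atomicity of the Redis Lua script. Prices are in USD per million output tokens; the 1.2 safety multiplier, the example price in the usage below, and all names are assumptions, since this page doesn't publish them.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Sequence


class BudgetExhausted(Exception):
    """Maps to 402 budget_exhausted; the request is never forwarded."""


@dataclass
class Ledger:
    balances: Dict[str, float]  # e.g. {"tenant:t1": 50.0, "vk:k1": 10.0}
    lock: threading.Lock = field(default_factory=threading.Lock)

    def pre_debit(self, keys: Sequence[str], amount: float) -> None:
        """Subtract `amount` from every key, or from none of them."""
        with self.lock:  # stands in for one atomic Redis Lua call
            if any(self.balances[k] < amount for k in keys):
                raise BudgetExhausted(list(keys))
            for k in keys:
                self.balances[k] -= amount

    def settle(self, keys: Sequence[str], reserved: float, actual: float) -> None:
        """Add back the unused reservation (actual=0 refunds all, e.g. on 429/4xx)."""
        with self.lock:
            for k in keys:
                self.balances[k] += reserved - actual


def estimate_cost(max_tokens: int, output_per_mtok: float, safety: float = 1.2) -> float:
    """Upper bound: max_tokens x price_per_token x safety_multiplier."""
    return max_tokens * (output_per_mtok / 1_000_000) * safety
```

Checking every dimension before subtracting from any is what keeps a 402 on the VK from touching the tenant balance. With a hypothetical $15/MTok output price, estimate_cost(1000, 15.0) reserves $0.018.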

Streaming (SSE)

Add stream: true to the request body to enable streaming. Responses come back as Content-Type: text/event-stream.

Anthropic native endpoint

Event structure follows the Anthropic official spec:

event: message_start
data: {"type":"message_start","message":{"id":"msg_...",...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

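event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":...}}

event: message_stop
data: {"type":"message_stop"}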

OpenAI-compatible endpoint

SACTL rewrites the upstream Anthropic SSE into OpenAI chunk format:

data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop","index":0}]}

data: [DONE]
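Clients that don't use an SDK can reassemble the OpenAI-format stream by concatenating choices[0].delta.content until the data: [DONE] sentinel. A minimal sketch over already-split text lines (for example decoded from requests' iter_lines()); the function name is hypothetical.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_openai_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (full_text, finish_reason) from OpenAI-style SSE lines."""
    text_parts: List[str] = []
    finish_reason: Optional[str] = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # blank separators between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        choice = json.loads(payload)["choices"][0]
        # The first chunk carries only the role; later ones carry content.
        text_parts.append(choice["delta"].get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(text_parts), finish_reason
```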

Python example

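import json

import requests

# Stream a Messages API response and print the text deltas as they arrive.
with requests.post(
    "https://api.sactl.ai/v1/messages",
    headers={"x-api-key": "sk-xa-prod-...", "anthropic-version": "2023-06-01"},
    json={"model": "claude-sonnet-4-6", "max_tokens": 512, "stream": True,
          "messages": [{"role": "user", "content": "Write a classical five-character quatrain"}]},
    stream=True,
) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[6:])
        delta = event.get("delta", {})
        if event["type"] == "content_block_delta" and delta.get("type") == "text_delta":
            print(delta["text"], end="", flush=True)
print()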

Tool Use / Function Calling

Both protocols are supported. SACTL maps tool_use ↔ tool_calls IDs internally, so client SDKs need no changes.

Anthropic native

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the weather for a city",
      "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  ],
  "messages": [{"role": "user", "content": "What's the weather in Beijing today?"}]
}

The response content array contains a {"type":"tool_use","id":"toolu_...","name":"get_weather","input":{"city":"Beijing"}} block.

OpenAI-compatible

{
  "model": "claude-sonnet-4-6",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "parameters": {"type":"object","properties":{"city":{"type":"string"}}}
      }
    }
  ],
  "messages": [{"role": "user", "content": "What's the weather in Beijing today?"}]
}

The response shows choices[0].message.tool_calls[0] in OpenAI shape with id like call_xxx. SACTL maintains a bidirectional call_xxx ↔ toolu_xxx mapping internally.
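The ID mapping can be pictured as a two-way table. This in-memory sketch mints an OpenAI-style call_ ID for each Anthropic toolu_ ID and resolves it back when the client returns a tool result. SACTL persists its mapping; the ToolIdMap class and the ID format used here (call_ plus random hex) are assumptions.

```python
import secrets
from typing import Dict


class ToolIdMap:
    """Bidirectional call_xxx <-> toolu_xxx mapping (in-memory illustration)."""

    def __init__(self) -> None:
        self._to_openai: Dict[str, str] = {}
        self._to_anthropic: Dict[str, str] = {}

    def openai_id(self, toolu_id: str) -> str:
        """Return a stable call_ ID for an upstream toolu_ ID."""
        if toolu_id not in self._to_openai:
            call_id = "call_" + secrets.token_hex(12)
            self._to_openai[toolu_id] = call_id
            self._to_anthropic[call_id] = toolu_id
        return self._to_openai[toolu_id]

    def anthropic_id(self, call_id: str) -> str:
        """Resolve the client's tool_call_id back to the upstream tool_use ID."""
        return self._to_anthropic[call_id]
```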

Complete OpenAI SDK example

from openai import OpenAI

client = OpenAI(
    api_key="sk-xa-prod-...",
    base_url="https://api.sactl.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in Beijing today?"}],
)
print(resp.choices[0].message.tool_calls)

Multimodal (images)

Both formats are supported. SACTL translates each to the Anthropic standard image block before forwarding upstream.

Anthropic format

{
  "role": "user",
  "content": [
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo..."
      }
    },
    {"type": "text", "text": "Describe this image"}
  ]
}

source.type can also be "url", which references a remote image by URL instead of inline base64 data.

OpenAI format

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {"type": "image_url", "image_url": {"url": "https://example.com/pic.jpg"}}
  ]
}
SSRF filtering (mandatory): Every external image URL is validated at parse time. Private networks (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), link-local (169.254.0.0/16, including AWS/GCP instance metadata), loopback (127.0.0.0/8, ::1), and multicast ranges are all blocked with 400 ssrf_forbidden. After DNS resolution, the resolved IP is checked again, and HTTP 3xx redirects are not followed.
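The range checks can be sketched with Python's standard ipaddress module: resolve the host, then reject if any resolved address is private, loopback, link-local, multicast, reserved, or unspecified. This is an illustration, not SACTL's filter; ipaddress's is_private also covers a few ranges beyond the three RFC 1918 blocks, and the function names are hypothetical.

```python
import ipaddress
import socket
from typing import List
from urllib.parse import urlparse


def is_forbidden_ip(ip: str) -> bool:
    """True for private, loopback, link-local, multicast, or reserved addresses."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # includes 169.254.169.254 metadata endpoints
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def check_image_url(url: str) -> List[str]:
    """Resolve the URL's host and raise if any resolved address is forbidden.

    Returns the vetted IPs; the fetch should connect to one of these directly
    (without following redirects) so a second DNS lookup can't swap them.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("ssrf_forbidden: unsupported URL")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    ips = sorted({info[4][0] for info in socket.getaddrinfo(parsed.hostname, port)})
    if any(is_forbidden_ip(ip) for ip in ips):
        raise ValueError("ssrf_forbidden")
    return ips
```

Rejecting when any resolved address is forbidden (not just the first) closes the trick of publishing both a public and a private A record for the same host.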

Extended Thinking

Claude 4.x reasoning traces are passed through verbatim by SACTL and counted in billing.

Anthropic native

{
  "model": "claude-opus-4-6",
  "max_tokens": 16000,
  "thinking": {"type": "enabled", "budget_tokens": 8000},
  "messages": [{"role": "user", "content": "Prove that √2 is irrational"}]
}

Anthropic requires budget_tokens to be at least 1024 and less than max_tokens.

OpenAI-compatible

For OpenAI requests, use reasoning_effort; SACTL maps it automatically:

reasoning_effort | maps to budget_tokens
"low" | 2048
"medium" | 8000
"high" | 32000
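The table can be expressed as a lookup that builds the Anthropic thinking block. Because budget_tokens must stay below max_tokens, this sketch raises max_tokens when needed; that adjustment, the 1024-token headroom, and the function name are assumptions about how a translator could behave, not documented SACTL behaviour.

```python
from typing import Any, Dict

EFFORT_TO_BUDGET = {"low": 2048, "medium": 8000, "high": 32000}


def apply_reasoning_effort(body: Dict[str, Any], effort: str,
                           headroom: int = 1024) -> Dict[str, Any]:
    """Return a copy of an Anthropic body with thinking enabled per the table."""
    budget = EFFORT_TO_BUDGET[effort]
    out = dict(body)
    out["thinking"] = {"type": "enabled", "budget_tokens": budget}
    # budget_tokens must be < max_tokens; leave room for the visible answer.
    out["max_tokens"] = max(body.get("max_tokens", 0), budget + headroom)
    return out
```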

Beta header

SACTL auto-injects anthropic-beta: 2025-02-07 if the client doesn't send it. No action required.

Billing

Thinking tokens are billed at the output rate and counted in usage.completion_tokens; only the token count is logged, never the thinking content. The Inspector exposes a thinking_tokens field for reconciliation.

Prompt Caching

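Mark Claude's system, tools, and large-context blocks with cache markers. Anthropic bills cache reads at 10% of the base input rate, and SACTL's 0.30× multiplier applies on top, so a cache read costs 0.30 × 0.10 = 3% of the official base input price, well below Anthropic's official cache-read price.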
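As worked arithmetic: per Anthropic's published pricing, cache reads cost 0.1× the base input rate and 5-minute cache writes cost 1.25×; SACTL's 0.30× applies to each. The $3.00/MTok base price below is a placeholder, not a quoted rate for any model.

```python
SACTL_MULTIPLIER = 0.30
CACHE_READ_FACTOR = 0.10      # Anthropic: cache read = 10% of base input
CACHE_WRITE_5M_FACTOR = 1.25  # Anthropic: 5-minute cache write = 125% of base input


def sactl_cache_prices(base_input_per_mtok: float) -> dict:
    """USD per million tokens on SACTL, given an official base input price."""
    return {
        "input": base_input_per_mtok * SACTL_MULTIPLIER,
        "cache_write_5m": base_input_per_mtok * CACHE_WRITE_5M_FACTOR * SACTL_MULTIPLIER,
        "cache_read": base_input_per_mtok * CACHE_READ_FACTOR * SACTL_MULTIPLIER,
    }


prices = sactl_cache_prices(3.00)  # placeholder official base input price
```

With that placeholder, a cache read comes to 3.00 × 0.10 × 0.30 = $0.09 per million tokens and a 5-minute cache write to $1.125.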

Client-side explicit declaration

{
  "model": "claude-sonnet-4-6",
  "system": [
    {
      "type": "text",
      "text": "You are a product support assistant. Answer every user question using the FAQ below...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [...]
}

Auto-injection (A1 batch)

Set the sidecar env vars below and SACTL will automatically add cache markers to frequently repeated blocks such as the system and tools arrays:

SIDECAR_AUTO_CACHE_ENABLED=true
SIDECAR_AUTO_CACHE_MIN_BYTES=1024
SIDECAR_AUTO_CACHE_TTL=300
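A minimal sketch of what such auto-injection could look like: if the serialized system prompt is at least SIDECAR_AUTO_CACHE_MIN_BYTES long, mark its last block with cache_control, which in Anthropic's API caches the whole prefix up to that block (tools, then system). The selection rule and function name are assumptions, and the TTL variable is not modelled.

```python
import json
from typing import Any, Dict


def auto_cache(body: Dict[str, Any], min_bytes: int = 1024) -> Dict[str, Any]:
    """Return body with a cache marker on the last system block, if large enough."""
    system = body.get("system")
    if not system:
        return body
    # Normalise a plain-string system prompt to the block form.
    if isinstance(system, str):
        blocks = [{"type": "text", "text": system}]
    else:
        blocks = [dict(b) for b in system]
    if any("cache_control" in b for b in blocks):
        return body  # the client already chose its own breakpoints
    if len(json.dumps(blocks, ensure_ascii=False).encode("utf-8")) < min_bytes:
        return body
    blocks[-1]["cache_control"] = {"type": "ephemeral"}
    out = dict(body)
    out["system"] = blocks
    return out
```

Leaving client-set markers alone keeps an explicit declaration authoritative over the automatic one.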

Opt-out

Set Cache-Control: no-transform on the request and SACTL skips auto-injection.

Claude cache writes and reads are billed at 0.30× official rates across the board on SACTL. See Pricing · Model table for rates; the Cache Write/Read columns already include the 0.30× multiplier.

Messages Batches

Mirrors Anthropic's Messages Batches API for async batch jobs (up to 10,000 requests).

Submit

curl -X POST https://api.sactl.ai/v1/messages/batches \
  -H "x-api-key: sk-xa-prod-..." \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      {"custom_id": "r-001", "params": {"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"Hi 1"}]}},
      {"custom_id": "r-002", "params": {"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"Hi 2"}]}}
    ]
  }'

State machine

  • validates → in_progress → ended
  • ended.result: completed | canceled | expired | failed

Polling / result download

# Poll status
curl -H "x-api-key: sk-xa-prod-..." \
  -H "anthropic-version: 2023-06-01" \
  https://api.sactl.ai/v1/messages/batches/msgbatch_abc

# Download results (JSONL)
curl -H "x-api-key: sk-xa-prod-..." \
  -H "anthropic-version: 2023-06-01" \
  https://api.sactl.ai/v1/messages/batches/msgbatch_abc/results
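A polling loop can be sketched with the standard library. It assumes SACTL mirrors Anthropic's batch object fields (processing_status, results_url) and JSONL result lines shaped like {"custom_id": ..., "result": {"type": ...}}; those field names come from Anthropic's API, not from this page. The HTTP getter is injectable so the logic can be tried offline, and wait_for_batch is a hypothetical name.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE = "https://api.sactl.ai/v1/messages/batches"


def http_get(url: str, api_key: str) -> str:
    """GET with SACTL auth headers; raises urllib.error.HTTPError on 4xx/5xx."""
    req = urllib.request.Request(
        url, headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def wait_for_batch(batch_id: str, api_key: str, poll_s: float = 30.0,
                   get: Callable[[str, str], str] = http_get) -> Dict[str, str]:
    """Poll until processing_status is "ended"; return {custom_id: result type}."""
    while True:
        batch = json.loads(get(f"{BASE}/{batch_id}", api_key))
        if batch["processing_status"] == "ended":
            break
        time.sleep(poll_s)
    results_url = batch.get("results_url") or f"{BASE}/{batch_id}/results"
    outcome: Dict[str, str] = {}
    for line in get(results_url, api_key).splitlines():
        if line.strip():  # JSONL: one result object per line
            row = json.loads(line)
            outcome[row["custom_id"]] = row["result"]["type"]
    return outcome
```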

Pricing

Anthropic's official batch price is the standard price × 0.5. SACTL applies its 0.30× multiplier on top:

SACTL batch price = official standard × 0.30 × 0.5 = official × 0.15
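The same formula as a worked calculation. The per-million-token prices are placeholders, not quoted rates, and the function name is hypothetical.

```python
def sactl_batch_cost(input_tokens: int, output_tokens: int,
                     official_input_per_mtok: float,
                     official_output_per_mtok: float) -> float:
    """USD for a batch job: official standard price x 0.30 (SACTL) x 0.5 (batch)."""
    official = (input_tokens * official_input_per_mtok
                + output_tokens * official_output_per_mtok) / 1_000_000
    return official * 0.30 * 0.5  # = official x 0.15


# Placeholder prices: $3/MTok input, $15/MTok output.
cost = sactl_batch_cost(2_000_000, 500_000, 3.0, 15.0)
```

With those placeholders, the official standard cost would be $13.50 and the SACTL batch cost $2.025.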

Files API

Upload documents / images and reference them by file_id in subsequent /v1/messages requests.

Upload

curl -X POST https://api.sactl.ai/v1/files \
  -H "x-api-key: sk-xa-prod-..." \
  -F "file=@/path/to/your-file.pdf" \
  -F "purpose=messages"

Limits

  • Per-file size cap: SIDECAR_FILES_MAX_MB, default 32 MB.
  • Allowed MIME types: application/pdf, image/png|jpeg|webp, text/plain, text/markdown.
  • Uploads are checked by magic bytes; a file whose declared content type doesn't match its actual content is rejected.
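The magic-byte check can be sketched as: sniff the leading bytes, derive a MIME type, and reject when it disagrees with the declared Content-Type. The PDF, PNG, JPEG, and WebP signatures are standard; treating text/plain and text/markdown as "valid UTF-8 with no NUL bytes and no binary signature" is an assumption, since plain text has no magic number. Function names are hypothetical.

```python
from typing import Optional

ALLOWED = {"application/pdf", "image/png", "image/jpeg", "image/webp",
           "text/plain", "text/markdown"}


def sniff_mime(head: bytes) -> Optional[str]:
    """Guess a MIME type from leading bytes; None if unrecognised."""
    if head.startswith(b"%PDF-"):
        return "application/pdf"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(declared: str, body: bytes) -> str:
    """Return the accepted MIME type or raise ValueError (415 / mismatch)."""
    if declared not in ALLOWED:
        raise ValueError("unsupported_media_type")
    sniffed = sniff_mime(body[:16])
    if declared.startswith("text/"):
        if sniffed is not None or b"\x00" in body:
            raise ValueError("content does not match declared type")
        body.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass
        return declared
    if sniffed != declared:
        raise ValueError("content does not match declared type")
    return declared
```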

Audit

Every upload / delete / read writes a billing log (file.uploaded, file.deleted, file.accessed) for reconciliation. File bodies live in a dedicated bucket and are auto-purged at end of lifecycle.

References

{
  "role": "user",
  "content": [
    {"type": "document", "source": {"type": "file", "file_id": "file_abc123"}},
    {"type": "text", "text": "Write a 200-word summary of this report"}
  ]
}

Model list

Returns the models the current VK has access to (allowed_models ∩ pricing registry), dynamically.

curl -H "x-api-key: sk-xa-prod-..." \
  https://api.sactl.ai/v1/models

Response (Anthropic shape)

{
  "data": [
    {"id": "claude-opus-4-6", "type": "model", "display_name": "Claude Opus 4.6"},
    {"id": "claude-sonnet-4-6", "type": "model", "display_name": "Claude Sonnet 4.6"},
    {"id": "claude-haiku-4-5", "type": "model", "display_name": "Claude Haiku 4.5"}
  ],
  "has_more": false
}

OpenAI clients hitting /v1/models receive the OpenAI shape ({object:"list", data:[...]}).

Rate-limit dimensions

Four-dimensional GCRA (Generic Cell Rate Algorithm, atomic Redis Lua implementation):

  1. Tenant aggregate rl:tenant:{tenant_id} — capacity shared by all VKs
  2. Single VK rl:vk:{vk_id} — primary client-facing limit
  3. VK × model rl:vkm:{vk_id}:{model} — prevents hot models from monopolising capacity
  4. VK × IP rl:vki:{vk_id}:{ip} — credential stuffing / bot defence

Any dimension rejecting → short-circuit 429 with Retry-After: {s}. We never hit the upstream and never pre-debit budget.
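GCRA keeps one "theoretical arrival time" (TAT) per key instead of a counter. The sketch below implements it in memory for several dimensions, checking all of them before committing any, so a rejected request consumes no capacity anywhere. Integer microseconds avoid float drift; mapping burst to a tolerance of burst - 1 intervals is one common convention, and SACTL's Lua script may differ. Class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class GCRA:
    """One rate-limit key; all times are integer microseconds."""
    rate_per_sec: int
    burst: int
    tat_us: int = 0  # theoretical arrival time of the next conforming request

    @property
    def interval_us(self) -> int:
        return 1_000_000 // self.rate_per_sec

    @property
    def tolerance_us(self) -> int:
        # burst - 1 extra intervals lets `burst` requests through back-to-back.
        return self.interval_us * (self.burst - 1)

    def check(self, now_us: int) -> Tuple[bool, int, int]:
        """Return (allowed, new_tat_us, retry_after_us) without mutating state."""
        tat = max(self.tat_us, now_us)
        if tat - now_us > self.tolerance_us:
            return False, self.tat_us, tat - self.tolerance_us - now_us
        return True, tat + self.interval_us, 0


def allow_request(limiters: Dict[str, GCRA], now_us: int) -> Tuple[bool, str, int]:
    """Check every dimension, then commit only if all of them allow."""
    results = {dim: lim.check(now_us) for dim, lim in limiters.items()}
    for dim, (ok, _, retry_us) in results.items():
        if not ok:
            return False, dim, retry_us  # 429; dim would go in x-sactl-rl-dim
    for dim, (_, new_tat, _) in results.items():
        limiters[dim].tat_us = new_tat
    return True, "", 0
```

For example, the default IP dimension (5 RPS, burst 20) admits 20 requests arriving at the same instant and rejects the 21st with a retry-after of 200,000 µs; Retry-After in whole seconds would be that value divided by 1,000,000 and rounded up.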

Parameter sources

  • VK dimension: virtual_keys.rps / burst (editable at portal creation time)
  • Tenant dimension: tenants.policy.rps (admin console)
  • Model dimension: model_policy.{model}.max_rps (pricing registry, per-model)
  • IP dimension: default 5 RPS / burst 20, adjustable in global config

Response headers

HTTP/2 429
retry-after: 3
x-ratelimit-limit-requests: 5000
x-ratelimit-remaining-requests: 0
x-ratelimit-reset-requests: 1713600000
x-sactl-rl-dim: vk

{"error":{"code":"rate_limited","message":"rate limit exceeded","trace_id":"01JQX..."}}

Error code list

All errors share the same shape: {error:{code, message, trace_id}}. Full response spec at API Reference · Errors.

HTTP | code | Description
400 | invalid_request | Malformed body or wrong field; message describes the schema violation
400 | ssrf_forbidden | Multimodal URL hit the SSRF blocklist (private / link-local / loopback)
400 | content_blocked_by_policy | Matched a PII policy with action: block
401 | key_invalid | VK does not exist, was revoked, or has expired
402 | budget_exhausted | Tenant or VK budget too low for the pre-debit
403 | model_not_allowed | The VK's allowed_models doesn't include this model
403 | ip_not_allowed | The VK has an ip_whitelist and the source IP isn't on it
413 | payload_too_large | Body > SIDECAR_MAX_BODY_MB or file > SIDECAR_FILES_MAX_MB
415 | unsupported_media_type | Files API upload MIME type not in the allowlist
429 | rate_limited | GCRA tripped on at least one dimension; carries Retry-After
499 | client_closed_request | Client disconnected early (the upstream call had already started, so usage is still billed)
500 | internal | Internal sidecar error; on-call is already paged
502 | upstream_error | Upstream returned an unexpected response (details sanitised)
503 | upstream_unavailable | All available provider keys are circuit-broken
504 | upstream_timeout | Upstream exceeded SIDECAR_UPSTREAM_TIMEOUT (default 120s)

Retry strategy

Recommended client implementation:

Error code | Retryable | Strategy
upstream_error (502) | Yes | Exponential backoff 2s / 4s / 8s, up to 3 attempts
upstream_timeout (504) | Yes | Same; consider switching models on the second attempt
upstream_unavailable (503) | Yes | All providers are circuit-broken; wait 30s or more
rate_limited (429) | Yes | Back off strictly per Retry-After
budget_exhausted (402) | No | Top up first, then retry
key_invalid (401) | No | Issue a fresh VK to the user
model_not_allowed (403) | No | Change the model or update the VK config
ssrf_forbidden (400) | No | Use a publicly reachable URL
Internal auto-retry: The sidecar retries upstream transients (network blips, a momentary 429 on a single key) once internally, failing over to another provider key in the chain. Any 5xx your client sees has already failed that second attempt, so client-side retries don't need to be aggressive.
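The retry table can be encoded as a small policy function that returns how long to wait before the next attempt, or None to stop. Reading "up to 3 attempts" as three retries, applying that cap to every retryable code, and leaving the 504 model switch out of scope are simplifications; next_delay is a hypothetical name.

```python
from typing import Optional

BACKOFF_S = [2.0, 4.0, 8.0]  # from the table: 2s / 4s / 8s
NON_RETRYABLE = {"budget_exhausted", "key_invalid", "model_not_allowed", "ssrf_forbidden"}


def next_delay(error_code: str, attempt: int,
               retry_after: Optional[float] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if error_code in NON_RETRYABLE or attempt >= len(BACKOFF_S):
        return None
    if error_code == "rate_limited":
        # Honour the server's Retry-After exactly when it is present.
        return retry_after if retry_after is not None else BACKOFF_S[attempt]
    if error_code == "upstream_unavailable":
        return 30.0  # all provider keys circuit-broken: wait 30s or more
    if error_code in ("upstream_error", "upstream_timeout"):
        return BACKOFF_S[attempt]
    return None  # unknown codes: don't retry blindly
```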

Data handling

SACTL is a pass-through service. We don't store content. Specifically:

Things we don't store

  • Raw request prompts: what you send to Claude is never persisted. Logs only carry trace_id, token counts, latency, and the model name.
  • Raw response content: whatever Claude returns is forwarded to you without making a copy. We don't cache (unless you explicitly enable prompt cache) and we don't train on your data.
  • Upstream API keys: how we authenticate to Claude is a backend concern. Those keys never appear in responses and never flow back to you.

Things we do store (for billing and rate-limiting)

  • Per-request trace_id, model, token counts, latency, and amount spent
  • Your top-up and spend balance ledger
  • HMAC digest of your sk-xa-* key (never plaintext — even with the database, the key can't be reversed)

Usage cap protection

Every sk-xa-* key can carry a monthly budget cap. Hitting it returns 402 immediately so a leaked key can't drain the balance.

# Ask Telegram support to set the cap, or set it in one click in the dashboard
key: sk-xa-prod-***
monthly_budget_usd: 100   # this key can spend at most $100 per month
allowed_models: ["claude-sonnet-4-6", "claude-haiku-4-5"]

Multimodal URL filtering

When you send an image_url to Claude, we reject URLs pointing at private networks, loopback, or metadata services (10.0.0.0/8, 127.0.0.0/8, 169.254.169.254, etc.) to prevent SSRF abuse. This is enforced server-side; publicly hosted images aren't affected.

Refunds

Your balance is refundable on request at any time; message Telegram support. We deduct what you've used and return the rest via the original payment channel (WeChat / Alipay / USDT / wire transfer). If we ever shut down, we will give 30 days' notice and refund all remaining balances pro rata.

OAuth / SSO login

The user management portal supports OAuth login — no separate signup. Currently available:

  • Google
  • GitHub
  • OIDC (Azure AD / Okta / Auth0 / Keycloak) — for enterprise SSO integration

Flow

  1. Browser → GET /api/v1/auth/oauth/{provider}/start
  2. Redirect to the Provider's authorization page
  3. After consent, the provider redirects back to /callback; we exchange the authorization code for an internal JWT and set it in an httpOnly cookie
  4. Subsequent backend requests authenticate via that cookie

OAuth client_secret is stored encrypted — not in env vars, not in plaintext DB. Enterprise SSO is provisioned via Telegram support; you'll need to provide your OIDC metadata URL.

OpenAI SDK

No client code changes needed; just swap api_key and base_url.

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
    apiKey: "sk-xa-prod-xxxxx",
    baseURL: "https://api.sactl.ai/v1",
});

const resp = await client.chat.completions.create({
    model: "claude-sonnet-4-6",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

LangChain (Python)

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai/v1",
)

resp = llm.invoke([HumanMessage(content="Hello")])
print(resp.content)

Or via env vars: set OPENAI_API_KEY=sk-xa-prod-... and OPENAI_BASE_URL=https://api.sactl.ai/v1; the SDK picks them up automatically.

Anthropic SDK

Python

import anthropic

client = anthropic.Anthropic(
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai",
)

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.content[0].text)

Node.js

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
    apiKey: "sk-xa-prod-xxxxx",
    baseURL: "https://api.sactl.ai",
});

const resp = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.content[0].text);

The Anthropic SDK's base URL takes no /v1 suffix because the SDK appends it itself; the OpenAI SDK expects /v1 in the base URL. This is a historical difference between the two SDKs, not a SACTL quirk.

curl

Minimum-viable example. When debugging an SDK issue, hit it with curl first to validate the network path.

curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-xxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 512,
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }' \
  -w "\n---\nHTTP %{http_code}  time %{time_total}s\n"

OpenAI-compatible endpoint:

curl -X POST https://api.sactl.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-xa-prod-xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role":"user","content":"Hello"}]
  }'

Dashboard / Inspector

Console UI

Admin Console at https://console.sactl.ai — built for your SRE and commercial teams:

  • QPS / cost / latency p50 p95 p99 charts (Recharts)
  • Slice by model / tenant / VK / time range
  • Provider key pool health (circuit-break counts, RPM remaining)
  • Per-tenant monthly billing CSV export

Inspector

Replay request metadata by trace_id:

  • Timestamp, tenant_id, VK prefix (sk-xa-prod-7xK9..., last 6 chars masked)
  • model, prompt_tokens / completion_tokens / thinking_tokens, cost_usd, duration_ms
  • Rate-limit dimensions hit / pre-debit / refund details
  • Upstream provider name + matched key ID prefix
No raw content stored: The Inspector shows only metadata such as latency and cost, never the actual prompt or response content, because the backend doesn't store it. This is a privacy-by-design decision, not a missing feature and not regulatory theatre.

Prometheus Metrics

Every Go service exposes /metrics; labels are constrained by an allowlist — no high-cardinality labels like user_id.

Important series

Metric | Purpose
sidecar_forward_duration_seconds{quantile} | Hot-path latency; SLO p99 ≤ 2ms
sidecar_rl_rejects_total{dim=tenant|vk|vkm|vki} | Rate-limit rejections bucketed by dimension
sidecar_upstream_errors_total{provider,code} | Attribution for upstream 5xx / timeouts
breaker_state{provider} | Circuit-breaker state (0 closed / 1 half-open / 2 open)
budget_preauth_total{result} | Pre-debit success / failure count
ssrf_block_total | SSRF filter blocks

Grafana dashboards

The repo's deploy/grafana/dashboards/ ships five ready-to-use boards:

  1. SACTL Hot Path SLO — latency p50/p95/p99 + error-budget burn rate
  2. Rate Limit Heatmap — per-dimension rejections bucketed by hour
  3. Upstream Health — per-provider 5xx rate + circuit-break event stream
  4. Cost Tracking — per-tenant/model running monthly total
  5. Audit Event Stream — real-time security event stream

Recommended Prometheus scrape interval: 10s; retain 30 days raw + 90 days at 5-minute down-sampling.

Log field allowlist

Structured logs may only persist the following fields:

timestamp, level, trace_id, tenant_id, key_id_prefix,
model, status, duration_ms,
tokens_prompt, tokens_completion, cost_usd

Forbidden fields

  • Raw prompt / response content
  • Plaintext api_key (VK or provider)
  • Raw upstream error.message (may carry provider-internal stacks)
  • Raw Authorization headers
  • Any PII (email / national ID / credit card / phone)
Defense in depth: Every logger must go through the internal/redact/logging.go wrapper. Forbidden fields are dropped at the code level, and dev mode panics on a violation. CI runs gosec and staticcheck rules that check log.Print, fmt.Print, and direct slog keys against the allowlist. Any violation fails CI, so nothing reaches main.
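SACTL's enforcement lives in a Go wrapper; purely to illustrate the allowlist idea in this page's sample language, here is a Python logging.Filter that drops any structured field not on the list above. Passing fields via extra={"fields": {...}} is a convention invented for this sketch, and it only filters those fields, not the message text.

```python
import logging

ALLOWED_FIELDS = {
    "timestamp", "level", "trace_id", "tenant_id", "key_id_prefix",
    "model", "status", "duration_ms",
    "tokens_prompt", "tokens_completion", "cost_usd",
}


class AllowlistFilter(logging.Filter):
    """Strip non-allowlisted keys from record.fields before any handler sees them."""

    def filter(self, record: logging.LogRecord) -> bool:
        fields = getattr(record, "fields", {})
        record.fields = {k: v for k, v in fields.items() if k in ALLOWED_FIELDS}
        return True  # keep the record, minus forbidden fields
```

Attach it with handler.addFilter(AllowlistFilter()) and log with logger.info("request done", extra={"fields": {...}}); an allowlist fails closed, so a newly added field stays out of the logs until someone adds it deliberately.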

Production logs flow stdout → container runtime → FluentBit → OpenSearch and are retained for 90 days. Billing-related data is stored separately on a different pipeline so reconciliation queries don't tangle with general logs.

Up and running in 30 minutes. Claude at 30%, GPT at 1/30.

Message us on Telegram with "trial". You'll get the top-up link, your API key, and the Claude Code integration doc. Post in the group when something's off — ~5 minute response during working hours.