5-minute walkthrough
Zero to first successful request in three steps.
- Top up via Telegram support (WeChat / Alipay / USDT / wire; pick any). Once funded, you can get a key.
- Support sends an sk-xa-prod-*** key directly, or you generate one in the dashboard. Set a monthly budget cap so a leaked key can't drain your balance.
- Point the base_url of your client (Claude Code / Codex CLI / Cursor / OpenAI SDK / curl) at https://api.sactl.ai and swap the API key to sk-xa-*. Code stays the same. Codex CLI uses an OpenAI provider: base_url=https://api.sactl.ai/v1, wire_api=responses.
Your first curl request
curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-xxxxxxxxxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, SACTL."}
    ]
  }'
Authentication
SACTL accepts two auth headers — pick whichever matches your client framework:
- Anthropic native: x-api-key: sk-xa-prod-...
- OpenAI-compatible: Authorization: Bearer sk-xa-prod-...
Both are equivalent inside SACTL — they look up the same VK table.
Virtual Key structure
A full VK looks like:
sk-xa-{env}-{base62}
# example
sk-xa-prod-7xK9mQ2vL4pN8wRb
└────┬────┘└──────┬───────┘
   prefix       secret
- sk-xa-prod-: fixed prefix, identifies env (dev / stage / prod).
- The first 12 chars act as the Redis lookup prefix for O(1) VK metadata access.
- The remainder is a base62 random secret. Server side stores an HMAC-SHA256 digest; the pepper lives in Vault.
- Matched in constant time to defeat timing attacks.
See API Reference · Authentication for the full spec.
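The digest-and-compare steps in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the pepper is a hard-coded demo value rather than one fetched from Vault, and hashing the full key (rather than only the secret part) is a guess at the scheme, not a description of SACTL's actual code.

```python
import hashlib
import hmac

# Assumption: a hard-coded demo pepper. The doc says the real pepper lives in Vault.
PEPPER = b"demo-pepper-not-for-production"
LOOKUP_PREFIX_LEN = 12  # "the first 12 chars" per the list above


def lookup_prefix(vk: str) -> str:
    """Return the leading characters used as the metadata lookup key."""
    if not vk.startswith("sk-xa-"):
        raise ValueError("not an sk-xa-* virtual key")
    return vk[:LOOKUP_PREFIX_LEN]


def vk_digest(vk: str) -> str:
    """HMAC-SHA256 of the key under the pepper, hex-encoded."""
    return hmac.new(PEPPER, vk.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_vk(presented: str, stored_digest: str) -> bool:
    """Compare digests in constant time so timing does not leak a partial match."""
    return hmac.compare_digest(vk_digest(presented), stored_digest)
```

Only the digest needs to be stored; a presented key is re-hashed and compared with hmac.compare_digest, which is the standard-library way to get a constant-time comparison.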
Your first call
Send a full request, then read the response shape.
Request
curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-7xK9mQ2vL4pN8wRb" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Explain HMAC in one sentence."}
    ]
  }'
Response
HTTP/2 200
content-type: application/json
x-sactl-trace-id: 01JQX3A8P9M7K2WB3Z
x-sactl-usage-prompt-tokens: 18
x-sactl-usage-completion-tokens: 52
x-sactl-usage-cost-usd: 0.00041
x-sactl-budget-remaining-usd: 299.9996
x-ratelimit-remaining-requests: 4999

{
  "id": "msg_01ABc...",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    {"type": "text", "text": "HMAC is a keyed hash-based message authentication code used to detect tampering."}
  ],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 18, "output_tokens": 52}
}
Headers SACTL returns
| Header | Meaning |
|---|---|
| x-sactl-trace-id | Cross-request ULID; quote it when filing a complaint and we locate the request in 30 seconds. |
| x-sactl-usage-prompt-tokens | Actual input tokens (including system / tools). |
| x-sactl-usage-completion-tokens | Output tokens (including thinking). |
| x-sactl-usage-cost-usd | USD spent on this call (Claude × 0.30, GPT as low as 1/30 of official; live for Claude + GPT, Gemini coming soon). |
| x-sactl-budget-remaining-usd | This VK's remaining monthly budget. |
| x-ratelimit-remaining-requests | Remaining GCRA capacity in the current window. |
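For client-side reconciliation it can help to pull these headers into one record per call. The helper below is a minimal sketch; the function name and the returned dict layout are illustrative assumptions, and only the header names come from the table above.

```python
from decimal import Decimal


def read_usage(headers: dict) -> dict:
    """Extract SACTL usage headers into a plain dict (header names are case-insensitive)."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "trace_id": h.get("x-sactl-trace-id"),
        "prompt_tokens": int(h.get("x-sactl-usage-prompt-tokens", 0)),
        "completion_tokens": int(h.get("x-sactl-usage-completion-tokens", 0)),
        # Decimal avoids float rounding when summing many small amounts.
        "cost_usd": Decimal(h.get("x-sactl-usage-cost-usd", "0")),
        "budget_remaining_usd": Decimal(h.get("x-sactl-budget-remaining-usd", "0")),
    }
```

With the requests library, the same function can be called as read_usage(dict(resp.headers)).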
Virtual Key vs Provider Key
SACTL's architectural premise: your customers never touch your Anthropic or OpenAI provider key. (Gemini upstream coming; the same model applies.)
Provider Key
- The paid key you have from Anthropic or OpenAI. Gemini / AWS Bedrock upstream support is coming.
- Entered in the admin console, encrypted via Vault transit before persisting.
- The runtime requests one Vault decrypt per request — plaintext never lands in logs or disk.
- Group keys into a Key Pool: route by Tier (RPM/TPM); when one key trips, fail over to the next chain.
Virtual Key (VK)
- Issued by SACTL with the sk-xa-* prefix.
- All limits attach to the VK: allowed_models, ip_whitelist, monthly_budget_usd, rps / burst, expires_at.
- Billing, audit, and rate-limiting all key off the VK.
- Customers only ever see the VK. A rotation is a re-keying — other VKs aren't affected.
Tenant model
SACTL's multi-tenant model is organised in three layers:
Tenant (the customer company you have a contract with)
└── Users (the customer's portal login users)
    └── Virtual Keys (sk-xa-* used by your applications)
Budget aggregation
- VK-level budget: independent monthly USD cap per VK.
- Tenant-level budget: optional shared cap across all VKs in a tenant. Either dimension hitting the cap returns 402.
- Both use the same atomic Redis SUB/ADD primitive — concurrency-safe.
Isolation guarantees
- Portal users see only data scoped to their tenant after login.
- Try to access another tenant's VK ID via URL? IDOR protection returns 404 — it doesn't leak existence.
- Only Admin users (Console side) can see aggregate data across tenants.
Protocol adapters
SACTL does bidirectional translation between OpenAI Chat Completions and Anthropic Messages.
Typical scenarios
- Your customer already has an OpenAI SDK codebase and just wants to flip the base URL to use Claude (the main use case today).
- Reverse direction (Anthropic SDK → GPT upstream) is coming once the GPT pool is wired up.
How to enable
# gateway-sidecar env
SIDECAR_TRANSLATE_OPENAI_MODE=openai-to-anthropic
Clients call POST /v1/chat/completions; SACTL automatically:
- Maps OpenAI roles in messages (system/user/assistant/tool) to Anthropic's system block + message structure.
- tools: [{type:"function", function:{...}}] → Anthropic tools: [{name, input_schema}].
- tool_calls[] ↔ tool_use: all IDs are bidirectionally mapped and persisted.
- image_url blocks → Anthropic image blocks.
- reasoning_effort → Anthropic thinking.budget_tokens.
The response is translated back to the OpenAI shape; clients see nothing unusual. Send X-SACTL-Translated: 1 if you want the response to be explicitly tagged as translated (helps with log triage).
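To make the request-side mapping concrete, here is a rough Python sketch of the first two translation rules (roles and tool definitions). It is not SACTL's adapter: the function name is hypothetical, the fallback max_tokens of 1024 is an assumption, and tool-result messages, images, and reasoning_effort are deliberately left out.

```python
def openai_to_anthropic(body: dict) -> dict:
    """Translate a text-only Chat Completions request into Messages shape."""
    system_parts, messages = [], []
    for m in body["messages"]:
        if m["role"] == "system":
            # Anthropic carries system text in a top-level field, not in messages.
            system_parts.append(m["content"])
        elif m["role"] in ("user", "assistant"):
            messages.append({"role": m["role"], "content": m["content"]})
        else:
            raise NotImplementedError(f"role {m['role']!r} is not covered by this sketch")

    out = {
        "model": body["model"],
        "max_tokens": body.get("max_tokens", 1024),  # assumed default; Anthropic requires the field
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    if "tools" in body:
        out["tools"] = [
            {
                "name": t["function"]["name"],
                "description": t["function"].get("description", ""),
                "input_schema": t["function"].get("parameters", {"type": "object"}),
            }
            for t in body["tools"]
        ]
    return out
```

The main structural difference is visible here: OpenAI nests the schema under function.parameters, while Anthropic expects a flat tool object with input_schema.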
Usage cap protection
Every sk-xa-* key can carry a monthly budget cap. We pre-debit before forwarding the request to Claude — if the call would exceed the cap, you get 402 right away and we never spend on the upstream. Even if a key is leaked and someone hammers it with huge max_tokens, the loss is capped to your number.
Pre-debit mechanism
- The request enters the sidecar, which parses out model + max_tokens.
- The pricing registry is consulted for that model's output_per_mtok.
- Compute the upper-bound estimate: estimated_cost = max_tokens × price_per_token × safety_multiplier.
- Atomic Redis Lua SUB on both tenant:{id}:budget and vk:{id}:budget at the same time.
- If either dimension is short → 402 budget_exhausted; the request is never forwarded upstream.
- Once the upstream responds, compute real cost from usage and atomic ADD the difference back.
- If it's a 429/4xx (the call never happened or produced no usage), refund the entire pre-debit.
Set max_tokens to the value you actually need; don't pad to 32k as insurance. Padding holds budget reserve and can 402 concurrent requests on the same VK. The refund happens at response time, not request time.
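The reserve-then-settle arithmetic can be followed in a small in-memory model. This is a single-threaded Python sketch with hypothetical names; the real path is described above as an atomic Redis Lua script, and nothing here reproduces that atomicity or the actual pricing registry.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Budgets:
    tenant: Decimal  # remaining tenant-level budget, USD
    vk: Decimal      # remaining VK-level budget, USD


def estimate_cost(max_tokens: int, output_per_mtok: Decimal,
                  safety_multiplier: Decimal = Decimal("1")) -> Decimal:
    """Upper bound: all max_tokens billed at the output rate."""
    price_per_token = output_per_mtok / Decimal(1_000_000)
    return max_tokens * price_per_token * safety_multiplier


def pre_debit(b: Budgets, amount: Decimal) -> bool:
    """Reserve on both dimensions, or on neither (caller returns 402 on False)."""
    if b.tenant < amount or b.vk < amount:
        return False
    b.tenant -= amount
    b.vk -= amount
    return True


def settle(b: Budgets, reserved: Decimal, actual: Decimal) -> None:
    """Add back the unused part of the reservation; actual=0 refunds everything."""
    refund = reserved - actual
    b.tenant += refund
    b.vk += refund
```

Between pre_debit and settle the full estimate is unavailable to other requests, which is why a padded max_tokens can starve concurrent calls on the same VK.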
Streaming (SSE)
Add stream: true to the request body to enable streaming. Responses come back as Content-Type: text/event-stream.
Anthropic native endpoint
Event structure follows the Anthropic official spec:
event: message_start
data: {"type":"message_start","message":{"id":"msg_...",...}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: message_stop
data: {"type":"message_stop"}
OpenAI-compatible endpoint
SACTL rewrites the upstream Anthropic SSE into OpenAI chunk format:
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
Python example
import requests

with requests.post(
    "https://api.sactl.ai/v1/messages",
    headers={"x-api-key": "sk-xa-prod-...", "anthropic-version": "2023-06-01"},
    json={
        "model": "claude-sonnet-4-6",
        "max_tokens": 512,
        "stream": True,
        "messages": [{"role": "user", "content": "Write a five-character quatrain"}],
    },
    stream=True,
) as r:
    # Each SSE payload line starts with "data: "; print the JSON that follows.
    for line in r.iter_lines():
        if line.startswith(b"data: "):
            print(line[6:].decode())
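If you stream from the OpenAI-compatible endpoint instead, the chunk format shown earlier can be reassembled into the final text with a small helper. The sketch below is an assumption-labelled illustration: collect_text is a hypothetical name, and it only handles plain text deltas in choices[0], not tool calls.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[bytes]) -> str:
    """Join the delta.content pieces of an OpenAI-style SSE stream."""
    parts = []
    for raw in lines:
        if not raw.startswith(b"data: "):
            continue  # skip blank keep-alive lines and non-data fields
        payload = raw[6:].decode("utf-8")
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

It accepts any iterable of byte lines, so the result of r.iter_lines() from a streaming requests call can be passed in directly.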
Tool Use / Function Calling
Both protocols supported. SACTL internally maps tool_use ↔ tool_calls IDs so client SDKs need zero changes.
Anthropic native
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Look up the weather for a city",
      "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  ],
  "messages": [{"role": "user", "content": "What's the weather in Beijing today?"}]
}
The response content array contains a {"type":"tool_use","id":"toolu_...","name":"get_weather","input":{"city":"Beijing"}} block.
OpenAI-compatible
{
  "model": "claude-sonnet-4-6",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the weather for a city",
        "parameters": {"type":"object","properties":{"city":{"type":"string"}}}
      }
    }
  ],
  "messages": [{"role": "user", "content": "What's the weather in Beijing today?"}]
}
The response shows choices[0].message.tool_calls[0] in OpenAI shape with id like call_xxx. SACTL maintains a bidirectional call_xxx ↔ toolu_xxx mapping internally.
Complete OpenAI SDK example
from openai import OpenAI

client = OpenAI(
    api_key="sk-xa-prod-...",
    base_url="https://api.sactl.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in Beijing today?"}],
)
# tool_calls is None when the model answers directly instead of calling a tool.
print(resp.choices[0].message.tool_calls)
Multimodal (images)
Both formats are supported. SACTL translates each to the Anthropic standard image block before forwarding upstream.
Anthropic format
{
  "role": "user",
  "content": [
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo..."
      }
    },
    {"type": "text", "text": "Describe this image"}
  ]
}
Also supports source.type: "url" to reference a remote URL directly.
OpenAI format
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {"type": "image_url", "image_url": {"url": "https://example.com/pic.jpg"}}
  ]
}
Remote image URLs are SSRF-checked: private (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), link-local (169.254.0.0/16, including AWS/GCP instance metadata), loopback (127.0.0.0/8, ::1), and multicast ranges are all blocked with 400 ssrf_forbidden. The resolved IP after DNS is re-checked. HTTP 3xx redirects are not followed.
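The address classes listed here map closely onto Python's standard ipaddress module, which makes the check easy to reproduce when pre-validating URLs on the client side. The sketch below is illustrative only: is_forbidden_ip is a hypothetical name, it classifies an already-resolved IP, and DNS resolution and redirect handling are left out.

```python
import ipaddress


def is_forbidden_ip(ip: str) -> bool:
    """True if the address falls in a private, loopback, link-local, or multicast range."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for non-IP input
    return (
        addr.is_private        # 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, ...
        or addr.is_loopback    # 127.0.0.0/8, ::1
        or addr.is_link_local  # 169.254.0.0/16, incl. cloud metadata endpoints
        or addr.is_multicast   # 224.0.0.0/4, ff00::/8
    )
```

A check like this has to run on the IP obtained after DNS resolution, as the paragraph above notes; validating only the hostname string would miss names that resolve to internal addresses.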
Extended Thinking
Claude 4.x reasoning traces are passed through verbatim by SACTL and counted in billing.
Anthropic native
{
  "model": "claude-opus-4-6",
  "max_tokens": 8000,
  "thinking": {"type": "enabled", "budget_tokens": 4000},
  "messages": [{"role": "user", "content": "Prove that √2 is irrational"}]
}
Note that Anthropic requires budget_tokens to be smaller than max_tokens.
OpenAI-compatible
For OpenAI requests, use reasoning_effort; SACTL maps it automatically:
| reasoning_effort | maps to budget_tokens |
|---|---|
| "low" | 2048 |
| "medium" | 8000 |
| "high" | 32000 |
Beta header
SACTL auto-injects anthropic-beta: 2025-02-07 if the client doesn't send it. No action required.
Billing
Thinking tokens are billed at the output rate and counted in usage.completion_tokens; only the token count is logged, never the thinking content. The Inspector exposes a thinking_tokens field for reconciliation.
Prompt Caching
Mark Claude's system / tools / large-context blocks with cache markers; cache reads are billed at 10% of SACTL's input rate, which is itself 30% of official (about 3% of the official input price), well below Anthropic's official cache pricing.
Client-side explicit declaration
{
  "model": "claude-sonnet-4-6",
  "system": [
    {
      "type": "text",
      "text": "You are a product support assistant. Answer every user question from the FAQ below...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [...]
}
Auto-injection (A1 batch)
Set the env vars below and SACTL will auto-add cache markers to high-repetition blocks like system / tools arrays:
SIDECAR_AUTO_CACHE_ENABLED=true
SIDECAR_AUTO_CACHE_MIN_BYTES=1024
SIDECAR_AUTO_CACHE_TTL=300
Opt-out
Set Cache-Control: no-transform on the request and SACTL skips auto-injection.
See Pricing · Model table for rates (the Cache Write/Read columns already reflect × 0.30). Claude cache write / read is billed at × 0.30 across the board on SACTL.
Messages Batches
Mirrors Anthropic's Messages Batches API for async batch jobs (up to 10,000 requests).
Submit
curl -X POST https://api.sactl.ai/v1/messages/batches \
  -H "x-api-key: sk-xa-prod-..." \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      {"custom_id": "r-001", "params": {"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"Hi 1"}]}},
      {"custom_id": "r-002", "params": {"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"Hi 2"}]}}
    ]
  }'
State machine
validates → in_progress → ended
ended.result: completed | canceled | expired | failed
Polling / result download
# Poll status
curl -H "x-api-key: sk-xa-prod-..." \
  https://api.sactl.ai/v1/messages/batches/msgbatch_abc

# Download results JSONL
curl -H "x-api-key: sk-xa-prod-..." \
  https://api.sactl.ai/v1/messages/batches/msgbatch_abc/results
Pricing
Anthropic's official batch price = standard × 0.5. SACTL stacks our × 0.30 on top:
SACTL batch price = official standard × 0.30 × 0.5 = official × 0.15
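As a worked example of the stacking, the formula can be written as a small function. The names below are hypothetical, and the 10.00 USD input used in the usage note is a made-up figure chosen for easy arithmetic, not a real model price.

```python
from decimal import Decimal

SACTL_CLAUDE_MULTIPLIER = Decimal("0.30")  # SACTL discount on Claude, from the text above
BATCH_MULTIPLIER = Decimal("0.5")          # Anthropic's official batch discount


def sactl_batch_price(official_standard: Decimal) -> Decimal:
    """official standard × 0.30 × 0.5, i.e. official × 0.15."""
    return official_standard * SACTL_CLAUDE_MULTIPLIER * BATCH_MULTIPLIER
```

For an official standard price of 10.00 USD per million tokens, sactl_batch_price(Decimal("10.00")) works out to 1.50 USD.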
Files API
Upload documents / images and reference them by file_id in subsequent /v1/messages requests.
Upload
curl -X POST https://api.sactl.ai/v1/files \
  -H "x-api-key: sk-xa-prod-..." \
  -F "[email protected]" \
  -F "purpose=messages"
Limits
- Per-file size cap: SIDECAR_FILES_MAX_MB, default 32 MB.
- Allowed MIME types: application/pdf, image/png|jpeg|webp, text/plain, text/markdown.
- Upload bodies are magic-byte scanned; mismatched content-type vs. actual file is rejected.
Audit
Every upload / delete / read writes a billing log (file.uploaded, file.deleted, file.accessed) for reconciliation. File bodies live in a dedicated bucket and are auto-purged at end of lifecycle.
References
{
  "role": "user",
  "content": [
    {"type": "document", "source": {"type": "file", "file_id": "file_abc123"}},
    {"type": "text", "text": "Write a 200-word summary of this report"}
  ]
}
Model list
Returns the models the current VK has access to (allowed_models ∩ pricing registry), dynamically.
curl -H "x-api-key: sk-xa-prod-..." \
  https://api.sactl.ai/v1/models
Response (Anthropic shape)
{
  "data": [
    {"id": "claude-opus-4-6", "type": "model", "display_name": "Claude Opus 4.6"},
    {"id": "claude-sonnet-4-6", "type": "model", "display_name": "Claude Sonnet 4.6"},
    {"id": "claude-haiku-4-5", "type": "model", "display_name": "Claude Haiku 4.5"}
  ],
  "has_more": false
}
OpenAI clients hitting /v1/models receive the OpenAI shape ({object:"list", data:[...]}).
Rate-limit dimensions
Four-dimensional GCRA (Generic Cell Rate Algorithm, atomic Redis Lua implementation):
- Tenant aggregate (rl:tenant:{tenant_id}): capacity shared by all VKs
- Single VK (rl:vk:{vk_id}): primary client-facing limit
- VK × model (rl:vkm:{vk_id}:{model}): prevents hot models from monopolising capacity
- VK × IP (rl:vki:{vk_id}:{ip}): credential stuffing / bot defence
Any dimension rejecting → short-circuit 429 with Retry-After: {s}. We never hit the upstream and never pre-debit budget.
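For readers unfamiliar with GCRA, the core of the algorithm is a single stored timestamp per key, the theoretical arrival time (TAT). The following Python class is a generic single-process sketch of that idea, not SACTL's Redis Lua script; the class name and the way burst is turned into a tolerance are assumptions.

```python
class GCRA:
    """Generic Cell Rate Algorithm for one key, driven by a caller-supplied clock."""

    def __init__(self, rps: float, burst: int):
        self.interval = 1.0 / rps               # spacing between conforming requests
        self.tolerance = self.interval * burst  # how far ahead of schedule we may run
        self.tat = 0.0                          # theoretical arrival time of the next request

    def allow(self, now: float):
        """Return (allowed, retry_after_seconds)."""
        tat = max(self.tat, now)
        new_tat = tat + self.interval
        allow_at = new_tat - self.tolerance     # earliest instant this request conforms
        if now < allow_at:
            return False, allow_at - now        # state is left untouched on rejection
        self.tat = new_tat
        return True, 0.0
```

A request is checked against each dimension's limiter in turn; the first rejection yields the 429, and its retry delay is what a Retry-After header would carry.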
Parameter sources
- VK dimension: virtual_keys.rps / burst (editable at portal creation time)
- Tenant dimension: tenants.policy.rps (admin console)
- Model dimension: model_policy.{model}.max_rps (pricing registry, per-model)
- IP dimension: default 5 RPS / burst 20, adjustable in global config
Response headers
HTTP/2 429
retry-after: 3
x-ratelimit-limit-requests: 5000
x-ratelimit-remaining-requests: 0
x-ratelimit-reset-requests: 1713600000
x-sactl-rl-dim: vk

{"error":{"code":"rate_limited","message":"rate limit exceeded","trace_id":"01JQX..."}}
Error code list
All errors share the same shape: {error:{code, message, trace_id}}. Full response spec at API Reference · Errors.
| HTTP | code | Description |
|---|---|---|
| 400 | invalid_request | Malformed body / wrong field; message describes the schema violation |
| 400 | ssrf_forbidden | Multimodal URL hit the SSRF blocklist (private / link-local / loopback) |
| 400 | content_blocked_by_policy | PII policy with action: block matched |
| 401 | key_invalid | VK does not exist, was revoked, or expired |
| 402 | budget_exhausted | Tenant or VK budget too low for pre-debit |
| 403 | model_not_allowed | The VK's allowed_models doesn't include this model |
| 403 | ip_not_allowed | VK has an ip_whitelist and the source IP isn't on it |
| 413 | payload_too_large | body > SIDECAR_MAX_BODY_MB or file > SIDECAR_FILES_MAX_MB |
| 415 | unsupported_media_type | Files API upload MIME not in allowlist |
| 429 | rate_limited | GCRA tripped on at least one dimension; carries Retry-After |
| 499 | client_closed_request | Client disconnected early (upstream already started, usage still billed) |
| 500 | internal | Internal sidecar error, already paging |
| 502 | upstream_error | Upstream returned an unexpected response, sanitised |
| 503 | upstream_unavailable | All available provider keys are circuit-broken |
| 504 | upstream_timeout | Upstream exceeded SIDECAR_UPSTREAM_TIMEOUT (default 120s) |
Retry strategy
Recommended client implementation:
| Error code | Retryable | Strategy |
|---|---|---|
| upstream_error (502) | Yes | Exponential backoff 2s / 4s / 8s, up to 3 attempts |
| upstream_timeout (504) | Yes | Same; consider switching model on the second attempt |
| upstream_unavailable (503) | Yes | All providers circuit-broken; wait 30s+ |
| rate_limited (429) | Yes | Strictly back off per Retry-After |
| budget_exhausted (402) | No | Top up first; then retry |
| key_invalid (401) | No | Issue a fresh VK to the user |
| model_not_allowed (403) | No | Change the model or update VK config |
| ssrf_forbidden (400) | No | Use a publicly reachable URL |
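The table above translates fairly directly into a delay function that a client retry loop can call. This is a sketch under assumptions: backoff_delay is a hypothetical helper, the 1-second fallback when a 429 has no Retry-After header is a guess, and applying the three-attempt cap to every retryable status (not only 502) is a simplification.

```python
from typing import Optional

RETRYABLE_STATUSES = {429, 502, 503, 504}
MAX_RETRIES = 3


def backoff_delay(status: int, attempt: int,
                  retry_after: Optional[str] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if status not in RETRYABLE_STATUSES or attempt >= MAX_RETRIES:
        return None
    if status == 429:
        # Honour the server's Retry-After; fall back to 1s if it is missing (assumption).
        return float(retry_after) if retry_after else 1.0
    if status == 503:
        return 30.0  # all providers circuit-broken: wait at least 30s
    return float(2 ** (attempt + 1))  # 502 / 504: 2s, 4s, 8s
```

Non-retryable codes such as 401, 402, and 403 return None immediately, so the caller can surface the error instead of looping.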
Data handling
SACTL is a pass-through service. We don't store content. Specifically:
Things we don't store
- Raw request prompts: what you send to Claude is never persisted. Logs only carry trace_id, token counts, latency, and the model name.
- Raw response content: whatever Claude returns is forwarded to you without making a copy. We don't cache (unless you explicitly enable prompt cache) and we don't train on your data.
- Upstream API keys: how we authenticate to Claude is a backend concern. Those keys never appear in responses and never flow back to you.
Things we do store (for billing and rate-limiting)
- Per-request trace_id, model, token counts, latency, and amount spent
- Your top-up and spend balance ledger
- HMAC digest of your sk-xa-* key (never plaintext — even with the database, the key can't be reversed)
Usage cap protection
Every sk-xa-* key can carry a monthly budget cap. Hitting it returns 402 immediately so a leaked key can't drain the balance.
# Ask Telegram support to set a cap, or set it in one click in the dashboard
key: sk-xa-prod-***
monthly_budget_usd: 100   # this key can spend at most $100 per month
allowed_models: ["claude-sonnet-4-6", "claude-haiku-4-5"]
Multimodal URL filtering
When you send an image_url to Claude, we reject URLs pointing at private networks / loopback / metadata services (10.0.0.0/8, 127.0.0.0/8, 169.254.169.254, etc.) to prevent SSRF abuse. Backend policy — public-facing image hosts aren't affected.
Refunds
Balance is refundable on request at any time — message Telegram support. We deduct what you've used and return the rest to the original payment channel (WeChat / Alipay / USDT / wire). If we ever shut down, we give 30 days notice and refund all balance pro-rata.
OAuth / SSO login
The user management portal supports OAuth login — no separate signup. Currently available:
- GitHub
- OIDC (Azure AD / Okta / Auth0 / Keycloak) — for enterprise SSO integration
Flow
- Browser → GET /api/v1/auth/oauth/{provider}/start
- Redirect to the Provider's authorization page
- After consent, Provider calls /callback; we exchange for an internal JWT and set an httponly cookie
- Subsequent backend requests authenticate via that cookie
OAuth client_secret is stored encrypted — not in env vars, not in plaintext DB. Enterprise SSO is provisioned via Telegram support; you'll need to provide your OIDC metadata URL.
OpenAI SDK
No client code changes — just swap api_key and base_url.
Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-xa-prod-xxxxx",
  baseURL: "https://api.sactl.ai/v1",
});

const resp = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);
LangChain
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai/v1",
)

resp = llm.invoke([HumanMessage(content="Hello")])
print(resp.content)
Or via env vars: set OPENAI_API_KEY=sk-xa-prod-... and OPENAI_BASE_URL=https://api.sactl.ai/v1; the SDK picks them up automatically.
Anthropic SDK
Python
import anthropic

client = anthropic.Anthropic(
    api_key="sk-xa-prod-xxxxx",
    base_url="https://api.sactl.ai",
)

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.content[0].text)
Node.js
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: "sk-xa-prod-xxxxx",
  baseURL: "https://api.sactl.ai",
});

const resp = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.content[0].text);
The Anthropic SDK does not require a /v1 suffix — it appends one itself. The OpenAI SDK requires it. That's a historical difference between the two SDKs, not a SACTL issue.
curl
Minimum-viable example. When debugging an SDK issue, hit it with curl first to validate the network path.
curl -X POST https://api.sactl.ai/v1/messages \
  -H "x-api-key: sk-xa-prod-xxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 512,
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }' \
  -w "\n---\nHTTP %{http_code} time %{time_total}s\n"
OpenAI-compatible endpoint:
curl -X POST https://api.sactl.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-xa-prod-xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role":"user","content":"Hello"}]
  }'
Dashboard / Inspector
Console UI
Admin Console at https://console.sactl.ai — built for your SRE and commercial teams:
- QPS / cost / latency p50 p95 p99 charts (Recharts)
- Slice by model / tenant / VK / time range
- Provider key pool health (circuit-break counts, RPM remaining)
- Per-tenant monthly billing CSV export
Inspector
Replay request metadata by trace_id:
- Timestamp, tenant_id, VK prefix (sk-xa-prod-7xK9..., last 6 chars masked)
- model, prompt_tokens / completion_tokens / thinking_tokens, cost_usd, duration_ms
- Rate-limit dimensions hit / pre-debit / refund details
- Upstream provider name + matched key ID prefix
Prometheus Metrics
Every Go service exposes /metrics; labels are constrained by an allowlist — no high-cardinality labels like user_id.
Important series
| Metric | Purpose |
|---|---|
| sidecar_forward_duration_seconds{quantile} | Hot-path latency, SLO p99 ≤ 2ms |
| sidecar_rl_rejects_total{dim} | Rate-limit rejections bucketed by dimension (dim is one of tenant, vk, vkm, vki) |
| sidecar_upstream_errors_total{provider,code} | Attribution for upstream 5xx / timeouts |
| breaker_state{provider} | Circuit-breaker state (0 closed / 1 half / 2 open) |
| budget_preauth_total{result} | Pre-debit success / failure count |
| ssrf_block_total | SSRF filter blocks |
Grafana dashboards
The repo's deploy/grafana/dashboards/ ships five ready-to-use boards:
- SACTL Hot Path SLO — latency p50/p95/p99 + error-budget burn rate
- Rate Limit Heatmap — per-dimension rejections bucketed by hour
- Upstream Health — per-provider 5xx rate + circuit-break event stream
- Cost Tracking — per-tenant/model running monthly total
- Audit Event Stream — real-time security event stream
Recommended Prometheus scrape interval: 10s; retain 30 days raw + 90 days at 5-minute down-sampling.
Log field allowlist
Structured logs may only persist the following fields:
timestamp, level, trace_id, tenant_id, key_id_prefix, model, status, duration_ms, tokens_prompt, tokens_completion, cost_usd
Forbidden fields
- Raw prompt / response content
- Plaintext api_key (VK or provider)
- Raw upstream error.message (may carry provider-internal stacks)
- Raw Authorization headers
- Any PII (email / national ID / credit card / phone)
All log calls go through the internal/redact/logging.go wrapper. Forbidden fields are dropped at the code level, and dev mode panics on violation. CI runs gosec + staticcheck rules that scan log.Print, fmt.Print, and direct slog keys against the allowlist. Any violation fails CI, so nothing reaches main.
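The allowlist behaviour is simple enough to mirror in any consumer that re-emits these logs. The real wrapper is Go code in the repo; the Python below is only an illustrative sketch with hypothetical names, where strict=True stands in for the "dev mode panics" behaviour.

```python
ALLOWED_LOG_FIELDS = frozenset({
    "timestamp", "level", "trace_id", "tenant_id", "key_id_prefix", "model",
    "status", "duration_ms", "tokens_prompt", "tokens_completion", "cost_usd",
})


def redact_record(record: dict, strict: bool = False) -> dict:
    """Keep only allowlisted fields; in strict mode, fail loudly on anything else."""
    extra = set(record) - ALLOWED_LOG_FIELDS
    if extra and strict:
        raise ValueError(f"forbidden log fields: {sorted(extra)}")
    return {k: v for k, v in record.items() if k in ALLOWED_LOG_FIELDS}
```

Filtering by allowlist rather than blocklist means a newly added field is dropped by default until someone explicitly approves it.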
Production logs flow stdout → container runtime → FluentBit → OpenSearch and are retained for 90 days. Billing-related data is stored separately on a different pipeline so reconciliation queries don't tangle with general logs.