OpenAI-compatible API
Chat completions and model listing at https://ai.hep.gg/v1, authenticated with an sk-hyd- API key. Drop-in for any OpenAI SDK.
https://ai.hep.gg/v1 speaks the OpenAI Chat Completions wire format. Point any OpenAI SDK at it by overriding the base URL and passing your sk-hyd- key as the API key. No custom client is needed.
Authentication
Send your API key as a Bearer token.
Authorization: Bearer sk-hyd-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The key is SHA-256 hashed and matched against your active keys. A missing key, a value that does not start with sk-hyd-, or a disabled key all return 401:
{ "error": "Invalid API key" }
Mint keys with your master token via POST /keys, or from the dashboard. Each key is pinned to one model at mint time.
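As a minimal sketch of the client side of this rule, the Python helper below builds the Authorization header and rejects values that lack the sk-hyd- prefix before any request is sent. The name auth_headers is hypothetical and not part of any hep.gg client. The prefix check only mirrors the rule described above; it cannot tell whether a key is active or disabled.

```python
KEY_PREFIX = "sk-hyd-"


def auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the key as a Bearer token.

    Hypothetical helper: it fails fast on keys the server would reject
    with 401 for having the wrong prefix.
    """
    if not api_key or not api_key.startswith(KEY_PREFIX):
        raise ValueError("API key must start with " + KEY_PREFIX)
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

A key with the right prefix can still be disabled or unknown, so callers should still expect a 401 from the server.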
Model selection
Each sk-hyd- key is bound to exactly one model (its mint-time slug). The endpoints route to that model regardless of the model field in the request, so the value in the body has no effect on routing. Pass the slug anyway for SDK compatibility; the model field in the response is rewritten to the hep.gg slug (for example cf-gpt-oss-20b).
Two models exist: qwen3-8b (Qwen3 8B), generally available to anyone with AI access, and cf-gpt-oss-20b (GPT-OSS 20B on Cloudflare), admin only. List the slugs you can mint a key for at GET https://ai.hep.gg/models. See Usage and quotas for plan access and the per-account allowance.
https://ai.hep.gg/v1/chat/completions
Auth required
Accepts a standard OpenAI Chat Completions JSON body and returns an OpenAI-shaped completion. Content-Type: application/json, body limit 10 MB.
messages: array of { role, content } objects, exactly as OpenAI expects.
model: the slug your key is bound to (for example qwen3-8b). Ignored for routing.
stream: defaults to false. If true, the response is streamed as Server-Sent Events (text/event-stream).
max_tokens: defaults to 16384 upstream (see below). Any value you pass is honored as-is.
temperature
Response
Non-streaming returns the upstream completion JSON with the model field rewritten to the hep.gg slug and a usage object carrying prompt_tokens and completion_tokens.
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "cf-gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 1 }
}
Streaming pipes SSE chunks through with the model field rewritten in each data: line; the final chunk carries usage. Every request is logged and your key's request_count, prompt_tokens, completion_tokens, and last_used_at counters are updated.
Examples
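A non-streaming call can be sketched with only the Python standard library. The function names build_chat_request, read_completion, and chat are illustrative, and the request is sent as POST on the assumption that the endpoint follows the OpenAI API here. Only the fields shown in the sample response above are read.

```python
import json
import urllib.request

BASE_URL = "https://ai.hep.gg/v1"


def build_chat_request(api_key, messages, model="qwen3-8b", **options):
    """Build (but do not send) a chat completions request."""
    # The model value is sent for SDK compatibility; routing follows the key.
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumed, as in the OpenAI API
    )


def read_completion(payload):
    """Extract reply text, served model, and token counts from a completion."""
    usage = payload.get("usage", {})
    return {
        "text": payload["choices"][0]["message"]["content"],
        "model": payload["model"],  # rewritten to the hep.gg slug by the server
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }


def chat(api_key, messages, **options):
    """Send the request and summarize the response (performs network I/O)."""
    request = build_chat_request(api_key, messages, **options)
    with urllib.request.urlopen(request) as response:
        return read_completion(json.load(response))
```

Calling chat("sk-hyd-...", [{"role": "user", "content": "Hi"}]) would perform the request; the model in the returned summary reflects the key's bound slug rather than the value sent.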
Streaming with the SDK:
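The sketch below assumes the official openai Python package (version 1 or later) is installed. The function names make_client and iter_reply are illustrative and not part of the SDK. The only hep.gg-specific settings are the base URL and the sk-hyd- key.

```python
def make_client(api_key):
    """Create an OpenAI SDK client pointed at hep.gg (needs `pip install openai`)."""
    # Imported lazily so the rest of this sketch loads without the package.
    from openai import OpenAI

    return OpenAI(base_url="https://ai.hep.gg/v1", api_key=api_key)


def iter_reply(client, prompt, model="qwen3-8b"):
    """Yield the assistant's reply piece by piece as SSE chunks arrive."""
    stream = client.chat.completions.create(
        model=model,  # sent for compatibility; the key decides the model
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # A chunk may carry no choices (for example a usage-only final chunk).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece
```

Typical use is a loop such as `for piece in iter_reply(make_client("sk-hyd-..."), "Hello"): print(piece, end="", flush=True)`.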
https://ai.hep.gg/v1/models
Auth required
OpenAI-compatible model list. Because each key is pinned to one model, this returns only the model your presented key is bound to. Same authentication as chat completions.
{
  "object": "list",
  "data": [
    { "id": "cf-gpt-oss-20b", "object": "model", "owned_by": "team-hydra" }
  ]
}
Usage limits
Requests count against your account's rolling 30-day allowance. Once you pass it, this endpoint returns 429 with type: "quota_exceeded" (unless you have enabled hep_tokens extra usage). The code field says why. See Usage and quotas for the allowances, the error codes, and how to keep working past the cap.
Errors
Errors use the OpenAI shape, { "error": { "message": "..." } }, with the upstream status code (or 400, 429, 500, 501 for local conditions). A 429 with type: "quota_exceeded" means you reached your usage allowance.
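One way a client might normalize these responses is sketched below in Python. It assumes the type and code fields sit inside the error object, as they do in OpenAI's error shape; this page does not show their exact placement. It also accepts the plain-string error body shown for 401 under Authentication. The name parse_error is hypothetical.

```python
import json


def parse_error(status, body):
    """Normalize an error response body into a flat dict."""
    try:
        err = json.loads(body).get("error")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        err = None
    if isinstance(err, str):
        # Plain-string form, e.g. { "error": "Invalid API key" }.
        err = {"message": err}
    elif not isinstance(err, dict):
        # Fall back to the raw body when no error field is usable.
        err = {"message": body}
    return {
        "status": status,
        "message": err.get("message", ""),
        "type": err.get("type"),  # assumed location of type
        "code": err.get("code"),  # assumed location of code
        "quota_exceeded": status == 429 and err.get("type") == "quota_exceeded",
    }
```

Checking quota_exceeded separately from the status lets a caller tell a usage-allowance 429 apart from any other 429.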