OpenAI-compatible API

Chat completions and model listing at https://ai.hep.gg/v1, authenticated with an sk-hyd- API key. Drop-in for any OpenAI SDK.

https://ai.hep.gg/v1 speaks the OpenAI Chat Completions wire format. Point any OpenAI SDK at it by overriding the base URL and passing your sk-hyd- key as the API key. No custom client is needed.

Authentication

Send your API key as a Bearer token.

Authorization header

Authorization: Bearer sk-hyd-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
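
If you call the API without an SDK, a small helper can build this header and fail early on values the gateway would reject anyway. A minimal sketch: `authHeaders` is a hypothetical helper name, and the only rule it can check locally is the sk-hyd- prefix. Whether a key is active or disabled is decided by the gateway.

```javascript
// Hypothetical helper: builds the Bearer header for a hep.gg key.
// Only the sk-hyd- prefix can be checked locally; key status is
// decided server-side.
function authHeaders(apiKey) {
  if (typeof apiKey !== "string" || apiKey.length === 0) {
    throw new Error("Missing API key (expected sk-hyd-...)");
  }
  if (!apiKey.startsWith("sk-hyd-")) {
    throw new Error("API key must start with sk-hyd-");
  }
  return { Authorization: `Bearer ${apiKey}` };
}

// Example: headers for GET https://ai.hep.gg/v1/models
const headers = authHeaders("sk-hyd-example");
console.log(headers.Authorization); // Bearer sk-hyd-example
```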

The key is SHA-256 hashed and matched against your active keys. A missing key, a value that does not start with sk-hyd-, or a disabled key all return 401:

{ "error": "Invalid API key" }
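
To make the hash-and-match rule concrete, here is an illustrative sketch of that kind of lookup in Node. It is not the gateway's code: the hex digest encoding and the `activeKeyHashes` set are assumptions. The usual motivation for this design is that only digests need to be stored, and in this sketch disabling a key simply means removing its digest from the active set.

```javascript
// Illustrative only: a hash-and-match API key check modeled on the rules above.
const { createHash } = require("node:crypto");

// SHA-256 of the raw key, hex-encoded (the encoding is an assumption).
function hashKey(apiKey) {
  return createHash("sha256").update(apiKey, "utf8").digest("hex");
}

// activeKeyHashes: hypothetical Set of digests for keys that are not disabled.
function isAuthorized(authorizationHeader, activeKeyHashes) {
  const match = /^Bearer (\S+)$/.exec(authorizationHeader ?? "");
  if (!match) return false; // missing key -> 401
  const key = match[1];
  if (!key.startsWith("sk-hyd-")) return false; // wrong prefix -> 401
  return activeKeyHashes.has(hashKey(key)); // unknown or disabled key -> 401
}
```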

Mint keys with your master token via POST /keys, or from the dashboard. Each key is pinned to one model at mint time.

Model selection

Each sk-hyd- key is bound to exactly one model, the slug chosen when it was minted. Requests are routed to that model regardless of the model field in the request body. Pass the slug anyway for SDK compatibility; the model field in the response is rewritten to the hep.gg slug (for example cf-gpt-oss-20b).

Two models are available: qwen3-8b (Qwen3 8B), generally available to anyone with AI access, and cf-gpt-oss-20b (GPT-OSS 20B on Cloudflare), restricted to admins. GET https://ai.hep.gg/models lists the slugs you can mint a key for. See Usage and quotas for plan access and the per-account allowance.
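
Because the key, not the request body, decides the model, a mismatch between the slug you send and the slug in the response usually means the wrong key is loaded. A small sketch of that check: `servedModelMismatch` is a hypothetical helper that returns a warning string, or null when the slugs agree.

```javascript
// Hypothetical helper: compares the slug you sent with the slug the
// gateway reports in the response's model field.
function servedModelMismatch(requestedModel, completion) {
  if (!requestedModel || completion.model === requestedModel) return null;
  return `Requested "${requestedModel}", but this key is pinned to "${completion.model}"`;
}

console.log(servedModelMismatch("qwen3-8b", { model: "cf-gpt-oss-20b" }));
// Requested "qwen3-8b", but this key is pinned to "cf-gpt-oss-20b"
```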

POST https://ai.hep.gg/v1/chat/completions (auth required)

OpenAI-compatible chat completion. Supports streaming.

Accepts a standard OpenAI Chat Completions JSON body and returns an OpenAI-shaped completion. Send Content-Type: application/json; the request body limit is 10 MB.

Body fields

messages

array, required

The conversation as an array of { role, content } objects, exactly as OpenAI expects.

model

string, optional

The model slug. Accepted for SDK compatibility but does not change routing; the key's pinned model is always used. Pass your key's slug (for example qwen3-8b).

stream

boolean, optional, default: false

When true, the response is streamed as Server-Sent Events (text/event-stream).

max_tokens

integer, optional

Maximum tokens to generate. If omitted, the gateway sets max_tokens to 16384 in the upstream request. Any value you pass is forwarded as-is.

temperature

number, optional

Standard OpenAI sampling parameter. Other standard fields are forwarded upstream unchanged.
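
The field rules above can be mirrored client-side before a request goes out. A minimal sketch, assuming the 10 MB limit means 10 * 1024 * 1024 bytes of UTF-8 JSON (the exact threshold is not specified on this page); `buildChatBody` and `effectiveMaxTokens` are hypothetical names.

```javascript
// Assumption: "10 MB" read as 10 * 1024 * 1024 bytes of serialized JSON.
const MAX_BODY_BYTES = 10 * 1024 * 1024;
// Documented default the gateway injects upstream when max_tokens is omitted.
const GATEWAY_DEFAULT_MAX_TOKENS = 16384;

// Hypothetical helper: returns the JSON string to send as the request body.
function buildChatBody({ messages, model, stream = false, max_tokens, ...rest }) {
  if (!Array.isArray(messages)) {
    throw new Error("messages is required and must be an array of { role, content }");
  }
  // Other standard fields (temperature and so on) are passed through untouched.
  const body = { ...rest, messages, stream };
  if (model !== undefined) body.model = model; // sent for compatibility, ignored for routing
  if (max_tokens !== undefined) body.max_tokens = max_tokens;

  const json = JSON.stringify(body);
  const bytes = new TextEncoder().encode(json).length; // UTF-8 byte count
  if (bytes > MAX_BODY_BYTES) {
    throw new Error(`Body is ${bytes} bytes, over the 10 MB limit`);
  }
  return json;
}

// The max_tokens value the upstream model ends up with.
function effectiveMaxTokens(body) {
  return body.max_tokens ?? GATEWAY_DEFAULT_MAX_TOKENS;
}
```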

Response

Non-streaming returns the upstream completion JSON with the model field rewritten to the hep.gg slug and a usage object carrying prompt_tokens and completion_tokens.

200 OK (non-streaming)

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "cf-gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 1 }
}
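
Reading this response takes only a few lines. The usage object in the example has no total_tokens field, so this sketch adds the two counts itself. `summarizeCompletion` is a hypothetical helper.

```javascript
// Hypothetical helper: extracts the reply and token counts from a
// non-streaming completion. The total is computed, not read from the response.
function summarizeCompletion(completion) {
  const choice = completion.choices?.[0];
  const usage = completion.usage ?? {};
  const promptTokens = usage.prompt_tokens ?? 0;
  const completionTokens = usage.completion_tokens ?? 0;
  return {
    model: completion.model, // the hep.gg slug, e.g. "cf-gpt-oss-20b"
    text: choice?.message?.content ?? "",
    finishReason: choice?.finish_reason ?? null,
    totalTokens: promptTokens + completionTokens,
  };
}
```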

When streaming, the gateway passes SSE chunks through with the model field rewritten in each data: line; the final chunk carries usage. In both modes, every request is logged and your key's request_count, prompt_tokens, completion_tokens, and last_used_at counters are updated.
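
Without an SDK, a streamed reply can be reassembled from the raw event text. A minimal sketch that works on a fully buffered body (for example `await res.text()` from fetch): it concatenates the delta content, keeps the last model slug seen, and takes usage from the chunk that carries it. OpenAI-style streams usually end with `data: [DONE]`; this page does not say whether the gateway forwards that line, so the sketch skips it if present. `collectStream` is a hypothetical name.

```javascript
// Hypothetical helper: rebuilds text, model, and usage from raw SSE text.
function collectStream(sseText) {
  let text = "";
  let model = null;
  let usage = null;
  for (const rawLine of sseText.split("\n")) {
    const line = rawLine.trim(); // also drops a trailing \r
    if (!line.startsWith("data:")) continue; // skip blank lines and other SSE fields
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") continue; // OpenAI-style end marker, if sent
    const chunk = JSON.parse(payload);
    model = chunk.model ?? model; // already rewritten to the hep.gg slug
    text += chunk.choices?.[0]?.delta?.content ?? "";
    if (chunk.usage) usage = chunk.usage; // present on the final chunk
  }
  return { text, model, usage };
}
```

For incremental display, the same per-line logic can run as network chunks arrive, buffering any partial line until its newline shows up; the SDK loop in the examples handles this for you.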

Examples

curl

curl https://ai.hep.gg/v1/chat/completions \
  -H "Authorization: Bearer $HYD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cf-gpt-oss-20b",
    "messages": [
      { "role": "system", "content": "You are concise." },
      { "role": "user", "content": "Name three primary colors." }
    ],
    "max_tokens": 512
  }'

node

// Run as an ES module (for example a .mjs file) so import and top-level await work.
import OpenAI from "openai";
 
const client = new OpenAI({
  baseURL: "https://ai.hep.gg/v1",
  apiKey: process.env.HYD_API_KEY, // sk-hyd-...
});
 
const res = await client.chat.completions.create({
  model: "cf-gpt-oss-20b",
  messages: [
    { role: "system", content: "You are concise." },
    { role: "user", content: "Name three primary colors." },
  ],
  max_tokens: 512,
});
 
console.log(res.choices[0].message.content);
console.log(res.usage);

Streaming (set "stream": true; the node snippet reuses the client created above):

curl

curl https://ai.hep.gg/v1/chat/completions \
  -H "Authorization: Bearer $HYD_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "cf-gpt-oss-20b",
    "messages": [{ "role": "user", "content": "Count to five." }],
    "max_tokens": 512,
    "stream": true
  }'

node

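// Reuses `client` from the previous example.
const stream = await client.chat.completions.create({
  model: "cf-gpt-oss-20b",
  messages: [{ role: "user", content: "Count to five." }],
  max_tokens: 512,
  stream: true,
});
 
let usage;
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  // The gateway attaches usage to the final chunk.
  if (chunk.usage) usage = chunk.usage;
}
process.stdout.write("\n");
console.log(usage);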

GET https://ai.hep.gg/v1/models (auth required)

List the single model your API key is pinned to.

OpenAI-compatible model list. Because each API key is pinned to one model, this returns only the model the presented key is bound to. Authentication is the same as for chat completions.

200 OK

{
  "object": "list",
  "data": [
    { "id": "cf-gpt-oss-20b", "object": "model", "owned_by": "team-hydra" }
  ]
}

curl

curl https://ai.hep.gg/v1/models \
  -H "Authorization: Bearer $HYD_API_KEY"
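
Since the list always holds the one model the key is pinned to, an application can fetch it once at startup and reuse the slug as the model field, which keeps request bodies in sync with the key. A sketch of the parsing step: `pinnedModel` is a hypothetical helper, and the strict one-entry check is an assumption drawn from the description above.

```javascript
// Hypothetical helper: returns the pinned slug from a GET /v1/models body.
// Assumes exactly one entry, since each key is bound to one model.
function pinnedModel(list) {
  if (list?.object !== "list" || !Array.isArray(list.data) || list.data.length !== 1) {
    throw new Error("Unexpected /v1/models response");
  }
  return list.data[0].id;
}

// Usage (needs network and an ES module for top-level await):
// const res = await fetch("https://ai.hep.gg/v1/models", {
//   headers: { Authorization: `Bearer ${process.env.HYD_API_KEY}` },
// });
// const model = pinnedModel(await res.json());
```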

Usage limits

Requests count against your account's rolling 30-day allowance. Once you exceed it, this endpoint returns 429 with type: "quota_exceeded", unless you have enabled hep_tokens extra usage. The code field gives the specific reason. See Usage and quotas for the allowances, the error codes, and how to keep working past the cap.
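
To tell the quota case apart from other 429s, check type rather than the status alone. A sketch, assuming type and code sit inside the OpenAI-style error object described under Errors (this page does not show a complete 429 body); `quotaExceeded` is a hypothetical helper. Because the allowance is a rolling 30-day window, an immediate retry is unlikely to succeed, so this case is better surfaced to the user than fed into a generic retry loop.

```javascript
// Hypothetical helper: returns { code } for a quota_exceeded 429, else null.
// Assumed body shape: { error: { message, type: "quota_exceeded", code } }.
function quotaExceeded(status, body) {
  const err = body?.error;
  if (status !== 429 || typeof err !== "object" || err === null) return null;
  if (err.type !== "quota_exceeded") return null;
  return { code: err.code ?? null }; // code explains why; see Usage and quotas
}
```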

Errors

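Errors use the OpenAI shape, { "error": { "message": "..." } }, and carry the upstream status code, or 400, 429, 500, or 501 for conditions raised by the gateway itself. The 401 from Authentication is the exception: its error field is a plain string. A 429 with type: "quota_exceeded" means you reached your usage allowance.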
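
A client that reads error bodies directly can normalize the two shapes that appear on this page: the flat { "error": "Invalid API key" } body and the nested OpenAI object. A minimal sketch; `errorMessage` is a hypothetical helper.

```javascript
// Hypothetical helper: turns either error body shape into one readable line.
function errorMessage(status, body) {
  const err = body?.error;
  if (typeof err === "string") return `${status}: ${err}`; // flat 401 shape
  if (err && typeof err.message === "string") return `${status}: ${err.message}`; // OpenAI shape
  return `${status}: unexpected error body`;
}

console.log(errorMessage(401, { error: "Invalid API key" })); // 401: Invalid API key
```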
