Usage and quotas

Which models you can use, your rolling 30-day usage allowance for self-service LLM keys, the 429 response returned once you exceed it, and optional hep_tokens overage.

Self-service LLM keys are metered against a rolling dollar-value allowance. This page explains which models you can use, how much usage is included, and what happens when you run past it.

Models and plan access

Models (by slug)
  • qwen3-8b: Generally available. Anyone with AI access can mint a key for it.
  • cf-gpt-oss-20b: Admin only. Minting a key for it on a non-admin account returns 403.

GET https://ai.hep.gg/models lists the slugs you can mint a key for. If you pick a model that is not available on your plan, key creation returns 403.
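To avoid a 403 at key creation, a client can check the slug against GET https://ai.hep.gg/models first. This is a minimal sketch under two assumptions this page does not confirm: the endpoint returns an OpenAI-style list (`{"data": [{"id": "<slug>"}, ...]}`), and it accepts a bearer token if one is needed. `fetch_models`, `available_slugs`, and `can_mint` are hypothetical helper names.

```python
import json
import urllib.request

MODELS_URL = "https://ai.hep.gg/models"


def fetch_models(token=None):
    """Download the raw /models payload (network call; not used in the example below)."""
    req = urllib.request.Request(MODELS_URL)
    if token:
        # Assumption: bearer-token auth. This page does not document how /models is authenticated.
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def available_slugs(payload):
    """Collect model slugs, assuming an OpenAI-style {"data": [{"id": slug}, ...]} payload."""
    return {item["id"] for item in payload.get("data", [])}


def can_mint(slug, payload):
    """True if the slug is listed for you; minting an unlisted slug returns 403."""
    return slug in available_slugs(payload)


if __name__ == "__main__":
    # Stand-in payload for a non-admin account; replace with fetch_models(...).
    sample = {"data": [{"id": "qwen3-8b"}]}
    print(can_mint("qwen3-8b", sample))        # True
    print(can_mint("cf-gpt-oss-20b", sample))  # False
```

If the real payload has a different shape, only `available_slugs` needs to change.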

Your monthly allowance

Usage is valued in dollars at each model's rate and charged against a rolling 30-day allowance for your whole account (every key you own counts against the same pool):

Plan       Included usage (rolling 30 days)
Free       $10
Premium    $50
Admin      Unlimited
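Because every key you own draws on the same pool, your headroom is the plan allowance minus the combined last-30-day spend of all your keys. This is a minimal sketch that assumes you already have each key's last-30-day spend in dollars; `PLAN_ALLOWANCE_USD` and `remaining_allowance` are illustrative names, not part of any published API.

```python
from typing import Iterable, Optional

# Included usage per rolling 30 days, in dollars, from the plan table.
# None stands for the Admin plan's unlimited allowance.
PLAN_ALLOWANCE_USD = {"Free": 10.0, "Premium": 50.0, "Admin": None}


def remaining_allowance(plan: str, per_key_spend_usd: Iterable[float]) -> Optional[float]:
    """Headroom left in the shared account pool, or None if the plan is unlimited."""
    allowance = PLAN_ALLOWANCE_USD[plan]
    if allowance is None:
        return None
    pooled = sum(per_key_spend_usd)  # every key counts against one pool
    return max(0.0, allowance - pooled)


# A Free account with three keys that spent $4.00, $3.50 and $1.25.
print(remaining_allowance("Free", [4.0, 3.5, 1.25]))  # 1.25
```

Spreading work across more keys does not add headroom, since the sum is taken over all of them.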

"Rolling" means the window is always the last 30 days, so old usage ages off continuously and your headroom refills gradually. It is not a hard reset on a fixed day of the month. Your current spend and remaining allowance are shown on the LLM Keys page.

The qwen3-8b rate is $0.06 per 1M input tokens and $0.24 per 1M output tokens; input and output tokens are metered separately.
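As a worked example at the qwen3-8b rate, a request with 2,000 input tokens and 500 output tokens costs 2,000 × $0.06 / 1M + 500 × $0.24 / 1M = $0.00012 + $0.00012 = $0.00024. The sketch below does that arithmetic with `Decimal` and sums costs over a trailing 30-day window to show how older usage ages off. The `Request` record and both function names are hypothetical, and treating a request exactly 30 days old as already aged off is an assumption about where the service draws the boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# qwen3-8b list rates in dollars per 1M tokens; input and output are metered separately.
INPUT_RATE_PER_M = Decimal("0.06")
OUTPUT_RATE_PER_M = Decimal("0.24")
WINDOW = timedelta(days=30)


@dataclass
class Request:  # hypothetical usage record
    at: datetime
    input_tokens: int
    output_tokens: int


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Dollar cost of one qwen3-8b request."""
    return (input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M) / Decimal(1_000_000)


def rolling_spend(requests, now: datetime) -> Decimal:
    """Sum of costs for requests newer than now - 30 days (assumed boundary)."""
    cutoff = now - WINDOW
    return sum(
        (request_cost(r.input_tokens, r.output_tokens) for r in requests if r.at > cutoff),
        Decimal(0),
    )


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
history = [
    Request(now - timedelta(days=31), 1_000_000, 0),  # aged off: outside the window
    Request(now - timedelta(days=2), 2_000, 500),
]
print(request_cost(2_000, 500))      # 0.00024
print(rolling_spend(history, now))   # 0.00024
```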

When you hit the cap

Once your rolling usage reaches your allowance, POST /v1/chat/completions returns 429 in the OpenAI error shape with type: "quota_exceeded":

429 Too Many Requests
{
  "error": {
    "message": "Monthly usage limit reached ($10). It refills as your last-30-day usage ages off, or enable extra usage (hep_tokens) in your dashboard.",
    "type": "quota_exceeded",
    "code": "quota_exhausted"
  }
}

The code field tells you which limit you hit:

  • quota_exhausted: Your included allowance is used up. Wait for older usage to age out of the 30-day window, or turn on extra usage.
  • overage_cap: Extra usage is on, but you reached the hep_tokens spend cap you set for the current rolling 30 days. Raise it on the dashboard.
  • no_hep_tokens: Extra usage is on, but your hep_tokens balance is 0. Top up to continue.
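A client can branch on the code field rather than matching the message text. This minimal sketch parses a 429 body in the OpenAI error shape shown above and returns a short hint per code; `quota_hint` and the hint strings are hypothetical and only paraphrase this page.

```python
import json
from typing import Optional

# Suggested next step for each quota_exceeded code.
QUOTA_HINTS = {
    "quota_exhausted": "Wait for older usage to age out of the 30-day window, or enable extra usage.",
    "overage_cap": "Raise your hep_tokens spend cap on the dashboard.",
    "no_hep_tokens": "Top up your hep_tokens balance.",
}


def quota_hint(status: int, body: str) -> Optional[str]:
    """Return a hint for a quota 429, or None for any other response."""
    if status != 429:
        return None
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        return None  # body is not the expected JSON object
    if error.get("type") != "quota_exceeded":
        return None  # some other 429; not covered by this page
    return QUOTA_HINTS.get(error.get("code"), "Unrecognized quota_exceeded code.")


body = '{"error": {"type": "quota_exceeded", "code": "no_hep_tokens", "message": "..."}}'
print(quota_hint(429, body))  # Top up your hep_tokens balance.
```

Retrying a quota_exhausted 429 on a short backoff will not help, because headroom only returns as old usage ages off the 30-day window.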

Extra usage (hep_tokens overage)

Extra usage is off by default. Turn it on from the LLM Keys page to keep working past your included allowance. Beyond the allowance, each request is billed to your hep_tokens balance at the standard list rate of 240 hep_tokens per $1 of usage.

  • You can set an optional cap: the most hep_tokens to auto-spend per rolling 30 days. 0 means no cap, so spending continues until your balance runs out.
  • When your balance reaches 0 or you hit your cap, requests return 429 again (no_hep_tokens or overage_cap).
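At 240 hep_tokens per $1, $0.50 of usage beyond the allowance costs 120 hep_tokens. The sketch below converts overage dollars to hep_tokens and walks the rules on this page to decide which 429 code, if any, would block a non-admin request. It is an assumption-laden illustration: `usd_to_hep_tokens` and `blocking_code` are hypothetical, no rounding rule is applied because none is documented here, and checking the balance before the cap is a guess at precedence when both limits apply.

```python
from decimal import Decimal
from typing import Optional

HEP_TOKENS_PER_USD = Decimal(240)  # standard list rate


def usd_to_hep_tokens(usd: Decimal) -> Decimal:
    """Convert overage dollars to hep_tokens at the list rate (no rounding applied)."""
    return usd * HEP_TOKENS_PER_USD


def blocking_code(within_allowance: bool, extra_usage_on: bool,
                  balance: Decimal, overage_spent: Decimal, cap: Decimal) -> Optional[str]:
    """Return the quota_exceeded code that would block a request, or None if allowed."""
    if within_allowance:
        return None  # still inside included usage; no hep_tokens are spent
    if not extra_usage_on:
        return "quota_exhausted"
    if balance <= 0:  # assumption: balance is checked before the cap
        return "no_hep_tokens"
    if cap > 0 and overage_spent >= cap:  # cap == 0 means no cap
        return "overage_cap"
    return None


print(usd_to_hep_tokens(Decimal("0.50")))                                        # 120.00
print(blocking_code(False, True, Decimal(500), Decimal(1000), Decimal(1000)))   # overage_cap
```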

Admins

Admin accounts have no quota, can use every model, and keep the existing admin key path. None of the limits above apply to them.