Usage and quotas

Which models you can use, your rolling 30-day usage allowance for self-service LLM keys, the 429 you get over the cap, and optional hep_tokens overage.

Self-service LLM keys are metered against a rolling dollar-value allowance. This page explains which models you can use, how much usage is included, and what happens when you run past it.

Models and plan access

Models

qwen3-8b

Generally available. Anyone with AI access can mint a key for it.

cf-gpt-oss-20b

Admin only. Minting a key for it on a non-admin account returns 403.

GET https://ai.hep.gg/models lists the slugs you can mint a key for. If you pick a model that is not available on your plan, key creation returns 403.
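As a rough sketch, the model list could be requested from Python like this. Only the URL and the GET method come from this page. The Bearer header is an assumption, since this page does not say how the call is authenticated, and the helper names are hypothetical. The response shape is not documented here, so the sketch returns the raw body rather than parsing it.

```python
import urllib.request
from typing import Optional

MODELS_URL = "https://ai.hep.gg/models"  # endpoint named on this page


def build_models_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the GET request for the model list."""
    request = urllib.request.Request(MODELS_URL, method="GET")
    if api_key is not None:
        # Assumption: Bearer auth. This page does not document the auth scheme.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


def fetch_models_raw(api_key: Optional[str] = None) -> str:
    """Send the request and return the body as text, without assuming its shape."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling fetch_models_raw() performs the network request; build_models_request() alone is enough to inspect what would be sent.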

Your monthly allowance

Usage is valued in dollars at each model's rate and charged against a single allowance for your whole account (every key you own counts against the same pool):

Plan	Included usage	How it refills
Free	$10	Rolling: always the last 30 days
Premium	$50	Resets to full on each Hep.gg Prime renewal
Admin	Unlimited	n/a

On Free, the window is always the last 30 days, so old usage ages off continuously and your headroom refills gradually rather than on a fixed day. On Hep.gg Prime, your allowance refreshes to the full $50 on each Prime renewal (about every 30 days), so every cycle starts clean.

The qwen3-8b rate is $0.06 per 1M input tokens and $0.24 per 1M output tokens; input and output tokens are metered separately. Your current spend and remaining allowance are shown on the LLM Keys page.
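To make the arithmetic concrete, here is a minimal sketch of how a request's dollar value and the Free plan's remaining headroom could be estimated locally. The rates and the $10 allowance come from this page. The UsageEvent record, the function names, and the exact window boundary (whether usage exactly 30 days old still counts) are assumptions for illustration, and the service may round differently. The LLM Keys page remains the authoritative figure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Iterable

# qwen3-8b rates from this page, in dollars per 1M tokens.
INPUT_RATE = Decimal("0.06")
OUTPUT_RATE = Decimal("0.24")
MILLION = Decimal(1_000_000)
WINDOW = timedelta(days=30)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Dollar value of one request, with input and output priced separately."""
    return (INPUT_RATE * input_tokens + OUTPUT_RATE * output_tokens) / MILLION


@dataclass
class UsageEvent:
    """Hypothetical local record of one request (not an API object)."""
    at: datetime
    input_tokens: int
    output_tokens: int


def free_plan_remaining(
    events: Iterable[UsageEvent],
    now: datetime,
    allowance: Decimal = Decimal("10"),
) -> Decimal:
    """Allowance left after summing usage from the last 30 days only."""
    cutoff = now - WINDOW
    spent = sum(
        (request_cost(e.input_tokens, e.output_tokens) for e in events if e.at > cutoff),
        Decimal(0),
    )
    # Assumption: headroom is floored at zero rather than going negative.
    return max(allowance - spent, Decimal(0))
```

For example, a request with 1M input tokens and 1M output tokens is valued at $0.06 + $0.24 = $0.30, and usage older than 30 days no longer reduces the Free headroom.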

When you hit the cap

Once your rolling usage reaches your allowance, POST /v1/chat/completions returns 429 in the OpenAI error shape with type: "quota_exceeded":

429 Too Many Requests

{
  "error": {
    "message": "Monthly usage limit reached ($10). It refills as your last-30-day usage ages off, or enable extra usage (hep_tokens) in your dashboard.",
    "type": "quota_exceeded",
    "code": "quota_exhausted"
  }
}

The code tells you why:

quota_exceeded codes

quota_exhausted

Your included allowance is used up. On Free, wait for the 30-day window to roll; on Hep.gg Prime, it refreshes on your next renewal. Alternatively, turn on extra usage.

overage_cap

Extra usage is on, but you reached the hep_tokens spend cap you set for this period. Raise it on the dashboard.

no_hep_tokens

Extra usage is on, but your hep_tokens balance is 0. Top up to continue.
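A client can branch on the code to tell the user what to do next. The sketch below assumes you already have the HTTP status and the raw response body as a string. The function name and the wording of the suggested actions are illustrative and not part of the API; the type and code values are the ones listed above.

```python
import json
from typing import Optional

# Suggested next steps per code, paraphrased from the list above.
QUOTA_ACTIONS = {
    "quota_exhausted": "Wait for the allowance to refill, or turn on extra usage.",
    "overage_cap": "Raise the hep_tokens spend cap on the dashboard.",
    "no_hep_tokens": "Top up your hep_tokens balance.",
}


def quota_action(status: int, body: str) -> Optional[str]:
    """Return a suggested action for a quota 429, or None for any other response."""
    if status != 429:
        return None
    try:
        error = json.loads(body).get("error")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        return None
    if not isinstance(error, dict) or error.get("type") != "quota_exceeded":
        return None
    code = error.get("code")
    if not isinstance(code, str):
        return "Unrecognized quota code; check the dashboard."
    return QUOTA_ACTIONS.get(code, "Unrecognized quota code; check the dashboard.")
```

Returning None for anything that is not a quota error keeps other 429s (for example, ordinary rate limiting, if the service uses it) on your normal retry path.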

Extra usage (hep_tokens overage)

Extra usage is off by default. Turn it on from the LLM Keys page to keep working past your included allowance. Beyond the allowance, each request is billed to your hep_tokens balance at the standard list rate of 240 hep_tokens per $1 of usage.

You can set an optional cap: the most hep_tokens to spend automatically per rolling 30 days. A cap of 0 means no cap, so spending is limited only by your balance.
When your balance reaches 0 or you hit your cap, requests return 429 again (with code no_hep_tokens or overage_cap).
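The overage rules above can be sketched as a small local model. The 240 hep_tokens per $1 rate, the meaning of a 0 cap, and the two codes come from this page. The function names are hypothetical, and checking the balance before the cap is an assumption, as is the absence of rounding; the service may order or round these differently.

```python
from decimal import Decimal
from typing import Optional

HEP_TOKENS_PER_DOLLAR = Decimal(240)  # standard list rate from this page


def overage_tokens(usage_dollars: Decimal) -> Decimal:
    """hep_tokens billed for usage beyond the included allowance (no rounding applied)."""
    return usage_dollars * HEP_TOKENS_PER_DOLLAR


def overage_block_code(
    balance: Decimal,
    cap: Decimal,
    spent_in_window: Decimal,
) -> Optional[str]:
    """Return the 429 code that would block a request, or None if it may proceed.

    balance: current hep_tokens balance.
    cap: hep_tokens cap per rolling 30 days; 0 means no cap.
    spent_in_window: hep_tokens already auto-spent in the current window.
    """
    # Assumption: an empty balance is reported before a reached cap.
    if balance <= 0:
        return "no_hep_tokens"
    if cap > 0 and spent_in_window >= cap:
        return "overage_cap"
    return None
```

For example, $0.50 of usage past the allowance costs 0.50 × 240 = 120 hep_tokens.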

Admins

Admin accounts have no quota, can use every model, and keep the existing admin key path. None of the limits above apply to them.
