Predictable GPU inference spend — without surprise bills.
Darktree provides OpenAI-compatible inference endpoints backed by prepaid compute credits.
Credits are deducted according to the token usage (prompt_tokens + completion_tokens) reported in each API response.
Add daily caps for budget control and an append-only audit ledger for reconciliation.
Prepaid bundles fund token-metered inference. Each plan includes a conservative daily cap to prevent runaway spend.
Caps reset daily and can be raised or lowered on request.
Token totals assume $0.75 per 1,000 tokens. Daily caps act as a hard stop: once a cap is exceeded,
requests return HTTP 429 until the next reset at 00:00 UTC. Caps are set per customer by Darktree and can be adjusted on request.
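Because a capped request stays blocked until 00:00 UTC, a client may prefer to pause rather than retry in a tight loop. The sketch below is one way to work out how long that pause would be. The function names are hypothetical, and it assumes every 429 means the daily cap was hit, which may not hold if the endpoint also uses 429 for ordinary rate limiting.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_cap_reset(now: Optional[datetime] = None) -> float:
    """Return the seconds from `now` until the next 00:00 UTC.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


def wait_after_response(status_code: int, now: Optional[datetime] = None) -> float:
    """Return how long to wait before retrying, in seconds.

    Assumption: a 429 is treated as "daily cap exceeded". Any other
    status code returns 0.0 (no cap-related wait).
    """
    if status_code == 429:
        return seconds_until_cap_reset(now)
    return 0.0
```

For example, a 429 received at 23:00 UTC would suggest waiting about an hour, while asking Darktree to raise the cap remains the alternative to waiting.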
Latency note: Premium models are typically higher‑latency than Standard models.
In steady state, qwen25-14b-awq commonly generates ~100 tokens/sec and qwen25-32b-awq ~45 tokens/sec
(typical medians; actual throughput depends on prompt length, max_tokens, concurrency, and warm vs cold starts).
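The throughput figures above can be turned into a rough generation-time estimate. This is a back-of-the-envelope sketch only: the helper name is hypothetical, the rates are the typical medians quoted here rather than guarantees, and the estimate ignores prompt processing, queueing, and cold starts.

```python
# Typical steady-state medians quoted above, in tokens per second.
TYPICAL_TOKENS_PER_SEC = {
    "qwen25-14b-awq": 100.0,
    "qwen25-32b-awq": 45.0,
}


def estimate_generation_seconds(model: str, completion_tokens: int) -> float:
    """Estimate the seconds needed to generate `completion_tokens` tokens.

    Divides the token count by the typical median rate for the model.
    Raises KeyError for models without a quoted rate.
    """
    return completion_tokens / TYPICAL_TOKENS_PER_SEC[model]
```

At these rates, a 450-token completion would take roughly 4.5 seconds on qwen25-14b-awq and roughly 10 seconds on qwen25-32b-awq.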
Note: credits are prepaid and non‑refundable. Usage is measured in tokens, not time.
Token usage returned by API responses is the authoritative record for credit deduction.
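Since the usage block in each response is what drives the deduction, a client can keep its own running tally for comparison against the ledger. The sketch below assumes credits are denominated in dollars at the $0.75 per 1,000 tokens figure quoted above, and that `usage` is the OpenAI-style dictionary with `prompt_tokens` and `completion_tokens`. The function name is hypothetical, and the actual rate for a given bundle may differ.

```python
from decimal import Decimal

# Assumption: the $0.75 per 1,000 tokens rate quoted above applies.
USD_PER_1K_TOKENS = Decimal("0.75")


def credits_used(usage: dict) -> Decimal:
    """Return the estimated credit cost (in USD) of one response.

    `usage` is expected to carry integer `prompt_tokens` and
    `completion_tokens` fields, as in OpenAI-compatible responses.
    """
    total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]
    # Decimal keeps the arithmetic exact for money-like values.
    return Decimal(total_tokens) * USD_PER_1K_TOKENS / Decimal(1000)
```

A response reporting 1,200 prompt tokens and 800 completion tokens would come to $1.50 under these assumptions. Summing the per-response values over a day gives a figure to reconcile against the audit ledger.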
Quickstart
Copy the snippet below, replace <YOUR_API_KEY> with the key you receive after purchasing a bundle, and start calling the OpenAI‑compatible endpoint.