nroxy/docs/ops/provider-key-pool.md
2026-03-10 14:03:52 +03:00

Provider Key Pool

Purpose

Route generation traffic through multiple provider API keys while hiding transient failures from end users.

Key selection

  • Only keys in active state are eligible for first-pass routing.
  • Requests start from the next active key by round robin.
  • A single request must not attempt the same key twice.
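The selection rules above can be sketched as a small planning function. This is a minimal illustration, not the worker's actual code: `ProviderKey`, `planAttempts`, and the idea of a numeric round-robin cursor over the active keys are all assumptions made for the example.

```typescript
type KeyState = "active" | "cooldown" | "out_of_funds" | "manual_review" | "disabled";

// Hypothetical shape of a pooled key; only the fields this sketch needs.
interface ProviderKey {
  id: string;
  state: KeyState;
}

// Returns the order in which a single request may try keys:
// active keys only, starting at the round-robin cursor, each key at most once.
function planAttempts(keys: ProviderKey[], cursor: number): ProviderKey[] {
  const active = keys.filter((k) => k.state === "active");
  if (active.length === 0) return [];
  // Normalize the cursor so negative or oversized values still map to a valid index.
  const start = ((cursor % active.length) + active.length) % active.length;
  // Rotate the list so the key at the cursor comes first; no key is repeated.
  return active.slice(start).concat(active.slice(0, start));
}

const pool: ProviderKey[] = [
  { id: "k1", state: "active" },
  { id: "k2", state: "cooldown" },
  { id: "k3", state: "active" },
  { id: "k4", state: "active" },
];

// With the cursor at 1, the request starts at the second active key (k3).
const order = planAttempts(pool, 1).map((k) => k.id);
```

Because the plan is built once per request and contains each active key exactly once, walking it front to back cannot attempt the same key twice.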

Optional proxy behavior

  • A key may have one optional proxy attached.
  • If a proxy exists, the first attempt uses the proxy.
  • If the proxy path fails with a transport error, retry the same key directly.
  • Direct fallback does not bypass other business checks.
  • The current runtime policy reads the cooldown and manual-review thresholds from environment variables:
    • KEY_COOLDOWN_MINUTES
    • KEY_FAILURES_BEFORE_MANUAL_REVIEW
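The proxy rules above amount to a two-step attempt within one key. The sketch below illustrates that control flow under stated assumptions: `Send`, `SendResult`, and `attemptWithKey` are hypothetical names, and the transport is modeled as a synchronous function purely to keep the example short. The `usedProxy` and `directFallbackUsed` flags mirror the audit fields mentioned later in this document.

```typescript
type SendResult =
  | { status: "ok"; body: string }
  | { status: "transport_error"; message: string }
  | { status: "provider_error"; message: string };

// Hypothetical transport: sends one request with a key, optionally through a proxy.
type Send = (apiKey: string, proxyUrl: string | null) => SendResult;

interface AttemptOutcome {
  result: SendResult;
  usedProxy: boolean;
  directFallbackUsed: boolean;
}

function attemptWithKey(apiKey: string, proxyUrl: string | null, send: Send): AttemptOutcome {
  if (proxyUrl === null) {
    // No proxy attached: go direct.
    return { result: send(apiKey, null), usedProxy: false, directFallbackUsed: false };
  }
  const viaProxy = send(apiKey, proxyUrl);
  if (viaProxy.status !== "transport_error") {
    // Success or a provider-side error: the proxy path itself worked, so no fallback.
    return { result: viaProxy, usedProxy: true, directFallbackUsed: false };
  }
  // The proxy path failed at the transport level: retry the same key directly.
  // Any business checks are assumed to run before this function and are not skipped here.
  return { result: send(apiKey, null), usedProxy: true, directFallbackUsed: true };
}

// Fake transport for illustration: the proxy is unreachable, the direct path works.
const flakyProxySend: Send = (_apiKey, proxyUrl) =>
  proxyUrl !== null
    ? { status: "transport_error", message: "proxy unreachable" }
    : { status: "ok", body: "generated" };

const outcome = attemptWithKey("key-1", "http://proxy.example:8080", flakyProxySend);
```

The direct retry counts as part of the same key attempt, so it does not conflict with the rule that a request never tries the same key twice.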

Retry rules

Retry on the next key only for:

  • network errors
  • connection failures
  • timeouts
  • provider 5xx

Do not retry on the next key for:

  • validation errors
  • unsupported inputs
  • policy rejections
  • other user-caused provider 4xx
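One way to encode the two lists above is a single classification function. The category names below are hypothetical labels chosen for this sketch; how the worker actually maps raw errors or HTTP statuses into categories is not specified here.

```typescript
// Hypothetical failure categories, one per bullet in the retry rules.
type FailureKind =
  | "network_error"
  | "connection_failure"
  | "timeout"
  | "provider_5xx"
  | "validation_error"
  | "unsupported_input"
  | "policy_rejection"
  | "user_caused_4xx";

// Only these categories justify moving on to the next key.
const RETRYABLE: FailureKind[] = ["network_error", "connection_failure", "timeout", "provider_5xx"];

function shouldRetryOnNextKey(kind: FailureKind): boolean {
  return RETRYABLE.indexOf(kind) !== -1;
}

const retryTimeout = shouldRetryOnNextKey("timeout");
const retryPolicyRejection = shouldRetryOnNextKey("policy_rejection");
```

Keeping the retryable set as an explicit allowlist means a newly added category is non-retryable until someone deliberately opts it in, which matches the "only for" wording of the rule.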

States

  • active
  • cooldown
  • out_of_funds
  • manual_review
  • disabled

Transitions

  • active -> cooldown on retryable failures
  • cooldown -> active after successful automatic recheck
  • cooldown -> manual_review after more than 10 consecutive retryable failures across recovery cycles (the runtime reads this threshold from KEY_FAILURES_BEFORE_MANUAL_REVIEW)
  • active|cooldown -> out_of_funds on confirmed insufficient funds
  • out_of_funds -> active only by manual admin action
  • manual_review -> active only by manual admin action
  • active -> disabled by manual admin action
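The transition list above can be read as a small state machine. The sketch below encodes exactly the listed transitions and rejects everything else; the event names and the `nextState` function are assumptions introduced for illustration.

```typescript
type KeyState = "active" | "cooldown" | "out_of_funds" | "manual_review" | "disabled";

// Hypothetical event names, one per cause in the transition list.
type KeyEvent =
  | "retryable_failure"
  | "recheck_succeeded"
  | "failure_threshold_exceeded"
  | "insufficient_funds"
  | "admin_reactivate"
  | "admin_disable";

// Returns the new state, or null when the list above does not allow the transition.
function nextState(state: KeyState, event: KeyEvent): KeyState | null {
  switch (event) {
    case "retryable_failure":
      return state === "active" ? "cooldown" : null;
    case "recheck_succeeded":
      return state === "cooldown" ? "active" : null;
    case "failure_threshold_exceeded":
      return state === "cooldown" ? "manual_review" : null;
    case "insufficient_funds":
      return state === "active" || state === "cooldown" ? "out_of_funds" : null;
    case "admin_reactivate":
      // Only manual admin action brings a key back from these two states.
      return state === "out_of_funds" || state === "manual_review" ? "active" : null;
    case "admin_disable":
      return state === "active" ? "disabled" : null;
    default:
      return null;
  }
}

const afterFailure = nextState("active", "retryable_failure");
const autoRecoveryFromOutOfFunds = nextState("out_of_funds", "recheck_succeeded");
```

Returning null for unlisted combinations makes it explicit that, for example, an automatic recheck can never move a key out of out_of_funds or manual_review.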

Current runtime note

  • The current worker implementation already applies proxy-first then direct fallback within one provider-key attempt.
  • The current worker implementation writes GenerationAttempt.usedProxy and GenerationAttempt.directFallbackUsed for auditability.
  • The current worker implementation also runs a background cooldown-recovery sweep and returns keys to active after cooldownUntil passes.
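The cooldown-recovery sweep described above might look roughly like the following. This is a simplified sketch: `PooledKey`, `startCooldown`, and `sweepCooldowns` are hypothetical names, the cooldown length is passed in as a parameter (it would come from KEY_COOLDOWN_MINUTES), and any recheck against the provider is omitted.

```typescript
// Hypothetical in-memory view of a key; cooldownUntil mirrors the field named above.
interface PooledKey {
  id: string;
  state: string;
  cooldownUntil: Date | null;
}

// Puts a key into cooldown for the configured number of minutes.
function startCooldown(key: PooledKey, now: Date, cooldownMinutes: number): void {
  key.state = "cooldown";
  key.cooldownUntil = new Date(now.getTime() + cooldownMinutes * 60000);
}

// Returns the ids of keys whose cooldown has expired, after marking them active.
function sweepCooldowns(keys: PooledKey[], now: Date): string[] {
  const recovered: string[] = [];
  for (const key of keys) {
    const expired = key.cooldownUntil !== null && key.cooldownUntil.getTime() <= now.getTime();
    if (key.state === "cooldown" && expired) {
      key.state = "active";
      key.cooldownUntil = null;
      recovered.push(key.id);
    }
  }
  return recovered;
}

const t0 = new Date(Date.UTC(2026, 2, 10, 11, 0, 0));
const sweepKeys: PooledKey[] = [{ id: "k1", state: "active", cooldownUntil: null }];
startCooldown(sweepKeys[0], t0, 15);

// Ten minutes in, the key is still cooling down; at fifteen minutes it returns to active.
const tooEarly = sweepCooldowns(sweepKeys, new Date(t0.getTime() + 10 * 60000));
const onTime = sweepCooldowns(sweepKeys, new Date(t0.getTime() + 15 * 60000));
```

Passing `now` in explicitly, rather than reading the clock inside the sweep, keeps the function deterministic and easy to test.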

Balance tracking

  • Primary source of truth is the provider balance API.
  • Balance refresh runs periodically and also after relevant failures.
  • Telegram admin output must show per-key balance snapshots and the count of keys in out_of_funds.
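As an illustration of the Telegram output requirement, the sketch below renders per-key balance snapshots plus the out_of_funds count. The `KeyBalanceSnapshot` shape and the text layout are assumptions made up for this example, not the bot's actual message format.

```typescript
// Hypothetical snapshot record; balance is null when no refresh has succeeded yet.
interface KeyBalanceSnapshot {
  keyId: string;
  state: string;
  balance: number | null;
  refreshedAt: string | null; // ISO timestamp of the last refresh
}

function renderBalanceReport(snapshots: KeyBalanceSnapshot[]): string {
  const outOfFunds = snapshots.filter((s) => s.state === "out_of_funds").length;
  const lines = snapshots.map((s) => {
    const balance = s.balance === null ? "unknown" : s.balance.toFixed(2);
    const refreshed = s.refreshedAt === null ? "never" : s.refreshedAt;
    return s.keyId + ": " + s.state + ", balance " + balance + ", refreshed " + refreshed;
  });
  // The summary line carries the count of keys currently in out_of_funds.
  lines.push("out_of_funds keys: " + outOfFunds);
  return lines.join("\n");
}

const report = renderBalanceReport([
  { keyId: "k1", state: "active", balance: 12.5, refreshedAt: "2026-03-10T11:00:00Z" },
  { keyId: "k2", state: "out_of_funds", balance: 0, refreshedAt: null },
]);
```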

Admin expectations

Web admin and Telegram admin must both be able to:

  • inspect key state
  • inspect last error category and code
  • inspect balance snapshot and refresh time
  • enable or disable a key
  • return a key from manual_review
  • return a key from out_of_funds
  • add a new key