# Provider Key Pool

## Purpose

Route generation traffic through multiple provider API keys while hiding transient failures from end users.

## Key selection

- Only keys in `active` state are eligible for first-pass routing.
- Requests start from the next active key by round robin.
- A single request must not attempt the same key twice.

## Optional proxy behavior

- A key may have one optional proxy attached.
- If a proxy exists, the first attempt uses the proxy.
- If the proxy path fails with a transport error, retry the same key directly.
- Direct fallback does not bypass other business checks.
- Current runtime policy reads cooldown and manual-review thresholds from environment:
  - `KEY_COOLDOWN_MINUTES`
  - `KEY_FAILURES_BEFORE_MANUAL_REVIEW`

## Retry rules

Retry on the next key only for:

- network errors
- connection failures
- timeouts
- provider `5xx`

Do not retry on the next key for:

- validation errors
- unsupported inputs
- policy rejections
- other user-caused provider `4xx`

## States

- `active`
- `cooldown`
- `out_of_funds`
- `manual_review`
- `disabled`

## Transitions

- `active -> cooldown` on retryable failures
- `cooldown -> active` after successful automatic recheck
- `cooldown -> manual_review` after more than 10 consecutive retryable failures across recovery cycles
- `active|cooldown -> out_of_funds` on confirmed insufficient funds
- `out_of_funds -> active` only by manual admin action
- `manual_review -> active` only by manual admin action
- `active -> disabled` by manual admin action

## Current runtime note

- The current worker implementation already applies proxy-first then direct fallback within one provider-key attempt.
- The current worker implementation writes `GenerationAttempt.usedProxy` and `GenerationAttempt.directFallbackUsed` for auditability.
- The current worker implementation also runs a background cooldown-recovery sweep and returns keys to `active` after `cooldownUntil` passes.
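
The key selection, retry, and transition rules above can be illustrated with a minimal sketch. It is not the worker's actual code. `KeyPool`, `routeRequest`, `onRetryableFailure`, the `FailureKind` labels, and the `consecutiveFailures` field are hypothetical names chosen for illustration. The sketch also assumes that a success resets the failure counter and that the attempt callback is synchronous (a real provider call would be asynchronous). Proxy handling and `out_of_funds` detection are left out.

```typescript
// State names come from the spec; everything else here is a hypothetical illustration.
type KeyState = "active" | "cooldown" | "out_of_funds" | "manual_review" | "disabled";

type FailureKind =
  | "network" | "connection" | "timeout" | "provider_5xx" // retry on the next key
  | "validation" | "unsupported_input" | "policy_rejection" | "provider_4xx"; // do not retry

interface ProviderKey {
  id: string;
  state: KeyState;
  consecutiveFailures: number; // assumed counter, reset on success
}

interface RouteResult {
  keyId: string | null;
  failure: FailureKind | "no_active_key" | null;
}

const RETRYABLE: ReadonlySet<FailureKind> = new Set<FailureKind>([
  "network", "connection", "timeout", "provider_5xx",
]);

function isRetryableOnNextKey(kind: FailureKind): boolean {
  return RETRYABLE.has(kind);
}

class KeyPool {
  private cursor = 0;
  private readonly keys: ProviderKey[];

  constructor(keys: ProviderKey[]) {
    this.keys = keys;
  }

  // Returns the next active key in round-robin order, skipping keys already tried
  // by the current request, or undefined when none is left.
  next(tried: ReadonlySet<string>): ProviderKey | undefined {
    for (let i = 0; i < this.keys.length; i++) {
      const idx = (this.cursor + i) % this.keys.length;
      const key = this.keys[idx];
      if (key !== undefined && key.state === "active" && !tried.has(key.id)) {
        this.cursor = (idx + 1) % this.keys.length;
        return key;
      }
    }
    return undefined;
  }
}

// Applies `active -> cooldown` and `cooldown -> manual_review`.
// The default threshold mirrors the "more than 10" rule in the Transitions section.
function onRetryableFailure(key: ProviderKey, manualReviewThreshold = 10): void {
  if (key.state !== "active" && key.state !== "cooldown") return;
  key.consecutiveFailures += 1;
  if (key.state === "active") {
    key.state = "cooldown";
  } else if (key.consecutiveFailures > manualReviewThreshold) {
    key.state = "manual_review";
  }
}

// Tries keys until one succeeds, a non-retryable failure occurs, or no active key remains.
// `attempt` returns null on success or the failure kind otherwise.
function routeRequest(
  pool: KeyPool,
  attempt: (key: ProviderKey) => FailureKind | null,
): RouteResult {
  const tried = new Set<string>();
  let lastFailure: FailureKind | "no_active_key" = "no_active_key";
  while (true) {
    const key = pool.next(tried);
    if (key === undefined) break;
    tried.add(key.id); // a single request never attempts the same key twice
    const failure = attempt(key);
    if (failure === null) {
      key.consecutiveFailures = 0; // assumption: success resets the counter
      return { keyId: key.id, failure: null };
    }
    lastFailure = failure;
    if (!isRetryableOnNextKey(failure)) break; // user-caused errors are surfaced as-is
    onRetryableFailure(key);
  }
  return { keyId: null, failure: lastFailure };
}
```

In this sketch a retryable failure both moves the key to `cooldown` and advances to the next active key, while a non-retryable failure stops the loop and leaves the key's state unchanged.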
## Balance tracking

- Primary source of truth is the provider balance API.
- Balance refresh runs periodically and also after relevant failures.
- Telegram admin output must show per-key balance snapshots and the count of keys in `out_of_funds`.

## Admin expectations

Web admin and Telegram admin must both be able to:

- inspect key state
- inspect last error category and code
- inspect balance snapshot and refresh time
- enable or disable a key
- return a key from `manual_review`
- return a key from `out_of_funds`
- add a new key
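
As a rough illustration of the Telegram output requirement (per-key balance snapshots plus the `out_of_funds` count), the sketch below builds a plain-text summary. The `KeySnapshot` shape, its field names (`label`, `currency`, `refreshedAt`), and `formatAdminSummary` are assumptions made for this example, not the project's schema or message format.

```typescript
// State names come from the spec; the snapshot shape below is hypothetical.
type KeyState = "active" | "cooldown" | "out_of_funds" | "manual_review" | "disabled";

interface KeySnapshot {
  label: string;            // assumed human-readable key label
  state: KeyState;
  balance: number | null;   // null when no balance has been fetched yet
  currency: string;         // assumed; the spec does not name a currency
  refreshedAt: Date | null; // time of the last balance refresh, if any
}

// Builds one line per key, then a final line with the number of keys in out_of_funds.
function formatAdminSummary(keys: KeySnapshot[]): string {
  const lines = keys.map((k) => {
    const balance = k.balance === null ? "n/a" : `${k.balance.toFixed(2)} ${k.currency}`;
    const refreshed = k.refreshedAt === null ? "never" : k.refreshedAt.toISOString();
    return `${k.label} [${k.state}] balance=${balance} refreshed=${refreshed}`;
  });
  const outOfFunds = keys.filter((k) => k.state === "out_of_funds").length;
  lines.push(`out_of_funds: ${outOfFunds}`);
  return lines.join("\n");
}
```

The same snapshot data could back the web admin view, since both surfaces are expected to expose key state, balance, and refresh time.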