Initial import
67
docs/ops/provider-key-pool.md
Normal file
@@ -0,0 +1,67 @@
# Provider Key Pool

## Purpose
Route generation traffic through multiple provider API keys while hiding transient failures from end users.

## Key selection
- Only keys in `active` state are eligible for first-pass routing.
- Requests start from the next active key by round robin.
- A single request must not attempt the same key twice.
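
The selection rules above can be sketched as a small round-robin cursor. This is a minimal illustration, not the worker's actual code: `ProviderKey`, `KeyPool`, and `candidates` are hypothetical names, and the pool is assumed to live in memory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProviderKey:
    key_id: str            # hypothetical identifier
    state: str = "active"  # one of the states listed under "States"


class KeyPool:
    """Round-robin cursor over a fixed list of keys (illustrative only)."""

    def __init__(self, keys: List[ProviderKey]) -> None:
        self._keys = keys
        self._cursor = 0  # index at which the next request starts looking

    def candidates(self) -> List[ProviderKey]:
        """Return the keys one request may try, in order, each at most once."""
        n = len(self._keys)
        # Walk the ring once, starting at the cursor.
        order = [(self._cursor + i) % n for i in range(n)]
        # Only keys in the active state are eligible.
        active = [i for i in order if self._keys[i].state == "active"]
        if active:
            # The next request starts right after the key this one starts on.
            self._cursor = (active[0] + 1) % n
        return [self._keys[i] for i in active]
```

Because each request receives a fixed list built from one pass around the ring, it cannot attempt the same key twice, and consecutive requests start on different active keys whenever more than one is available.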

## Optional proxy behavior
- A key may have one optional proxy attached.
- If a proxy exists, the first attempt uses the proxy.
- If the proxy path fails with a transport error, retry the same key directly.
- Direct fallback does not bypass other business checks.
- Current runtime policy reads the cooldown and manual-review thresholds from the environment:
  - `KEY_COOLDOWN_MINUTES`
  - `KEY_FAILURES_BEFORE_MANUAL_REVIEW`
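
The proxy-first rule above could look roughly like the following. It is a sketch under stated assumptions: `TransportError`, `AttemptResult`, `attempt_with_key`, and the `send` callable are hypothetical, and the two boolean fields only mirror the `usedProxy` and `directFallbackUsed` audit fields in snake_case.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class TransportError(Exception):
    """Hypothetical marker for transport-level failures (proxy or network)."""


@dataclass
class AttemptResult:
    response: str
    used_proxy: bool
    direct_fallback_used: bool


def attempt_with_key(
    send: Callable[[Optional[str]], str],
    proxy_url: Optional[str],
) -> AttemptResult:
    """Run one provider-key attempt: proxy first, direct on transport failure.

    `send(proxy)` is an assumed callable that performs the request, through
    `proxy` when it is not None, and raises TransportError on transport errors.
    """
    if proxy_url is None:
        # No proxy attached: go direct straight away.
        return AttemptResult(send(None), used_proxy=False, direct_fallback_used=False)
    try:
        return AttemptResult(send(proxy_url), used_proxy=True, direct_fallback_used=False)
    except TransportError:
        # Only transport errors trigger the direct retry on the same key.
        # Any other exception (validation, policy, ...) propagates unchanged.
        return AttemptResult(send(None), used_proxy=True, direct_fallback_used=True)
```

Both paths call the same `send`, which is one way to keep the direct fallback from skipping checks that the proxied path would have applied.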

## Retry rules
Retry on the next key only for:
- network errors
- connection failures
- timeouts
- provider `5xx`

Do not retry on the next key for:
- validation errors
- unsupported inputs
- policy rejections
- other user-caused provider `4xx`
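
One way to encode the two lists above is a single classification function. The category strings and the `classify` helper below are assumptions made for illustration; cases the lists do not mention (for example insufficient funds) fall through to "do not retry" here only as a default of this sketch.

```python
from enum import Enum
from typing import Optional


class RetryDecision(Enum):
    NEXT_KEY = "next_key"  # move on to the next key in the pool
    FAIL = "fail"          # surface the error to the caller


# Hypothetical category names for the transport-level failures in the list.
TRANSPORT_CATEGORIES = {"network_error", "connection_failure", "timeout"}


def classify(category: Optional[str], http_status: Optional[int]) -> RetryDecision:
    """Decide whether a failed attempt may be retried on the next key."""
    if category in TRANSPORT_CATEGORIES:
        return RetryDecision.NEXT_KEY
    if http_status is not None and 500 <= http_status <= 599:
        # Provider-side 5xx responses are retryable.
        return RetryDecision.NEXT_KEY
    # Validation errors, unsupported inputs, policy rejections and other
    # user-caused 4xx responses would fail the same way on every key.
    return RetryDecision.FAIL
```

Defaulting to `FAIL` keeps an unrecognized error from silently consuming every key in the pool.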

## States
- `active`
- `cooldown`
- `out_of_funds`
- `manual_review`
- `disabled`

## Transitions
- `active -> cooldown` on retryable failures
- `cooldown -> active` after successful automatic recheck
- `cooldown -> manual_review` after more than 10 consecutive retryable failures across recovery cycles
- `active|cooldown -> out_of_funds` on confirmed insufficient funds
- `out_of_funds -> active` only by manual admin action
- `manual_review -> active` only by manual admin action
- `active -> disabled` by manual admin action
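
The transition list above can be expressed as a lookup table that rejects anything not listed. This is a sketch only: `ALLOWED`, `ADMIN`, and `transition` are hypothetical names, and the table contains exactly the transitions written above and nothing more.

```python
from typing import Dict, Tuple

ADMIN = "manual admin action"

# (from_state, to_state) -> trigger, copied from the transition list.
ALLOWED: Dict[Tuple[str, str], str] = {
    ("active", "cooldown"): "retryable failure",
    ("cooldown", "active"): "successful automatic recheck",
    ("cooldown", "manual_review"): "too many consecutive retryable failures",
    ("active", "out_of_funds"): "confirmed insufficient funds",
    ("cooldown", "out_of_funds"): "confirmed insufficient funds",
    ("out_of_funds", "active"): ADMIN,
    ("manual_review", "active"): ADMIN,
    ("active", "disabled"): ADMIN,
}


def transition(current: str, target: str, by_admin: bool = False) -> str:
    """Return the new state, or raise if the move is not permitted."""
    trigger = ALLOWED.get((current, target))
    if trigger is None:
        raise ValueError(f"transition {current} -> {target} is not defined")
    if trigger == ADMIN and not by_admin:
        raise PermissionError(f"{current} -> {target} requires {ADMIN}")
    return target
```

Since the list defines no transition out of `disabled`, this sketch rejects `disabled -> active`; re-enabling a key would need a rule that is not written down here.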

## Current runtime note
- The current worker implementation already applies proxy-first, then direct fallback, within a single provider-key attempt.
- It writes `GenerationAttempt.usedProxy` and `GenerationAttempt.directFallbackUsed` for auditability.
- It also runs a background cooldown-recovery sweep that returns keys to `active` after `cooldownUntil` passes.
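
A cooldown-recovery sweep of the kind described above might be structured as follows. Everything here is an assumption apart from the `KEY_COOLDOWN_MINUTES` variable name and the `cooldownUntil` idea: the 15-minute default, the `KeyRecord` shape, and the function names are illustrative.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# The default of 15 is an assumption; the source only names the variable.
COOLDOWN_MINUTES = int(os.environ.get("KEY_COOLDOWN_MINUTES", "15"))


@dataclass
class KeyRecord:
    key_id: str
    state: str
    cooldown_until: Optional[datetime] = None  # mirrors `cooldownUntil`


def enter_cooldown(key: KeyRecord, now: datetime) -> None:
    """Move a key into cooldown and set its recovery deadline."""
    key.state = "cooldown"
    key.cooldown_until = now + timedelta(minutes=COOLDOWN_MINUTES)


def sweep(keys: List[KeyRecord], now: datetime) -> List[str]:
    """Return cooled-down keys to active; report which ids were recovered."""
    recovered = []
    for key in keys:
        deadline = key.cooldown_until
        # Only keys in cooldown whose deadline has been reached are touched.
        if key.state == "cooldown" and deadline is not None and deadline <= now:
            key.state = "active"
            key.cooldown_until = None
            recovered.append(key.key_id)
    return recovered
```

Passing `now` in explicitly, rather than reading the clock inside `sweep`, keeps the function deterministic and easy to test.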

## Balance tracking
- The primary source of truth is the provider balance API.
- Balance refresh runs periodically and also after relevant failures.
- Telegram admin output must show per-key balance snapshots and the count of keys in `out_of_funds`.
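
The Telegram requirement above amounts to rendering a short text report. The sketch below only shows one possible layout: `KeySnapshot`, `format_balance_report`, and the line format are hypothetical, timestamps are assumed to be in UTC, and no currency unit is shown because the source does not name one.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import List, Optional


@dataclass
class KeySnapshot:
    label: str                        # hypothetical display name for a key
    state: str
    balance: Optional[Decimal]        # None if no snapshot exists yet
    refreshed_at: Optional[datetime]  # assumed to be UTC


def format_balance_report(snapshots: List[KeySnapshot]) -> str:
    """Render per-key balance lines plus the out_of_funds count."""
    lines = []
    for snap in snapshots:
        balance = "unknown" if snap.balance is None else f"{snap.balance:.2f}"
        if snap.refreshed_at is None:
            refreshed = "never"
        else:
            refreshed = snap.refreshed_at.strftime("%Y-%m-%d %H:%M UTC")
        lines.append(f"{snap.label}: {balance} ({snap.state}, refreshed {refreshed})")
    out_of_funds = sum(1 for snap in snapshots if snap.state == "out_of_funds")
    lines.append(f"out_of_funds keys: {out_of_funds}")
    return "\n".join(lines)
```

The returned string is plain text, so it could be sent as a Telegram message body or reused in the web admin without changes.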

## Admin expectations
Web admin and Telegram admin must both be able to:
- inspect key state
- inspect last error category and code
- inspect balance snapshot and refresh time
- enable or disable a key
- return a key from `manual_review`
- return a key from `out_of_funds`
- add a new key