Initial import

sirily
2026-03-10 14:03:52 +03:00
commit 6c0ca4e28b
102 changed files with 6598 additions and 0 deletions

docs/ops/deployment.md

@@ -0,0 +1,48 @@
# Deployment Plan
## Chosen target
Deploy on one VPS with Docker Compose.
## Why this target
- The system has multiple long-lived components: web, worker, bot, database, and reverse proxy.
- Compose gives predictable service boundaries, easier upgrades, and easier recovery than manually managed host processes.
- It keeps the path open for later separation of web, worker, and bot without reworking the repository layout.
## Expected services
- `migrate`: one-shot schema bootstrap job run before app services start
- `web`: Next.js app serving the site, dashboard, admin UI, and API routes
- `worker`: background job processor
- `bot`: Telegram admin bot runtime
- `postgres`: primary database
- `caddy`: TLS termination and reverse proxy
- optional `minio`: self-hosted object storage for single-server deployments
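The service list above could be laid out as a Compose file along these lines. This is a minimal sketch, not the committed template: the image names, entrypoint commands, and `depends_on` conditions are assumptions.

```yaml
services:
  postgres:
    image: postgres:16
    env_file: .env
    volumes:
      - pgdata:/var/lib/postgresql/data   # persistent data in a named volume

  migrate:
    image: app:latest                     # hypothetical app image name
    command: npx prisma migrate deploy    # one-shot schema bootstrap
    env_file: .env
    depends_on:
      - postgres

  web:
    image: app:latest
    env_file: .env
    depends_on:
      migrate:
        condition: service_completed_successfully

  worker:
    image: app:latest
    command: node worker.js               # hypothetical entrypoint
    env_file: .env
    depends_on:
      migrate:
        condition: service_completed_successfully

  bot:
    image: app:latest
    command: node bot.js                  # hypothetical entrypoint
    env_file: .env
    depends_on:
      migrate:
        condition: service_completed_successfully

  caddy:
    image: caddy:2
    ports:
      - "80:80"
      - "443:443"

volumes:
  pgdata:
```

The `service_completed_successfully` condition is what makes `migrate` a gate: the app services only start after the schema job exits cleanly.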
## Deployment notes
- Run one Compose project on a single server.
- Keep persistent data in named volumes or external storage.
- Keep secrets in server-side environment files or a secret manager.
- Back up PostgreSQL and object storage separately.
- Prefer Telegram long polling in the MVP to avoid exposing an extra public webhook surface for the bot.
## Upgrade strategy
- Build new images.
- Run the one-shot database schema job.
- Restart `web`, `worker`, and `bot` in the same Compose project.
- Roll back by redeploying the previous image set if schema changes are backward compatible.
## Current database bootstrap state
- The current Compose template runs a `migrate` service before `web`, `worker`, and `bot`.
- The job runs `prisma migrate deploy` from the committed migration history.
- The same bootstrap job also ensures the default MVP `SubscriptionPlan` row exists after migrations.
- Schema changes must land with a new committed Prisma migration before deployment.
## Initial operational checklist
- provision VPS
- install Docker and Compose plugin
- provision DNS and TLS
- provision PostgreSQL storage
- provision S3-compatible storage or enable local MinIO
- create `.env`
- deploy Compose stack
- run database migration job
- verify web health, worker job loop, and bot polling

@@ -0,0 +1,67 @@
# Provider Key Pool
## Purpose
Route generation traffic through multiple provider API keys while hiding transient failures from end users.
## Key selection
- Only keys in `active` state are eligible for first-pass routing.
- Each request starts from the next active key in round-robin order.
- A single request must not attempt the same key twice.
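The three selection rules above can be sketched as follows. This is an illustration only; the key structure and cursor handling are assumptions, not the actual worker code.

```python
from dataclasses import dataclass

@dataclass
class ProviderKey:
    id: str
    state: str  # active | cooldown | out_of_funds | manual_review | disabled

def attempt_order(keys: list[ProviderKey], cursor: int) -> list[ProviderKey]:
    """Return the keys one request may try: active keys only,
    starting at the round-robin cursor, each key at most once."""
    active = [k for k in keys if k.state == "active"]
    if not active:
        return []
    start = cursor % len(active)
    # Rotate so this request begins where the previous one left off;
    # the list contains each key once, so no key is attempted twice.
    return active[start:] + active[:start]
```

Because the returned list is a rotation of the distinct active keys, the "never attempt the same key twice" rule falls out of the data shape rather than needing a separate check.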
## Optional proxy behavior
- A key may have one optional proxy attached.
- If a proxy exists, the first attempt uses the proxy.
- If the proxy path fails with a transport error, retry the same key directly.
- Direct fallback does not bypass other business checks.
- Current runtime policy reads the cooldown and manual-review thresholds from the environment:
- `KEY_COOLDOWN_MINUTES`
- `KEY_FAILURES_BEFORE_MANUAL_REVIEW`
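The proxy-first rule, together with the audit fields the worker records, can be sketched like this. `send(via_proxy=...)` is a stand-in for the real provider call, and `TransportError` an illustrative error class; both are assumptions.

```python
class TransportError(Exception):
    """Network or connection failure on the proxy or direct path."""

def attempt_with_key(key_has_proxy: bool, send):
    """One provider-key attempt: proxy first when a proxy is attached,
    direct retry on a transport error. Returns the result plus the
    two audit flags (used_proxy, direct_fallback_used)."""
    used_proxy = False
    direct_fallback_used = False
    if key_has_proxy:
        used_proxy = True
        try:
            return send(via_proxy=True), used_proxy, direct_fallback_used
        except TransportError:
            # Only transport errors fall through to the direct path;
            # business-level rejections propagate unchanged.
            direct_fallback_used = True
    return send(via_proxy=False), used_proxy, direct_fallback_used
```

Note that all of this happens inside a single key attempt; a direct fallback does not consume a second key from the round-robin order.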
## Retry rules
Retry on the next key only for:
- network errors
- connection failures
- timeouts
- provider `5xx`
Do not retry on the next key for:
- validation errors
- unsupported inputs
- policy rejections
- other user-caused provider `4xx`
## States
- `active`
- `cooldown`
- `out_of_funds`
- `manual_review`
- `disabled`
## Transitions
- `active -> cooldown` on retryable failures
- `cooldown -> active` after successful automatic recheck
- `cooldown -> manual_review` after more than 10 consecutive retryable failures across recovery cycles
- `active|cooldown -> out_of_funds` on confirmed insufficient funds
- `out_of_funds -> active` only by manual admin action
- `manual_review -> active` only by manual admin action
- `active -> disabled` by manual admin action
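The transition list above is effectively a whitelist, with some edges reserved for admins. A sketch of enforcing it (the table encoding is an assumption, not the shipped schema):

```python
# Allowed transitions; "manual" marks edges that require an admin action.
TRANSITIONS = {
    ("active", "cooldown"): "auto",         # retryable failure
    ("cooldown", "active"): "auto",         # successful automatic recheck
    ("cooldown", "manual_review"): "auto",  # too many consecutive failures
    ("active", "out_of_funds"): "auto",     # confirmed insufficient funds
    ("cooldown", "out_of_funds"): "auto",
    ("out_of_funds", "active"): "manual",
    ("manual_review", "active"): "manual",
    ("active", "disabled"): "manual",
}

def can_transition(src: str, dst: str, by_admin: bool) -> bool:
    """Reject any edge not in the table; admin-only edges also
    require by_admin=True."""
    mode = TRANSITIONS.get((src, dst))
    if mode is None:
        return False
    return mode == "auto" or by_admin
```

Encoding the edges as data keeps the invariant visible in one place: for example, there is no path out of `disabled` at all, and nothing automatic ever leaves `out_of_funds` or `manual_review`.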
## Current runtime note
- The current worker implementation already applies proxy-first then direct fallback within one provider-key attempt.
- The current worker implementation writes `GenerationAttempt.usedProxy` and `GenerationAttempt.directFallbackUsed` for auditability.
- The current worker implementation also runs a background cooldown-recovery sweep and returns keys to `active` after `cooldownUntil` passes.
## Balance tracking
- Primary source of truth is the provider balance API.
- Balance refresh runs periodically and also after relevant failures.
- Telegram admin output must show per-key balance snapshots and the count of keys in `out_of_funds`.
## Admin expectations
Web admin and Telegram admin must both be able to:
- inspect key state
- inspect last error category and code
- inspect balance snapshot and refresh time
- enable or disable a key
- return a key from `manual_review`
- return a key from `out_of_funds`
- add a new key

@@ -0,0 +1,48 @@
# Telegram Pairing Flow
## Goal
Allow a new Telegram admin to be approved from the server console without editing the database manually.
## Runtime behavior
### Unpaired user
1. A user opens the Telegram bot.
2. The bot checks whether `telegram_user_id` is present in the allowlist.
3. If not present, the bot creates a pending pairing record with:
- Telegram user ID
- Telegram username and display name snapshot
- pairing code hash
- expiration timestamp
- status `pending`
4. The bot replies with a message telling the user to run `nproxy pair <code>` on the server.
Current runtime note:
- The current bot runtime uses Telegram long polling.
- On each message from an unpaired user, the bot rotates any previous pending code and issues a fresh pairing code.
- Pending pairing creation writes an audit-log entry with actor type `system`.
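Creating the pending record described in step 3 might look like the sketch below. The code format, hash choice, and 10-minute TTL are illustrative assumptions; the security rules at the end of this document only require that codes expire and are stored hashed.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

def new_pending_pairing(telegram_user_id: int, username: str,
                        display_name: str, ttl_minutes: int = 10):
    """Build a pending pairing record. Only the hash is persisted;
    the plaintext code is shown once in the bot's reply."""
    code = secrets.token_urlsafe(8)  # illustrative code format
    record = {
        "telegram_user_id": telegram_user_id,
        "username_snapshot": username,
        "display_name_snapshot": display_name,
        "code_hash": hashlib.sha256(code.encode()).hexdigest(),
        "expires_at": datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes),
        "status": "pending",
    }
    return code, record
```

Rotating a previous pending code, as the runtime note describes, then reduces to replacing the old record with a freshly generated one so that only the newest code can complete the pairing.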
### Pair completion
1. An operator runs `nproxy pair <code>` on the server.
2. The CLI looks up the pending pairing by code.
3. The CLI prints the target Telegram identity and asks for confirmation.
4. On confirmation, the CLI adds the Telegram user to the allowlist.
5. The CLI marks the pending pairing record as `completed`.
6. The CLI writes an admin action log entry.
## Required CLI commands
- `nproxy pair <code>`
- `nproxy pair list`
- `nproxy pair revoke <telegram-user-id>`
- `nproxy pair cleanup`
## Current CLI behavior
- `nproxy pair <code>` prints the Telegram identity and requires explicit confirmation unless `--yes` is provided.
- `nproxy pair list` prints active allowlist entries and pending pairing records.
- `nproxy pair revoke <telegram-user-id>` requires explicit confirmation unless `--yes` is provided.
- `nproxy pair cleanup` marks expired pending pairing records as `expired` and writes an audit log entry.
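The command surface above mixes a positional code with named subcommands, so a dispatcher has to distinguish them. A minimal sketch of that parsing, under the assumption that `--yes` is the only flag (the real CLI's parsing may differ, and a code that collides with a subcommand name would need escaping this sketch omits):

```python
def parse_pair_args(argv: list[str]) -> dict:
    """Tiny dispatcher for `nproxy pair ...` argument vectors."""
    if not argv or argv[0] != "pair":
        raise ValueError("expected: nproxy pair ...")
    assume_yes = "--yes" in argv
    rest = [a for a in argv[1:] if a != "--yes"]
    if rest == ["list"]:
        return {"action": "list"}
    if rest == ["cleanup"]:
        return {"action": "cleanup"}
    if len(rest) == 2 and rest[0] == "revoke":
        return {"action": "revoke", "telegram_user_id": rest[1], "yes": assume_yes}
    if len(rest) == 1:
        # anything else with one argument is treated as a pairing code
        return {"action": "complete", "code": rest[0], "yes": assume_yes}
    raise ValueError("unrecognized pair command")
```

The `yes` flag maps onto the confirmation rule above: `complete` and `revoke` prompt interactively unless it is set, while `list` and `cleanup` never prompt.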
## Security rules
- Pairing codes expire.
- Pairing codes are stored hashed, not in plaintext.
- Only the server-side CLI can complete a pairing.
- Telegram bot access is denied until allowlist membership exists.
- Every pairing and revocation action is auditable.