# Client Stage Phase 1 — Smoke Test Runbook

**Audience:** Larry (and Forge on re-invoke). Confirms migration 010 + the 8 endpoints + auth model behave as designed before Riv layers on Phase 2.

## Pre-flight

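1. Migration 010 applied. Verify from the repo root (the snippet imports `./scripts/_pg.mjs` relative to the working directory):
   ```sh
   node -e "
   import('./scripts/_pg.mjs').then(async ({ makeClient }) => {
     const c = makeClient();
     await c.connect();
     try {
       const r = await c.query(\`SELECT id, note FROM atlas_migrations WHERE id = '010_client_stage'\`);
       console.log(r.rows); // one row means 010 is applied; [] means it is not
     } finally {
       await c.end(); // close the connection even if the query throws
     }
   }).catch((e) => { console.error(e); process.exit(1); });"
   ```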
2. Atlas server running (PM2 `atlas` process). After deploying these files, **a PM2 reload from elevated PowerShell is required** (memory: `reference_atlas_runtime`, `reference_atlas_rebuild_restart`).
3. `ATLAS_INGEST_TOKEN` from `app/.env.local` available as `$TOKEN`.

```sh
TOKEN=$(grep -E "^ATLAS_INGEST_TOKEN=" app/.env.local | cut -d= -f2- | tr -d '"' | tr -d "'" | tr -d '\r')
BASE="http://localhost:3000"      # PM2 prod port (port 80 is dev)
```
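If part of the smoke run is driven from Node rather than the shell, the token extraction above can be sketched in TypeScript. This is a sketch, not project code: `extractIngestToken` is a hypothetical name, the caller is assumed to read `app/.env.local` into a string, and it returns only the first matching line, whereas the shell pipeline would join every match into `$TOKEN`.

```typescript
// Sketch only: mirrors the grep | cut | tr pipeline above. extractIngestToken is a
// hypothetical helper; reading app/.env.local into a string is left to the caller.
function extractIngestToken(envText: string): string | undefined {
  for (const line of envText.split("\n")) {
    // grep -E "^ATLAS_INGEST_TOKEN=": the key must start the line.
    if (!line.startsWith("ATLAS_INGEST_TOKEN=")) continue;
    // cut -d= -f2-: keep everything after the first '=' (a token may itself contain '=').
    const value = line.slice(line.indexOf("=") + 1);
    // tr -d '"' | tr -d "'" | tr -d '\r': drop every quote character and any CR from CRLF files.
    return value.replace(/["'\r]/g, "");
  }
  return undefined; // key absent: the shell version would leave TOKEN empty
}
```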

## What was verified (all 19 tests passed against a temp instance on :3001 during this build)

### Read paths

| # | Test | Expected | Result |
|---|------|----------|--------|
| 1 | `GET /api/clients?stage=weekly` with legacy token | 200, 52 clients listed | ✅ 52 clients with stage+flags+version |
| 2 | `GET /api/clients/afton-electric-llc` (by slug) | 200, single client with last_stage_event | ✅ |
| 3 | `GET /api/clients/1` (by id) | 200, single client | ✅ |
| 4 | `?stage=bogus` | 400, "stage must be one of …" | ✅ |
| 5 | Missing Authorization header | 401, "missing bearer token" | ✅ |
| 6 | `Bearer notarealtoken` | 401, "unauthorized" | ✅ |
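Tests 5 and 6 separate a missing header from an unrecognised token. A minimal sketch of that distinction follows, assuming each token resolves to an `actor_id` through some lookup (plausibly `actor_tokens`). `authenticate` and its map argument are hypothetical, and how the real route treats a malformed header is not covered by these tests.

```typescript
// Hypothetical sketch of the bearer check behind tests 5 and 6; not the route's actual code.
type AuthResult =
  | { ok: true; actorId: string }
  | { ok: false; status: 401; error: string };

function authenticate(
  authorization: string | null, // raw Authorization header, null when absent
  tokenToActor: ReadonlyMap<string, string>, // assumption: token -> actor_id lookup
): AuthResult {
  const match = authorization?.match(/^Bearer\s+(\S+)$/);
  if (!match) {
    // No header, or not in "Bearer <token>" form (assumed to share the test 5 message).
    return { ok: false, status: 401, error: "missing bearer token" };
  }
  const actorId = tokenToActor.get(match[1]);
  if (actorId === undefined) {
    // Well-formed header, but the token is not recognised.
    return { ok: false, status: 401, error: "unauthorized" };
  }
  return { ok: true, actorId };
}
```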

### Write paths

| # | Test | Expected | Result |
|---|------|----------|--------|
| 7 | `PATCH /api/clients/1/stage` weekly→eom_close with expected_version=2 | 200, version → 3 | ✅ |
| 8 | Re-PATCH with stale expected_version=2 | 409 with actual_version + expected_version | ✅ |
| 9 | PATCH to same stage | 400, "client is already in that stage" | ✅ |
| 10 | `POST /api/clients/1/flags/client_blocking` | 200, flag in array, history row | ✅ |
| 11 | Re-POST same flag | 200, `no_op: true` | ✅ |
| 12 | `DELETE` the flag | 200, flag removed | ✅ |
| 13 | `GET /api/clients/1/stage_history?limit=10` | 200, 3 rows in audit log with full actor info | ✅ |
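The write-path rules in tests 7 to 11 reduce to an optimistic-lock comparison, a same-stage guard, and an idempotent flag set. The in-memory sketch below models them. `patchStage`, `setFlag`, and `ClientRow` are hypothetical, and two details are assumptions the tables do not settle: the version check runs before the same-stage check, and flag writes bump `version`.

```typescript
// Hypothetical in-memory model of the rules exercised by tests 7-11; not the route's code.
interface ClientRow {
  stage: string;
  version: number;
  flags: string[];
}

type WriteResult =
  | { status: 200; version: number; no_op?: true }
  | { status: 400; error: string }
  | { status: 409; error: string; actual_version: number; expected_version: number };

function patchStage(row: ClientRow, toStage: string, expectedVersion: number): WriteResult {
  // Optimistic lock: the caller must echo the version it last read (assumed to be checked first).
  if (row.version !== expectedVersion) {
    return {
      status: 409,
      error: "version conflict",
      actual_version: row.version,
      expected_version: expectedVersion,
    };
  }
  if (row.stage === toStage) {
    return { status: 400, error: "client is already in that stage" };
  }
  row.stage = toStage;
  row.version += 1; // every successful stage write bumps the version
  return { status: 200, version: row.version };
}

function setFlag(row: ClientRow, flag: string): WriteResult {
  if (row.flags.includes(flag)) {
    // Idempotent: re-setting an existing flag succeeds but reports a no-op.
    return { status: 200, version: row.version, no_op: true };
  }
  row.flags.push(flag);
  row.version += 1; // assumption: flag writes also bump the version
  return { status: 200, version: row.version };
}
```

Checking the version first means a stale caller always learns the current version, even when its requested transition would also have been rejected.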

### Rollback + queue

| # | Test | Expected | Result |
|---|------|----------|--------|
| 14 | `POST /api/clients/1/stage_history/1/rollback` | 200, client back in weekly, compensating row appended (trigger_type='rollback') | ✅ |
| 15 | `GET /api/agents/system:legacy/queue` (self-query) | 200, owner-based default list | ✅ |
| 16 | `GET /api/agents/agent:ledger/queue` (admin caller) | 200, curated eom_close/eom_review list | ✅ |
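Test 14 shows rollback as an append rather than an edit: the client returns to `weekly` and a compensating row with `trigger_type='rollback'` joins the log. A sketch of that idea follows. The `from_stage`/`to_stage` column names and the `rollback` helper are assumptions; only `actor_id` and `trigger_type` are named in this runbook.

```typescript
// Hypothetical row shape: from_stage/to_stage are assumed column names.
interface StageHistoryRow {
  id: number;
  client_id: number;
  from_stage: string;
  to_stage: string;
  actor_id: string;
  trigger_type: string;
}

// Appends a compensating row that undoes `targetId`; existing rows are never edited or deleted.
function rollback(
  history: StageHistoryRow[],
  targetId: number,
  currentStage: string,
  actorId: string,
): StageHistoryRow {
  const target = history.find((h) => h.id === targetId);
  if (target === undefined) {
    throw new Error(`stage_history row ${targetId} not found`);
  }
  const compensating: StageHistoryRow = {
    id: Math.max(0, ...history.map((h) => h.id)) + 1, // stands in for a DB sequence
    client_id: target.client_id,
    from_stage: currentStage,
    to_stage: target.from_stage, // return the client to where it was before the target change
    actor_id: actorId,
    trigger_type: "rollback",
  };
  history.push(compensating);
  return compensating;
}
```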

### Permission scoping (validates Phase 2 readiness)

| # | Test | Expected | Result |
|---|------|----------|--------|
| 17 | Low-priv token (can_read_clients only) attempting stage write | **403 with `missing_permission: 'can_set_stage'` and `actor_id`** | ✅ (after bugfix; the initial implementation returned 409 because the version check ran before the auth check) |
| 17b | Same low-priv token doing a read | 200 | ✅ |
| 18a | Scoped `can_set_flag` (scope=`{flags:["client_blocking"]}`) setting in-scope flag | 200, flag set | ✅ |
| 18b | Same actor attempting to set out-of-scope `chronic_late` | 403 with `missing_permission: 'can_set_flag'` | ✅ |
| 19 | Cleanup of smoke-test seed rows (e.g. `user:smoke-readonly`) | Seed rows removed; smoke-test `stage_history` rows intentionally kept as audit trail | ✅ |
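Tests 18a and 18b hinge on how a scoped grant is matched against the requested flag. A minimal sketch, assuming grants carry an optional `scope.flags` list shaped like the `{flags:["client_blocking"]}` example. The `Grant` type and `checkFlagPermission` are hypothetical, and treating an unscoped grant as covering every flag is an assumption.

```typescript
// Hypothetical grant shape modeled on the scope={flags:[...]} example in test 18a.
interface Grant {
  permission: string;
  scope?: { flags?: string[] };
}

type PermissionCheck =
  | { ok: true }
  | { ok: false; status: 403; missing_permission: string; actor_id: string };

function checkFlagPermission(actorId: string, grants: Grant[], flag: string): PermissionCheck {
  const allowed = grants.some((g) => {
    if (g.permission !== "can_set_flag") return false;
    const scopedFlags = g.scope?.flags;
    // Assumption: a grant without a flags scope covers every flag.
    return scopedFlags === undefined || scopedFlags.includes(flag);
  });
  if (allowed) return { ok: true };
  // 403 body names the missing permission and the caller, mirroring tests 17 and 18b.
  return { ok: false, status: 403, missing_permission: "can_set_flag", actor_id: actorId };
}
```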

## Bug found and fixed during smoke testing

**Location:** `app/api/clients/[slug]/stage/route.ts`

**Problem:** Optimistic-lock check ran BEFORE the auth check, so an actor lacking `can_set_stage` got a misleading 409 ("version conflict") rather than a 403 ("permission denied"). This would have confused agents and made permission debugging painful.

**Fix:** Two-pass auth. A coarse `requireStageAuth(req, 'can_set_stage', { to_stage })` runs BEFORE the DB transaction, so token and permission errors fail fast, before any row is locked. A scope-aware re-check still runs after the row is locked: an actor holding `can_set_stage` scoped to specific transitions (e.g. `eom_close→eom_review`) can only be evaluated once the client's current stage, the "from" side of the transition, has been read. The same change was applied to the `can_offboard` check.
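The ordering the fix establishes (coarse check, row lock, exact scope check, then version check) can be modeled without a database. In this sketch `lockRow` stands in for locking the client row, and scoped grants are assumed to be stored as `"from→to"` strings; neither detail comes from the route itself, and `handleStagePatch` is a hypothetical name.

```typescript
// Illustrative model of the two-pass ordering; names and the transition format are assumptions.
interface StageGrant {
  // Allowed "from→to" transitions; undefined means any transition.
  transitions?: string[];
}

interface LockedClient {
  stage: string;
  version: number;
}

type Outcome =
  | { status: 200; version: number }
  | { status: 403; missing_permission: "can_set_stage" }
  | { status: 409; actual_version: number; expected_version: number };

const DENY: Outcome = { status: 403, missing_permission: "can_set_stage" };

function handleStagePatch(
  grants: StageGrant[], // the actor's can_set_stage grants; empty if it has none
  lockRow: () => LockedClient, // stands in for locking the client row inside the transaction
  toStage: string,
  expectedVersion: number,
): Outcome {
  // Pass 1, before the transaction: only to_stage is known, so keep any grant that could reach it.
  const coarseOk = grants.some(
    (g) => g.transitions === undefined || g.transitions.some((t) => t.endsWith(`→${toStage}`)),
  );
  if (!coarseOk) return DENY;

  const row = lockRow();

  // Pass 2, row locked: the current stage is now known, so check the exact transition.
  const transition = `${row.stage}→${toStage}`;
  const exactOk = grants.some((g) => g.transitions === undefined || g.transitions.includes(transition));
  if (!exactOk) return DENY;

  // Version check last: a caller lacking permission is never shown a 409.
  if (row.version !== expectedVersion) {
    return { status: 409, actual_version: row.version, expected_version: expectedVersion };
  }
  row.stage = toStage;
  row.version += 1;
  return { status: 200, version: row.version };
}
```

With this order, test 17's low-privilege actor gets a 403 before any row is locked, and an actor scoped to `eom_close→eom_review` gets a 403 for `weekly→eom_review` regardless of the version it sends.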

## Re-running these tests

After a PM2 reload, an operator can re-verify by sending the requests from the tables above with curl to `$BASE`, using `Authorization: Bearer $TOKEN` unless a test calls for a different token. The smoke-test seed user (`user:smoke-readonly`) is removed during cleanup (test 19); if a re-run is needed, re-seed it by following the same `node -e "..."` snippets used in tests 17/18.
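For a scripted re-run, the read-path rows (tests 1 to 6) can be expressed as data and checked against live status codes. This sketch keeps to reads on purpose, since repeating the write tests would append more audit rows for client 1. `runReadSmoke` and the injected `send` function are hypothetical, and the request paths for tests 5 and 6 are assumptions.

```typescript
// Hypothetical table-driven re-run of the read-path checks (tests 1-6); read-only on purpose.
interface SmokeCase {
  name: string;
  path: string;
  auth: "token" | "none" | "bogus";
  expectedStatus: number;
}

// Injected transport returning the HTTP status; with Node 18+ fetch it could be:
//   const send: Send = async (url, headers) => (await fetch(url, { headers })).status;
type Send = (url: string, headers: Record<string, string>) => Promise<number>;

const READ_CASES: SmokeCase[] = [
  { name: "1 list weekly", path: "/api/clients?stage=weekly", auth: "token", expectedStatus: 200 },
  { name: "2 by slug", path: "/api/clients/afton-electric-llc", auth: "token", expectedStatus: 200 },
  { name: "3 by id", path: "/api/clients/1", auth: "token", expectedStatus: 200 },
  { name: "4 bad stage", path: "/api/clients?stage=bogus", auth: "token", expectedStatus: 400 },
  // Assumption: tests 5 and 6 hit the list endpoint; the runbook does not say which path they used.
  { name: "5 no header", path: "/api/clients?stage=weekly", auth: "none", expectedStatus: 401 },
  { name: "6 bad token", path: "/api/clients?stage=weekly", auth: "bogus", expectedStatus: 401 },
];

async function runReadSmoke(base: string, token: string, send: Send): Promise<string[]> {
  const failures: string[] = [];
  for (const c of READ_CASES) {
    const headers: Record<string, string> = {};
    if (c.auth === "token") headers["Authorization"] = `Bearer ${token}`;
    if (c.auth === "bogus") headers["Authorization"] = "Bearer notarealtoken";
    const status = await send(base + c.path, headers);
    if (status !== c.expectedStatus) {
      failures.push(`${c.name}: expected ${c.expectedStatus}, got ${status}`);
    }
  }
  return failures; // empty array means every read-path check matched
}
```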

## State left behind in production DB

- Migration 010 applied (`atlas_migrations` row added).
- Tri-County Tire (id=1) has `version=6` (smoke-test churn) and 4 `stage_history` rows from this build session. The client is back in `weekly` with no flags. `stage_entered_at` was bumped to the rollback timestamp; this is correct behavior, not a leak.
- The smoke-test rows in `stage_history` were left in place (they're real audit trail; deleting them would itself be auditable). Larry can see them via `GET /api/clients/1/stage_history`.
- `user:jimmie` seed permissions intact (`can_admin`, `can_sign_off_eom_review`).

## Sign-off

All 19 tests pass. Schema solid. Endpoints behave per spec. Audit log captures actor_id + channel + trigger_type on every change. Optimistic lock works. Permission scoping works. Phase 2 path is clear: Riv adds rows to `agent_permissions` and `actor_tokens`, agents call the same endpoints, no schema or code refactor needed.
