# Caching Architecture

This document describes the caching layer used in the Optima API, covering the Redis-backed opportunity cache, TTL algorithms, background refresh mechanics, retry logic, and debugging tools.

---

## Overview

The API caches expensive ConnectWise (CW) API responses in **Redis** to reduce latency and avoid CW rate limits. The primary cache layer is the **opportunity cache** (`src/modules/cache/opportunityCache.ts`), which proactively warms data for all non-closed opportunities on a background interval.

The API also maintains a Redis-backed **sales member metrics cache** (`src/modules/cache/salesOpportunityMetricsCache.ts`) refreshed every 5 minutes. It precomputes per-member dashboard/reporting figures (pipeline revenue, won/lost counts, win rate, avg days to close, and related metrics) for fast reads from `/v1/sales/opportunities/metrics`.

### Key design principles

- **Adaptive TTLs** — cache durations are computed dynamically based on how "hot" an opportunity is (recently updated = shorter TTL = fresher data).
- **Background refresh** — a 20-minute interval scans all open opportunities and re-fetches only expired cache keys.
- **Bounded concurrency** — CW API calls are throttled via thunk-based batching to prevent overwhelming the upstream API.
- **Graceful degradation** — transient CW errors (timeouts, network failures) are caught, logged, and retried on the next cycle rather than crashing the process.
- **Priority ordering** — most recently updated opportunities are refreshed first so active deals get fresh data before stale ones.
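As a concrete illustration of the adaptive-TTL principle, here is a minimal sketch of the primary rule set using the thresholds documented in the TTL Algorithms section. The function name `primaryTtl` and the `TtlSignals` shape are hypothetical, not the real `computeCacheTTL` signature, and recency is approximated here as absolute distance from "now":

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface TtlSignals {
  closedFlag: boolean;
  closedDate?: Date;
  expectedCloseDate?: Date;
  lastUpdated?: Date;
}

// Rules evaluated top-to-bottom; first match wins (see the primary TTL table).
function primaryTtl(signals: TtlSignals, now: Date = new Date()): number | null {
  const within = (d: Date | undefined, days: number): boolean =>
    d !== undefined && Math.abs(now.getTime() - d.getTime()) <= days * DAY_MS;

  if (signals.closedFlag) {
    // 1a/1b: closed > 30 days ago — do not cache; closed recently — 15 minutes.
    return within(signals.closedDate, 30) ? 900_000 : null;
  }
  if (within(signals.expectedCloseDate, 5) || within(signals.lastUpdated, 5)) {
    return 30_000; // rule 2: hot — 30 seconds
  }
  if (within(signals.expectedCloseDate, 14) || within(signals.lastUpdated, 14)) {
    return 60_000; // rule 3: warm — 60 seconds
  }
  return 900_000; // rule 4: everything else — 15 minutes
}
```

The key property is that a single set of signals (`closedFlag`, `closedDate`, `expectedCloseDate`, `lastUpdated`) drives all TTL decisions, so hotter opportunities naturally get fresher cache entries.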
---

## What is cached

Each non-closed opportunity can have up to 7 cached payloads in Redis:

| Cache Key Pattern | Data | Source |
| --- | --- | --- |
| `opp:cw-data:{cwOpportunityId}` | Raw CW opportunity response | `GET /sales/opportunities/:id` |
| `opp:activities:{cwOpportunityId}` | CW activities array | `GET /sales/activities?conditions=opportunity/id=:id` |
| `opp:notes:{cwOpportunityId}` | CW notes array | `GET /sales/opportunities/:id/notes` |
| `opp:contacts:{cwOpportunityId}` | CW contacts array | `GET /sales/opportunities/:id/contacts` |
| `opp:products:{cwOpportunityId}` | Forecast + procurement products blob | `GET /sales/opportunities/:id/forecast` + `GET /procurement/products` |
| `opp:company-cw:{cw_CompanyId}` | Hydrated company + contacts blob | `GET /company/companies/:id` + contacts endpoints |
| `opp:site:{cwCompanyId}:{cwSiteId}` | Company site data | `GET /company/companies/:id/sites/:siteId` |

Inventory-adjustment-driven catalog sync adds a targeted product cache:

| Cache Key Pattern | Data | Source |
| --- | --- | --- |
| `catalog:item:cw:{cwId}` | Full CW catalog item + computed `onHand` + DB row snapshot | `GET /procurement/adjustments` + `GET /procurement/catalog/:id` + catalog inventory endpoint |

Sales opportunity metrics caching adds member-focused keys:

| Cache Key Pattern | Data | Source |
| --- | --- | --- |
| `sales:metrics:members:all` | Envelope of all active-member metrics | Precomputed from active CW members + assigned opportunities + products cache/CW fetch |
| `sales:metrics:member:{cwIdentifier}` | One member's computed metrics snapshot | Same as above |
| `sales:metrics:oppRevenue:{cwOppId}` | Per-opportunity computed revenue blob | Metrics refresh lookups (products cache-first, then manager/controller fallback) |

---

## TTL Algorithms

Three algorithms compute cache TTLs. All share the same input signals:

- `closedFlag` — whether the opportunity is closed
- `closedDate` — when it was closed
- `expectedCloseDate` — projected close date (forward-looking signal)
- `lastUpdated` — last CW modification date (backward-looking signal)

### Primary TTL (`computeCacheTTL`)

**File:** `src/modules/algorithms/computeCacheTTL.ts`

Used for: opportunity CW data, activities, company CW data.

| # | Condition | TTL | Human |
| --- | --- | --- | --- |
| 1a | Closed > 30 days ago | `null` | Do not cache |
| 1b | Closed within 30 days | 900,000 ms | 15 minutes |
| 2 | `expectedCloseDate` or `lastUpdated` within **5 days** | 30,000 ms | 30 seconds |
| 3 | `expectedCloseDate` or `lastUpdated` within **14 days** | 60,000 ms | 60 seconds |
| 4 | Everything else | 900,000 ms | 15 minutes |

Rules are evaluated top-to-bottom; the first match wins.

### Sub-Resource TTL (`computeSubResourceCacheTTL`)

**File:** `src/modules/algorithms/computeSubResourceCacheTTL.ts`

Used for: notes, contacts.

| # | Condition | TTL | Human |
| --- | --- | --- | --- |
| 1a | Closed > 30 days ago | `null` | Do not cache |
| 1b | Closed within 30 days | 300,000 ms | 5 minutes |
| 2 | Within **5 days** | 60,000 ms | 60 seconds |
| 3 | Within **14 days** | 120,000 ms | 2 minutes |
| 4 | Everything else | 300,000 ms | 5 minutes |

### Products TTL (`computeProductsCacheTTL`)

**File:** `src/modules/algorithms/computeProductsCacheTTL.ts`

Used for: forecast + procurement products.
| # | Condition | TTL | Human |
| --- | --- | --- | --- |
| 1 | Status is Won/Lost/Pending Won/Pending Lost | `null` | No cache |
| 2 | Main cache TTL is `null` | `null` | No cache |
| 3 | `lastUpdated` within **3 days** | 15,000 ms | 15 seconds |
| 4 | Everything else | 1,200,000 ms | 20 minutes |

Products on terminal-status opportunities are never proactively cached. Non-hot products use a **lazy on-demand** cache — they're fetched when requested and cached for 20 minutes.

### Site TTL

Sites use a fixed TTL of **20 minutes** (1,200,000 ms). Site/address data rarely changes. Sites are **not** proactively warmed by the background refresh — they are populated lazily on the first detail-view request.

---

## Background Refresh

**Function:** `refreshOpportunityCache()` in `src/modules/cache/opportunityCache.ts`
**Interval:** Every 20 minutes, triggered from `src/index.ts`.

### Refresh cycle

1. **Query DB** — fetch all non-closed opportunities plus recently closed ones (within 30 days), ordered by `cwLastUpdated DESC` (most recently active first).
2. **Batch EXISTS check** — use a single Redis pipeline to check which cache keys already exist (5 EXISTS commands per opportunity: oppCwData, activities, notes, contacts, products).
3. **Build thunk list** — for each opportunity with missing keys, push a **thunk** (lazy function) into the task list. No HTTP requests fire at this point.
4. **Execute with bounded concurrency** — process thunks in batches of `CONCURRENCY` (currently **6**), with a `BATCH_DELAY_MS` (currently **250 ms**) pause between batches. Each thunk is only invoked inside the batch loop.
5. **Emit events** — `cache:opportunities:refresh:started` and `cache:opportunities:refresh:completed` events are emitted for the event debugger.
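The thunk-batching step can be sketched as follows. This is a minimal illustration of the pattern, not the actual helper from `opportunityCache.ts`; `runThunks` and `sleep` are hypothetical names:

```typescript
type Thunk<T> = () => Promise<T>;

const CONCURRENCY = 6;      // max simultaneous CW API requests per batch
const BATCH_DELAY_MS = 250; // pause between batches

const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

async function runThunks<T>(thunks: Thunk<T>[]): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < thunks.length; i += CONCURRENCY) {
    const batch = thunks.slice(i, i + CONCURRENCY);
    // Requests only start here — each thunk is invoked inside the batch loop.
    results.push(...(await Promise.all(batch.map((fn) => fn()))));
    if (i + CONCURRENCY < thunks.length) await sleep(BATCH_DELAY_MS);
  }
  return results;
}
```

Because the task list holds `() => Promise` values rather than already-running promises, nothing hits the CW API until `fn()` is called inside the loop, which is what bounds the concurrency.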
### Inventory-adjustment listener cycle

**Function:** `listenInventoryAdjustments()` in `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts`
**Interval:** Every 60 seconds, triggered from `src/index.ts`.

1. Fetch `GET /procurement/adjustments?pageSize=1000`.
2. Build a normalized snapshot of tracked inventory rows (`cwCatalogId`, `onHand`, `inventory`) per adjustment.
3. Compare to the previous snapshot; extract only changed product IDs.
4. For each changed product ID, fetch the fresh CW catalog item + current on-hand.
5. Upsert `CatalogItem` in Postgres and write Redis key `catalog:item:cw:{cwId}` with a 20-minute TTL.

Guardrails to prevent request storms:

- Diffing is computed at the **product state** level (grouped by `cwCatalogId`), not raw adjustment-row churn.
- Per-cycle syncs are capped (`CW_ADJUSTMENT_SYNC_MAX_PER_CYCLE`, default `50`).
- A product resync cooldown is enforced (`CW_ADJUSTMENT_SYNC_COOLDOWN_MS`, default `600000` ms / 10 min).

This avoids full-catalog sweeps for small inventory movements and updates only the products implicated by adjustments.

### Full procurement catalog refresh

**Function:** `refreshCatalog()` in `src/modules/cw-utils/procurement/refreshCatalog.ts`
**Interval:** Every 30 minutes, triggered from `src/index.ts`.

The full catalog cache/DB sync uses the same slow-parallel thunk strategy as opportunity cache refreshes:

- Build arrays of thunk tasks (`() => Promise`) for CW item fetches, inventory fetches, and DB upserts.
- Execute with bounded concurrency (`CONCURRENCY=6`).
- Pause between batches (`BATCH_DELAY_MS=250`) to avoid CW burst pressure.
- Log task failures and retry naturally on the next cycle.

This keeps the full-catalog refresh conservative while the inventory-adjustment listener handles near-real-time targeted updates.

### Full inventory sweep fallback

`refreshInventory()` remains as a safety net but is intentionally infrequent:

- Runs every **6 hours** from `src/index.ts` (no startup-time full sweep).
- Uses the same slow-parallel pattern (`CONCURRENCY=6`, `BATCH_DELAY_MS=250`) to avoid burst traffic.

Most on-hand freshness now comes from the 60-second adjustment listener plus the 30-minute full catalog refresh.

### Concurrency control

The thunk pattern is critical. Previously, tasks were pushed as already-executing promises (`refreshTasks.push(fetchAndCache(...))`), which meant all HTTP requests fired simultaneously regardless of the batching loop. The fix was changing the array type from `Promise[]` to `(() => Promise)[]` so requests only start when explicitly invoked: `batch.map((fn) => fn())`.

### Current tuning

| Parameter | Value | Effect |
| --- | --- | --- |
| `CONCURRENCY` | 6 | Max simultaneous CW API requests per batch |
| `BATCH_DELAY_MS` | 250 | Milliseconds between batches |
| Refresh interval | 20 minutes | How often the full sweep runs |

At these settings, a full sweep of ~500 expired keys completes in ~1-2 minutes with zero CW errors and ~230 ms median latency.

### Sales metrics refresh job

**Function:** `refreshSalesOpportunityMetricsCache()` in `src/modules/cache/salesOpportunityMetricsCache.ts`
**Interval:** Every 5 minutes, triggered from `src/index.ts`.
**Startup behavior:** On app startup, the refresh is invoked once with `forceColdLoad=true`, which clears metrics-owned Redis keys and bypasses metrics/product cache reuse for that initial rebuild. Subsequent interval runs use the normal warm path.

Refresh flow:

1. Fetch all active CW members (`inactiveFlag=false`). Source: local `CwMember` table (kept in sync by the existing members refresh job).
2. Query DB opportunities assigned to those members (primary or secondary rep), scoped to open opportunities plus YTD-closed opportunities.
3. For each opportunity, compute revenue cache-first from `sales:metrics:oppRevenue:{cwOppId}`, then `opp:products:{cwOpportunityId}`, falling back to the manager/controller path (`opportunities.fetchRecord(...).fetchProducts()`) on a miss.
4. Aggregate member metrics (pipeline revenue, won/lost MTD+YTD counts, avg days to close, weighted pipeline, win/loss rates, and related KPIs).
5. Write per-opportunity revenue blobs plus all-member and per-member snapshots to Redis with a 10-minute TTL.

Safety controls:

- **Single-flight lock** — prevents overlapping refresh runs if a prior run is still in progress.
- **Per-opportunity timeout guard** — ensures slow CW product lookups degrade to a zero-revenue fallback instead of stalling the full refresh.
- **Force-cold-load mode** — clears `sales:metrics:*` runtime state owned by the metrics cache before rebuilding startup data.

This cache-first model prioritizes metrics-owned opportunity revenue keys first, then opportunity product cache entries, and only reaches CW when needed.

---

## Retry Logic (`withCwRetry`)

**File:** `src/modules/cw-utils/withCwRetry.ts`

Wraps CW API calls with exponential backoff retry on transient errors.

### Retryable errors

- `ECONNABORTED` (timeout)
- `ECONNRESET`
- `ETIMEDOUT`
- `ECONNREFUSED`
- `ERR_NETWORK`
- `ENETUNREACH`
- HTTP 5xx server errors

### Default configuration

| Parameter | Default | Description |
| --- | --- | --- |
| `maxAttempts` | 3 | Total attempts including the first |
| `baseDelayMs` | 1,000 | Delay before the first retry (doubles each retry: 1s → 2s → 4s) |
| `label` | — | Optional tag for log messages |

### Usage

```ts
import { withCwRetry } from "./withCwRetry";

const response = await withCwRetry(
  () => connectWiseApi.get(`/company/companies/${id}`),
  { label: `fetchCompany#${id}`, maxAttempts: 3, baseDelayMs: 1_500 },
);
```

Non-transient errors (404, 400, etc.) are re-thrown immediately without retry.
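For reference, the retry behavior above can be sketched roughly as follows. This is a simplified illustration matching the documented defaults and retryable-error list, not the actual `withCwRetry` source; `retrySketch` and `isTransient` are hypothetical names, and the real error classification may differ:

```typescript
const RETRYABLE_CODES = new Set([
  "ECONNABORTED", "ECONNRESET", "ETIMEDOUT",
  "ECONNREFUSED", "ERR_NETWORK", "ENETUNREACH",
]);

interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number; // doubles after each failed attempt
  label?: string;       // optional tag for log messages
}

function isTransient(err: unknown): boolean {
  const e = err as { code?: string; response?: { status?: number } };
  if (e?.code && RETRYABLE_CODES.has(e.code)) return true;
  const status = e?.response?.status;
  return typeof status === "number" && status >= 500; // HTTP 5xx
}

async function retrySketch<T>(
  fn: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 1_000, label }: RetryOptions = {},
): Promise<T> {
  let delay = baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-transient errors (404, 400, …) are re-thrown immediately.
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      console.warn(`[${label ?? "cw"}] attempt ${attempt} failed; retrying in ${delay}ms`);
      await new Promise((res) => setTimeout(res, delay));
      delay *= 2; // exponential backoff: 1s → 2s → 4s
    }
  }
}
```

With `maxAttempts = 3`, a call that fails twice with `ECONNRESET` and then succeeds returns normally; a 404 on the first attempt throws immediately.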
---

## CW API Logger

**File:** `src/modules/cw-utils/cwApiLogger.ts`

Axios interceptor that logs every CW API call to a JSONL file. Logging is **opt-in** — set the `LOG_CW_API` environment variable to enable it. Each process start creates a new timestamped file in the `cw-api-logs/` directory (e.g., `cw-api-logs/2026-03-02T14-30-05.123Z.jsonl`).

### Enabling logging

```bash
# Via the dev:log shorthand script
bun run dev:log

# Or manually with any command
LOG_CW_API=1 bun run dev
```

### Log entry fields

| Field | Type | Description |
| --- | --- | --- |
| `timestamp` | string (ISO-8601) | When the request completed |
| `method` | string | HTTP method |
| `url` | string | Request URL (relative or absolute) |
| `baseURL` | string | Axios baseURL |
| `status` | number \| null | HTTP status (null on network error) |
| `durationMs` | number | Wall-clock time in milliseconds |
| `error` | string \| null | Error code + message, if any |
| `timeout` | number | Configured timeout in ms |

### Analysis

Run the analyzer against the most recent log file:

```bash
bun run utils:analyze_cw
```

Or specify a particular file:

```bash
python3 debug-scripts/analyze-cw-calls.py cw-api-logs/2026-03-02T14-30-05.123Z.jsonl
```

This executes `debug-scripts/analyze-cw-calls.py`, which produces:

- Overview (total calls, error rate, time span)
- Duration statistics (min, max, mean, p50, p90, p95, p99, distribution histogram)
- Error breakdown by type and endpoint
- Top 20 slowest calls
- Per-endpoint stats (count, errors, mean, p50, p95, max, total time)
- Timeline (per-minute throughput and errors)
- Concurrency hotspot detection
- Summary with recommendations

To clear all logs:

```bash
rm -rf cw-api-logs/
```

---

## Cache Invalidation

Mutation endpoints invalidate the relevant cache keys so the next read fetches fresh data from CW:

| Mutation | Cache invalidated |
| --- | --- |
| Create/update/delete note | `opp:notes:{cwOpportunityId}` via `invalidateNotesCache()` |
| Create/update/delete contact | `opp:contacts:{cwOpportunityId}` via `invalidateContactsCache()` |
| Add/update/resequence products | `opp:products:{cwOpportunityId}` via `invalidateProductsCache()` |
| Refresh opportunity | All keys for that opportunity (via re-fetch) |

---

## ConnectWise API Configuration

The shared Axios instance (`connectWiseApi`) is configured in `src/constants.ts`:

| Setting | Value | Purpose |
| --- | --- | --- |
| `baseURL` | `https://ttscw.totaltech.net/v4_6_release/apis/3.0/` | CW API base |
| `timeout` | 30,000 ms (30s) | Per-request timeout |
| Logger | `attachCwApiLogger()` | Writes JSONL entries to `cw-api-logs/` |

---

## Architecture diagram

```
src/index.ts
│
├─ setInterval(refreshOpportunityCache, 20m)
│
└─► src/modules/cache/opportunityCache.ts
    │
    ├─ prisma.opportunity.findMany(orderBy: cwLastUpdated DESC)
    ├─ redis.pipeline().exists(...)   ← batch key check
    │
    ├─ Build thunk list (lazy functions)
    │
    └─ Execute thunks with CONCURRENCY=6, DELAY=250ms
        │
        ├─► fetchAndCacheOppCwData()     ─► opportunityCw.fetch()
        ├─► fetchAndCacheActivities()    ─► activityCw.fetchByOpportunityDirect()
        ├─► fetchAndCacheNotes()         ─► opportunityCw.fetchNotes()
        ├─► fetchAndCacheContacts()      ─► opportunityCw.fetchContacts()
        ├─► fetchAndCacheProducts()      ─► opportunityCw.fetchProducts() + fetchProcurementProducts()
        ├─► fetchAndCacheCompanyCwData() ─► fetchCwCompanyById() + contacts
        └─► fetchAndCacheSite()          ─► fetchCompanySite() (lazy only)
            │
            └─► connectWiseApi.get(...)  ← withCwRetry + cwApiLogger interceptors
                │
                └─► Redis SET with computed TTL
```

---

## File reference

| File | Purpose |
| --- | --- |
| `src/modules/cache/opportunityCache.ts` | Cache read/write helpers, background refresh logic |
| `src/modules/algorithms/computeCacheTTL.ts` | Primary adaptive TTL algorithm |
| `src/modules/algorithms/computeSubResourceCacheTTL.ts` | Sub-resource (notes, contacts) TTL algorithm |
| `src/modules/algorithms/computeProductsCacheTTL.ts` | Products TTL algorithm |
| `src/modules/cw-utils/withCwRetry.ts` | Retry wrapper with exponential backoff |
| `src/modules/cw-utils/cwApiLogger.ts` | Axios interceptor for JSONL call logging |
| `src/modules/cw-utils/fetchCompany.ts` | Company fetch with retry |
| `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts` | Adjustment listener for targeted catalog-item cache + DB sync |
| `src/modules/cache/salesOpportunityMetricsCache.ts` | 5-minute active-member opportunity metrics cache |
| `src/constants.ts` | CW Axios instance config (timeout, logger) |
| `src/index.ts` | Refresh interval registration |
| `debug-scripts/analyze-cw-calls.py` | CW API call analysis script |