# Caching Architecture

This document describes the caching layer used in the Optima API, covering the Redis-backed opportunity cache, TTL algorithms, background refresh mechanics, retry logic, and debugging tools.

---

## Overview

The API caches expensive ConnectWise (CW) API responses in **Redis** to reduce latency and avoid CW rate limits. The primary cache layer is the **opportunity cache** (`src/modules/cache/opportunityCache.ts`), which proactively warms data for all non-closed opportunities on a background interval.

The API also maintains a Redis-backed **sales member metrics cache** (`src/modules/cache/salesOpportunityMetricsCache.ts`) refreshed every 5 minutes. It precomputes per-member dashboard/reporting figures (pipeline revenue, won/lost counts, win rate, avg days to close, and related metrics) for fast reads from `/v1/sales/opportunities/metrics`.

### Key design principles

- **Adaptive TTLs** — cache durations are computed dynamically based on how "hot" an opportunity is (recently updated = shorter TTL = fresher data).
- **Background refresh** — a 20-minute interval scans all open opportunities and re-fetches only expired cache keys.
- **Bounded concurrency** — CW API calls are throttled via thunk-based batching to prevent overwhelming the upstream API.
- **Graceful degradation** — transient CW errors (timeouts, network failures) are caught, logged, and retried on the next cycle rather than crashing the process.
- **Priority ordering** — most recently updated opportunities are refreshed first so active deals get fresh data before stale ones.
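As a concrete illustration of the adaptive-TTL principle, here is a minimal sketch of the primary rule set using the thresholds documented in the TTL Algorithms section. The function name `primaryTtl` and the `TtlSignals` shape are hypothetical, not the real `computeCacheTTL` signature, and recency is approximated here as absolute distance from "now":

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface TtlSignals {
  closedFlag: boolean;
  closedDate?: Date;
  expectedCloseDate?: Date;
  lastUpdated?: Date;
}

// Rules evaluated top-to-bottom; first match wins (see the primary TTL table).
function primaryTtl(signals: TtlSignals, now: Date = new Date()): number | null {
  const within = (d: Date | undefined, days: number): boolean =>
    d !== undefined && Math.abs(now.getTime() - d.getTime()) <= days * DAY_MS;

  if (signals.closedFlag) {
    // 1a/1b: closed > 30 days ago — do not cache; closed recently — 15 minutes.
    return within(signals.closedDate, 30) ? 900_000 : null;
  }
  if (within(signals.expectedCloseDate, 5) || within(signals.lastUpdated, 5)) {
    return 30_000; // rule 2: hot — 30 seconds
  }
  if (within(signals.expectedCloseDate, 14) || within(signals.lastUpdated, 14)) {
    return 60_000; // rule 3: warm — 60 seconds
  }
  return 900_000; // rule 4: everything else — 15 minutes
}
```

The key property is that a single set of signals (`closedFlag`, `closedDate`, `expectedCloseDate`, `lastUpdated`) drives all TTL decisions, so hotter opportunities naturally get fresher cache entries.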
---

## What is cached

Each non-closed opportunity can have up to 7 cached payloads in Redis:

| Cache Key Pattern | Data | Source |
| --- | --- | --- |
| `opp:cw-data:{cwOpportunityId}` | Raw CW opportunity response | `GET /sales/opportunities/:id` |
| `opp:activities:{cwOpportunityId}` | CW activities array | `GET /sales/activities?conditions=opportunity/id=:id` |
| `opp:notes:{cwOpportunityId}` | CW notes array | `GET /sales/opportunities/:id/notes` |
| `opp:contacts:{cwOpportunityId}` | CW contacts array | `GET /sales/opportunities/:id/contacts` |
| `opp:products:{cwOpportunityId}` | Forecast + procurement products blob | `GET /sales/opportunities/:id/forecast` + `GET /procurement/products` |
| `opp:company-cw:{cw_CompanyId}` | Hydrated company + contacts blob | `GET /company/companies/:id` + contacts endpoints |
| `opp:site:{cwCompanyId}:{cwSiteId}` | Company site data | `GET /company/companies/:id/sites/:siteId` |

Inventory-adjustment-driven catalog sync adds a targeted product cache:

| Cache Key Pattern | Data | Source |
| --- | --- | --- |
| `catalog:item:cw:{cwId}` | Full CW catalog item + computed `onHand` + DB row snapshot | `GET /procurement/adjustments` + `GET /procurement/catalog/:id` + catalog inventory endpoint |

Sales opportunity metrics caching adds member-focused keys:

| Cache Key Pattern | Data | Source |
| --- | --- | --- |
| `sales:metrics:members:all` | Envelope of all active-member metrics | Precomputed from active CW members + assigned opportunities + products cache/CW fetch |
| `sales:metrics:member:{cwIdentifier}` | One member's computed metrics snapshot | Same as above |
| `sales:metrics:oppRevenue:{cwOppId}` | Per-opportunity computed revenue blob | Metrics refresh lookups (products cache-first, then manager/controller fallback) |

---

## TTL Algorithms

Three algorithms compute cache TTLs. All share the same input signals:

- `closedFlag` — whether the opportunity is closed
- `closedDate` — when it was closed
- `expectedCloseDate` — projected close date (forward-looking signal)
- `lastUpdated` — last CW modification date (backward-looking signal)

### Primary TTL (`computeCacheTTL`)

**File:** `src/modules/algorithms/computeCacheTTL.ts`

Used for: opportunity CW data, activities, company CW data.

| # | Condition | TTL | Human |
| --- | --- | --- | --- |
| 1a | Closed > 30 days ago | `null` | Do not cache |
| 1b | Closed within 30 days | 900,000 ms | 15 minutes |
| 2 | `expectedCloseDate` or `lastUpdated` within **5 days** | 30,000 ms | 30 seconds |
| 3 | `expectedCloseDate` or `lastUpdated` within **14 days** | 60,000 ms | 60 seconds |
| 4 | Everything else | 900,000 ms | 15 minutes |

Rules are evaluated top-to-bottom; the first match wins.

### Sub-Resource TTL (`computeSubResourceCacheTTL`)

**File:** `src/modules/algorithms/computeSubResourceCacheTTL.ts`

Used for: notes, contacts.

| # | Condition | TTL | Human |
| --- | --- | --- | --- |
| 1a | Closed > 30 days ago | `null` | Do not cache |
| 1b | Closed within 30 days | 300,000 ms | 5 minutes |
| 2 | Within **5 days** | 60,000 ms | 60 seconds |
| 3 | Within **14 days** | 120,000 ms | 2 minutes |
| 4 | Everything else | 300,000 ms | 5 minutes |

### Products TTL (`computeProductsCacheTTL`)

**File:** `src/modules/algorithms/computeProductsCacheTTL.ts`

Used for: forecast + procurement products.
| # | Condition | TTL | Human |
| --- | --- | --- | --- |
| 1 | Status is Won/Lost/Pending Won/Pending Lost | `null` | No cache |
| 2 | Main cache TTL is `null` | `null` | No cache |
| 3 | `lastUpdated` within **3 days** | 15,000 ms | 15 seconds |
| 4 | Everything else | 1,200,000 ms | 20 minutes |

Products on terminal-status opportunities are never proactively cached. Non-hot products use a **lazy on-demand** cache — they're fetched when requested and cached for 20 minutes.

### Site TTL

Sites use a fixed TTL of **20 minutes** (1,200,000 ms). Site/address data rarely changes. Sites are **not** proactively warmed by the background refresh — they are populated lazily on the first detail-view request.

---

## Background Refresh

**Function:** `refreshOpportunityCache()` in `src/modules/cache/opportunityCache.ts`
**Interval:** Every 20 minutes, triggered from `src/index.ts`.

### Refresh cycle

1. **Query DB** — fetch all non-closed opportunities plus recently closed ones (within 30 days), ordered by `cwLastUpdated DESC` (most recently active first).
2. **Batch EXISTS check** — use a single Redis pipeline to check which cache keys already exist (5 EXISTS commands per opportunity: oppCwData, activities, notes, contacts, products).
3. **Build thunk list** — for each opportunity with missing keys, push a **thunk** (lazy function) into the task list. No HTTP requests fire at this point.
4. **Execute with bounded concurrency** — process thunks in batches of `CONCURRENCY` (currently **6**), with a `BATCH_DELAY_MS` (currently **250 ms**) pause between batches. Each thunk is only invoked inside the batch loop.
5. **Emit events** — `cache:opportunities:refresh:started` and `cache:opportunities:refresh:completed` events are emitted for the event debugger.
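The thunk-batching step can be sketched as follows. This is a minimal illustration of the pattern, not the actual helper from `opportunityCache.ts`; `runThunks` and `sleep` are hypothetical names:

```typescript
type Thunk<T> = () => Promise<T>;

const CONCURRENCY = 6;      // max simultaneous CW API requests per batch
const BATCH_DELAY_MS = 250; // pause between batches

const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

async function runThunks<T>(thunks: Thunk<T>[]): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < thunks.length; i += CONCURRENCY) {
    const batch = thunks.slice(i, i + CONCURRENCY);
    // Requests only start here — each thunk is invoked inside the batch loop.
    results.push(...(await Promise.all(batch.map((fn) => fn()))));
    if (i + CONCURRENCY < thunks.length) await sleep(BATCH_DELAY_MS);
  }
  return results;
}
```

Because the task list holds `() => Promise` values rather than already-running promises, nothing hits the CW API until `fn()` is called inside the loop, which is what bounds the concurrency.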
### Inventory-adjustment listener cycle

**Function:** `listenInventoryAdjustments()` in `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts`
**Interval:** Every 60 seconds, triggered from `src/index.ts`.

1. Fetch `GET /procurement/adjustments?pageSize=1000`.
2. Build a normalized snapshot of tracked inventory rows (`cwCatalogId`, `onHand`, `inventory`) per adjustment.
3. Compare to the previous snapshot; extract only changed product IDs.
4. For each changed product ID, fetch the fresh CW catalog item + current on-hand.
5. Upsert `CatalogItem` in Postgres and write Redis key `catalog:item:cw:{cwId}` with a 20-minute TTL.

Guardrails to prevent request storms:

- Diffing is computed at the **product state** level (grouped by `cwCatalogId`), not raw adjustment-row churn.
- Per-cycle syncs are capped (`CW_ADJUSTMENT_SYNC_MAX_PER_CYCLE`, default `50`).
- A product resync cooldown is enforced (`CW_ADJUSTMENT_SYNC_COOLDOWN_MS`, default `600000` ms / 10 min).

This avoids full-catalog sweeps for small inventory movements and updates only the products implicated by adjustments.

### Full procurement catalog refresh

**Function:** `refreshCatalog()` in `src/modules/cw-utils/procurement/refreshCatalog.ts`
**Interval:** Every 30 minutes, triggered from `src/index.ts`.

The full catalog cache/DB sync uses the same slow-parallel thunk strategy as opportunity cache refreshes:

- Build arrays of thunk tasks (`() => Promise`) for CW item fetches, inventory fetches, and DB upserts.
- Execute with bounded concurrency (`CONCURRENCY=6`).
- Pause between batches (`BATCH_DELAY_MS=250`) to avoid CW burst pressure.
- Log task failures and retry naturally on the next cycle.

This keeps the full-catalog refresh conservative while the inventory-adjustment listener handles near-real-time targeted updates.

### Full inventory sweep fallback

`refreshInventory()` remains as a safety net but is intentionally infrequent:

- Runs every **6 hours** from `src/index.ts` (no startup-time full sweep).
- Uses the same slow-parallel pattern (`CONCURRENCY=6`, `BATCH_DELAY_MS=250`) to avoid burst traffic.

Most on-hand freshness now comes from the 60-second adjustment listener plus the 30-minute full catalog refresh.

### Concurrency control

The thunk pattern is critical. Previously, tasks were pushed as already-executing promises (`refreshTasks.push(fetchAndCache(...))`), which meant all HTTP requests fired simultaneously regardless of the batching loop. The fix was changing the array type from `Promise[]` to `(() => Promise)[]` so requests only start when explicitly invoked: `batch.map((fn) => fn())`.

### Current tuning

| Parameter | Value | Effect |
| --- | --- | --- |
| `CONCURRENCY` | 6 | Max simultaneous CW API requests per batch |
| `BATCH_DELAY_MS` | 250 | Milliseconds between batches |
| Refresh interval | 20 minutes | How often the full sweep runs |

At these settings, a full sweep of ~500 expired keys completes in ~1-2 minutes with zero CW errors and ~230 ms median latency.

### Sales metrics refresh job

**Function:** `refreshSalesOpportunityMetricsCache()` in `src/modules/cache/salesOpportunityMetricsCache.ts`
**Interval:** Every 5 minutes, triggered from `src/index.ts`.
**Startup behavior:** On app startup, the refresh is invoked once with `forceColdLoad=true`, which clears metrics-owned Redis keys and bypasses metrics/product cache reuse for that initial rebuild. Subsequent interval runs use the normal warm path.

Refresh flow:

1. Fetch all active CW members (`inactiveFlag=false`). Source: local `CwMember` table (kept in sync by the existing members refresh job).
2. Query DB opportunities assigned to those members (primary or secondary rep), scoped to open opportunities plus YTD-closed opportunities.
3. For each opportunity, compute revenue cache-first from `sales:metrics:oppRevenue:{cwOppId}`, then `opp:products:{cwOpportunityId}`, falling back to the manager/controller path (`opportunities.fetchRecord(...).fetchProducts()`) on a miss.
4. Aggregate member metrics (pipeline revenue, won/lost MTD+YTD counts, avg days to close, weighted pipeline, win/loss rates, and related KPIs).
5. Write per-opportunity revenue blobs plus all-member and per-member snapshots to Redis with a 10-minute TTL.

Safety controls:

- **Single-flight lock** — prevents overlapping refresh runs if a prior run is still in progress.
- **Per-opportunity timeout guard** — ensures slow CW product lookups degrade to a zero-revenue fallback instead of stalling the full refresh.
- **Force-cold-load mode** — clears `sales:metrics:*` runtime state owned by the metrics cache before rebuilding startup data.

This cache-first model prioritizes metrics-owned opportunity revenue keys first, then opportunity product cache entries, and only reaches CW when needed.

---

## Retry Logic (`withCwRetry`)

**File:** `src/modules/cw-utils/withCwRetry.ts`

Wraps CW API calls with exponential backoff retry on transient errors.

### Retryable errors

- `ECONNABORTED` (timeout)
- `ECONNRESET`
- `ETIMEDOUT`
- `ECONNREFUSED`
- `ERR_NETWORK`
- `ENETUNREACH`
- HTTP 5xx server errors

### Default configuration

| Parameter | Default | Description |
| --- | --- | --- |
| `maxAttempts` | 3 | Total attempts including the first |
| `baseDelayMs` | 1,000 | Delay before the first retry (doubles each retry: 1s → 2s → 4s) |
| `label` | — | Optional tag for log messages |

### Usage

```ts
import { withCwRetry } from "./withCwRetry";

const response = await withCwRetry(
  () => connectWiseApi.get(`/company/companies/${id}`),
  { label: `fetchCompany#${id}`, maxAttempts: 3, baseDelayMs: 1_500 },
);
```

Non-transient errors (404, 400, etc.) are re-thrown immediately without retry.
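For reference, the retry behavior above can be sketched roughly as follows. This is a simplified illustration matching the documented defaults and retryable-error list, not the actual `withCwRetry` source; `retrySketch` and `isTransient` are hypothetical names, and the real error classification may differ:

```typescript
const RETRYABLE_CODES = new Set([
  "ECONNABORTED", "ECONNRESET", "ETIMEDOUT",
  "ECONNREFUSED", "ERR_NETWORK", "ENETUNREACH",
]);

interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number; // doubles after each failed attempt
  label?: string;       // optional tag for log messages
}

function isTransient(err: unknown): boolean {
  const e = err as { code?: string; response?: { status?: number } };
  if (e?.code && RETRYABLE_CODES.has(e.code)) return true;
  const status = e?.response?.status;
  return typeof status === "number" && status >= 500; // HTTP 5xx
}

async function retrySketch<T>(
  fn: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 1_000, label }: RetryOptions = {},
): Promise<T> {
  let delay = baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-transient errors (404, 400, …) are re-thrown immediately.
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      console.warn(`[${label ?? "cw"}] attempt ${attempt} failed; retrying in ${delay}ms`);
      await new Promise((res) => setTimeout(res, delay));
      delay *= 2; // exponential backoff: 1s → 2s → 4s
    }
  }
}
```

With `maxAttempts = 3`, a call that fails twice with `ECONNRESET` and then succeeds returns normally; a 404 on the first attempt throws immediately.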
---

## CW API Logger

**File:** `src/modules/cw-utils/cwApiLogger.ts`

Axios interceptor that logs every CW API call to a JSONL file. Logging is **opt-in** — set the `LOG_CW_API` environment variable to enable it. Each process start creates a new timestamped file in the `cw-api-logs/` directory (e.g., `cw-api-logs/2026-03-02T14-30-05.123Z.jsonl`).

### Enabling logging

```bash
# Via the dev:log shorthand script
bun run dev:log

# Or manually with any command
LOG_CW_API=1 bun run dev
```

### Log entry fields

| Field | Type | Description |
| --- | --- | --- |
| `timestamp` | string (ISO-8601) | When the request completed |
| `method` | string | HTTP method |
| `url` | string | Request URL (relative or absolute) |
| `baseURL` | string | Axios baseURL |
| `status` | number \| null | HTTP status (null on network error) |
| `durationMs` | number | Wall-clock time in milliseconds |
| `error` | string \| null | Error code + message, if any |
| `timeout` | number | Configured timeout in ms |

### Analysis

Run the analyzer against the most recent log file:

```bash
bun run utils:analyze_cw
```

Or specify a particular file:

```bash
python3 debug-scripts/analyze-cw-calls.py cw-api-logs/2026-03-02T14-30-05.123Z.jsonl
```

This executes `debug-scripts/analyze-cw-calls.py`, which produces:

- Overview (total calls, error rate, time span)
- Duration statistics (min, max, mean, p50, p90, p95, p99, distribution histogram)
- Error breakdown by type and endpoint
- Top 20 slowest calls
- Per-endpoint stats (count, errors, mean, p50, p95, max, total time)
- Timeline (per-minute throughput and errors)
- Concurrency hotspot detection
- Summary with recommendations

To clear all logs:

```bash
rm -rf cw-api-logs/
```

---

## Cache Invalidation

Mutation endpoints invalidate the relevant cache keys so the next read fetches fresh data from CW:

| Mutation | Cache invalidated |
| --- | --- |
| Create/update/delete note | `opp:notes:{cwOpportunityId}` via `invalidateNotesCache()` |
| Create/update/delete contact | `opp:contacts:{cwOpportunityId}` via `invalidateContactsCache()` |
| Add/update/resequence products | `opp:products:{cwOpportunityId}` via `invalidateProductsCache()` |
| Refresh opportunity | All keys for that opportunity (via re-fetch) |

---

## ConnectWise API Configuration

The shared Axios instance (`connectWiseApi`) is configured in `src/constants.ts`:

| Setting | Value | Purpose |
| --- | --- | --- |
| `baseURL` | `https://ttscw.totaltech.net/v4_6_release/apis/3.0/` | CW API base |
| `timeout` | 30,000 ms (30s) | Per-request timeout |
| Logger | `attachCwApiLogger()` | Writes JSONL entries to `cw-api-logs/` |

---

## Architecture diagram

```
src/index.ts
│
├─ setInterval(refreshOpportunityCache, 20m)
│
└─► src/modules/cache/opportunityCache.ts
    │
    ├─ prisma.opportunity.findMany(orderBy: cwLastUpdated DESC)
    ├─ redis.pipeline().exists(...)   ← batch key check
    │
    ├─ Build thunk list (lazy functions)
    │
    └─ Execute thunks with CONCURRENCY=6, DELAY=250ms
        │
        ├─► fetchAndCacheOppCwData()     ─► opportunityCw.fetch()
        ├─► fetchAndCacheActivities()    ─► activityCw.fetchByOpportunityDirect()
        ├─► fetchAndCacheNotes()         ─► opportunityCw.fetchNotes()
        ├─► fetchAndCacheContacts()      ─► opportunityCw.fetchContacts()
        ├─► fetchAndCacheProducts()      ─► opportunityCw.fetchProducts() + fetchProcurementProducts()
        ├─► fetchAndCacheCompanyCwData() ─► fetchCwCompanyById() + contacts
        └─► fetchAndCacheSite()          ─► fetchCompanySite() (lazy only)
            │
            └─► connectWiseApi.get(...)  ← withCwRetry + cwApiLogger interceptors
                │
                └─► Redis SET with computed TTL
```

---

## File reference

| File | Purpose |
| --- | --- |
| `src/modules/cache/opportunityCache.ts` | Cache read/write helpers, background refresh logic |
| `src/modules/algorithms/computeCacheTTL.ts` | Primary adaptive TTL algorithm |
| `src/modules/algorithms/computeSubResourceCacheTTL.ts` | Sub-resource (notes, contacts) TTL algorithm |
| `src/modules/algorithms/computeProductsCacheTTL.ts` | Products TTL algorithm |
| `src/modules/cw-utils/withCwRetry.ts` | Retry wrapper with exponential backoff |
| `src/modules/cw-utils/cwApiLogger.ts` | Axios interceptor for JSONL call logging |
| `src/modules/cw-utils/fetchCompany.ts` | Company fetch with retry |
| `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts` | Adjustment listener for targeted catalog-item cache + DB sync |
| `src/modules/cache/salesOpportunityMetricsCache.ts` | 5-minute active-member opportunity metrics cache |
| `src/constants.ts` | CW Axios instance config (timeout, logger) |
| `src/index.ts` | Refresh interval registration |
| `debug-scripts/analyze-cw-calls.py` | CW API call analysis script |