# Caching Architecture

This document describes the caching layer used in the Optima API, covering the Redis-backed opportunity cache, TTL algorithms, background refresh mechanics, retry logic, and debugging tools.

---
## Overview

The API caches expensive ConnectWise (CW) API responses in **Redis** to reduce latency and avoid CW rate limits. The primary cache layer is the **opportunity cache** (`src/modules/cache/opportunityCache.ts`), which proactively warms data for all non-closed opportunities on a background interval.

### Key design principles

- **Adaptive TTLs** — cache durations are computed dynamically based on how "hot" an opportunity is (recently updated = shorter TTL = fresher data).
- **Background refresh** — a 20-minute interval scans all open opportunities and re-fetches only expired cache keys.
- **Bounded concurrency** — CW API calls are throttled via thunk-based batching to prevent overwhelming the upstream API.
- **Graceful degradation** — transient CW errors (timeouts, network failures) are caught, logged, and retried on the next cycle rather than crashing the process.
- **Priority ordering** — most recently updated opportunities are refreshed first so active deals get fresh data before stale ones.

---
## What is cached

Each non-closed opportunity can have up to 7 cached payloads in Redis:

| Cache Key Pattern                   | Data                                 | Source                                                                |
| ----------------------------------- | ------------------------------------ | --------------------------------------------------------------------- |
| `opp:cw-data:{cwOpportunityId}`     | Raw CW opportunity response          | `GET /sales/opportunities/:id`                                        |
| `opp:activities:{cwOpportunityId}`  | CW activities array                  | `GET /sales/activities?conditions=opportunity/id=:id`                 |
| `opp:notes:{cwOpportunityId}`       | CW notes array                       | `GET /sales/opportunities/:id/notes`                                  |
| `opp:contacts:{cwOpportunityId}`    | CW contacts array                    | `GET /sales/opportunities/:id/contacts`                               |
| `opp:products:{cwOpportunityId}`    | Forecast + procurement products blob | `GET /sales/opportunities/:id/forecast` + `GET /procurement/products` |
| `opp:company-cw:{cw_CompanyId}`     | Hydrated company + contacts blob     | `GET /company/companies/:id` + contacts endpoints                     |
| `opp:site:{cwCompanyId}:{cwSiteId}` | Company site data                    | `GET /company/companies/:id/sites/:siteId`                            |

Inventory-adjustment-driven catalog sync adds a targeted product cache:

| Cache Key Pattern        | Data                                                       | Source                                                                                       |
| ------------------------ | ---------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| `catalog:item:cw:{cwId}` | Full CW catalog item + computed `onHand` + DB row snapshot | `GET /procurement/adjustments` + `GET /procurement/catalog/:id` + catalog inventory endpoint |

---
## TTL Algorithms

Three algorithms compute cache TTLs. All share the same input signals:

- `closedFlag` — whether the opportunity is closed
- `closedDate` — when it was closed
- `expectedCloseDate` — projected close date (forward-looking signal)
- `lastUpdated` — last CW modification date (backward-looking signal)
### Primary TTL (`computeCacheTTL`)

**File:** `src/modules/algorithms/computeCacheTTL.ts`

Used for: opportunity CW data, activities, company CW data.

| #   | Condition                                               | TTL        | Human        |
| --- | ------------------------------------------------------- | ---------- | ------------ |
| 1a  | Closed > 30 days ago                                    | `null`     | Do not cache |
| 1b  | Closed within 30 days                                   | 900,000 ms | 15 minutes   |
| 2   | `expectedCloseDate` or `lastUpdated` within **5 days**  | 30,000 ms  | 30 seconds   |
| 3   | `expectedCloseDate` or `lastUpdated` within **14 days** | 60,000 ms  | 60 seconds   |
| 4   | Everything else                                         | 900,000 ms | 15 minutes   |

Rules are evaluated top-to-bottom; first match wins.
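The rule table above translates to roughly the following (a sketch only; the real `computeCacheTTL` may differ in signature and date handling):

```typescript
// Sketch of the primary TTL rules; not a copy of computeCacheTTL.ts.
const DAY_MS = 24 * 60 * 60 * 1000;

interface TtlSignals {
  closedFlag: boolean;
  closedDate?: Date | null;
  expectedCloseDate?: Date | null;
  lastUpdated?: Date | null;
}

function computeCacheTTLSketch(s: TtlSignals, now = new Date()): number | null {
  // Rules 1a/1b: closed opportunities
  if (s.closedFlag) {
    const age = s.closedDate ? now.getTime() - s.closedDate.getTime() : Infinity;
    return age > 30 * DAY_MS ? null : 900_000; // do not cache / 15 min
  }

  // "Within N days" of either the forward- or backward-looking signal
  const withinDays = (d: Date | null | undefined, days: number) =>
    d != null && Math.abs(now.getTime() - d.getTime()) <= days * DAY_MS;
  const hot = (days: number) =>
    withinDays(s.expectedCloseDate, days) || withinDays(s.lastUpdated, days);

  if (hot(5)) return 30_000;  // Rule 2: 30 seconds
  if (hot(14)) return 60_000; // Rule 3: 60 seconds
  return 900_000;             // Rule 4: 15 minutes
}
```

A `null` return is the "do not cache" signal the other algorithms build on (see the products TTL rule 2 below).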
### Sub-Resource TTL (`computeSubResourceCacheTTL`)

**File:** `src/modules/algorithms/computeSubResourceCacheTTL.ts`

Used for: notes, contacts.

| #   | Condition             | TTL        | Human        |
| --- | --------------------- | ---------- | ------------ |
| 1a  | Closed > 30 days ago  | `null`     | Do not cache |
| 1b  | Closed within 30 days | 300,000 ms | 5 minutes    |
| 2   | Within **5 days**     | 60,000 ms  | 60 seconds   |
| 3   | Within **14 days**    | 120,000 ms | 2 minutes    |
| 4   | Everything else       | 300,000 ms | 5 minutes    |
### Products TTL (`computeProductsCacheTTL`)

**File:** `src/modules/algorithms/computeProductsCacheTTL.ts`

Used for: forecast + procurement products.

| #   | Condition                                   | TTL          | Human      |
| --- | ------------------------------------------- | ------------ | ---------- |
| 1   | Status is Won/Lost/Pending Won/Pending Lost | `null`       | No cache   |
| 2   | Main cache TTL is `null`                    | `null`       | No cache   |
| 3   | `lastUpdated` within **3 days**             | 15,000 ms    | 15 seconds |
| 4   | Everything else                             | 1,200,000 ms | 20 minutes |

Products on terminal-status opportunities are never proactively cached. Non-hot products use a **lazy on-demand** cache — they're fetched when requested and cached for 20 minutes.
### Site TTL

Sites use a fixed TTL of **20 minutes** (1,200,000 ms). Site/address data rarely changes. Sites are **not** proactively warmed by the background refresh — they are populated lazily on the first detail-view request.

---
## Background Refresh

**Function:** `refreshOpportunityCache()` in `src/modules/cache/opportunityCache.ts`

**Interval:** Every 20 minutes, triggered from `src/index.ts`.

### Refresh cycle

1. **Query DB** — fetch all non-closed opportunities + recently closed (within 30 days), ordered by `cwLastUpdated DESC` (most recently active first).
2. **Batch EXISTS check** — use a single Redis pipeline to check which cache keys already exist (5 EXISTS commands per opportunity: oppCwData, activities, notes, contacts, products).
3. **Build thunk list** — for each opportunity with missing keys, push a **thunk** (lazy function) into the task list. No HTTP requests fire at this point.
4. **Execute with bounded concurrency** — process thunks in batches of `CONCURRENCY` (currently **6**), with a `BATCH_DELAY_MS` (currently **250ms**) pause between batches. Each thunk is only invoked inside the batch loop.
5. **Emit events** — `cache:opportunities:refresh:started` and `cache:opportunities:refresh:completed` events are emitted for the event debugger.
### Inventory-adjustment listener cycle

**Function:** `listenInventoryAdjustments()` in `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts`

**Interval:** Every 60 seconds, triggered from `src/index.ts`.

1. Fetch `GET /procurement/adjustments?pageSize=1000`.
2. Build a normalized snapshot of tracked inventory rows (`cwCatalogId`, `onHand`, `inventory`) per adjustment.
3. Compare to previous snapshot; extract only changed product IDs.
4. For each changed product ID, fetch fresh CW catalog item + current on-hand.
5. Upsert `CatalogItem` in Postgres and write Redis key `catalog:item:cw:{cwId}` with a 20-minute TTL.

Guardrails to prevent request storms:

- Diffing is computed at **product state** level (grouped by `cwCatalogId`), not raw adjustment-row churn.
- Per-cycle syncs are capped (`CW_ADJUSTMENT_SYNC_MAX_PER_CYCLE`, default `50`).
- Product resync cooldown is enforced (`CW_ADJUSTMENT_SYNC_COOLDOWN_MS`, default `600000` ms / 10 min).

This avoids full-catalog sweeps for small inventory movements and updates only the products implicated by adjustments.
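The snapshot diff in steps 2-3 might look roughly like this (illustrative types and names, not the actual listener code):

```typescript
// Product-state-level diffing: adjustment-row churn collapses into one
// state per cwCatalogId, so only real stock movements trigger syncs.
interface ProductState {
  cwCatalogId: number;
  onHand: number;
}

type Snapshot = Map<number, number>; // cwCatalogId -> onHand

function buildSnapshot(rows: ProductState[]): Snapshot {
  const snap: Snapshot = new Map();
  for (const r of rows) snap.set(r.cwCatalogId, r.onHand);
  return snap;
}

function changedProductIds(prev: Snapshot, next: Snapshot): number[] {
  const changed: number[] = [];
  for (const [id, onHand] of next) {
    // New products and products whose on-hand moved since last cycle
    if (prev.get(id) !== onHand) changed.push(id);
  }
  // A real implementation would also apply the per-cycle cap and the
  // per-product cooldown described above before returning.
  return changed;
}
```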
### Full procurement catalog refresh

**Function:** `refreshCatalog()` in `src/modules/cw-utils/procurement/refreshCatalog.ts`

**Interval:** Every 30 minutes, triggered from `src/index.ts`.

The full catalog cache/DB sync uses the same slow-parallel thunk strategy as opportunity cache refreshes:

- Build arrays of thunk tasks (`() => Promise<void>`) for CW item fetches, inventory fetches, and DB upserts.
- Execute with bounded concurrency (`CONCURRENCY=6`).
- Pause between batches (`BATCH_DELAY_MS=250`) to avoid CW burst pressure.
- Log task failures and retry naturally on the next cycle.

This keeps the full-catalog refresh conservative while the inventory-adjustment listener handles near-real-time targeted updates.
### Full inventory sweep fallback

`refreshInventory()` remains as a safety net but is intentionally infrequent:

- Runs every **6 hours** from `src/index.ts` (no startup-time full sweep).
- Uses the same slow-parallel pattern (`CONCURRENCY=6`, `BATCH_DELAY_MS=250`) to avoid burst traffic.

Most on-hand freshness now comes from the 60-second adjustment listener plus the 30-minute full catalog refresh.
### Concurrency control

The thunk pattern is critical. Previously, tasks were pushed as already-executing promises (`refreshTasks.push(fetchAndCache(...))`), which meant all HTTP requests fired simultaneously regardless of the batching loop. The fix was changing the array type from `Promise<void>[]` to `(() => Promise<void>)[]` so requests only start when explicitly invoked: `batch.map((fn) => fn())`.
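A minimal, self-contained demonstration of the difference (every name here is illustrative, not from the codebase):

```typescript
// With eager promises, all 20 "requests" would overlap the moment the
// array is built; with thunks, at most `concurrency` run at once.
let inFlight = 0;
let maxInFlight = 0;

async function fakeCwRequest(): Promise<void> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((r) => setTimeout(r, 10)); // stand-in for the HTTP call
  inFlight--;
}

async function runBatched(tasks: (() => Promise<void>)[], concurrency: number) {
  for (let i = 0; i < tasks.length; i += concurrency) {
    // Thunks are invoked only here, so in-flight work stays bounded
    await Promise.allSettled(tasks.slice(i, i + concurrency).map((fn) => fn()));
  }
}

async function demo(): Promise<number> {
  // Lazy: () => fakeCwRequest() - nothing fires while the array is built
  const thunks = Array.from({ length: 20 }, () => () => fakeCwRequest());
  await runBatched(thunks, 6);
  return maxInFlight; // 6 with thunks; an eager Promise<void>[] would hit 20
}
```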
### Current tuning

| Parameter        | Value      | Effect                                     |
| ---------------- | ---------- | ------------------------------------------ |
| `CONCURRENCY`    | 6          | Max simultaneous CW API requests per batch |
| `BATCH_DELAY_MS` | 250        | Milliseconds between batches               |
| Refresh interval | 20 minutes | How often the full sweep runs              |

At these settings, a full sweep of ~500 expired keys completes in ~1-2 minutes with zero CW errors and ~230ms median latency. The batch math is consistent with that: ~500 thunks at 6 per batch is ~84 batches, and each batch costs the 250ms inter-batch delay plus roughly one request round trip, putting the floor around 40-60 seconds before retries and slower calls.
---

## Retry Logic (`withCwRetry`)

**File:** `src/modules/cw-utils/withCwRetry.ts`

Wraps CW API calls with exponential backoff retry on transient errors.
### Retryable errors

- `ECONNABORTED` (timeout)
- `ECONNRESET`
- `ETIMEDOUT`
- `ECONNREFUSED`
- `ERR_NETWORK`
- `ENETUNREACH`
- HTTP 5xx server errors
### Default configuration

| Parameter     | Default | Description                                                 |
| ------------- | ------- | ----------------------------------------------------------- |
| `maxAttempts` | 3       | Total attempts including the first                          |
| `baseDelayMs` | 1,000   | Delay before first retry (doubles each retry: 1s → 2s → 4s) |
| `label`       | —       | Optional tag for log messages                               |
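Under those defaults, the wrapper's core loop can be sketched as follows (an approximation only; the real `withCwRetry.ts` may classify errors and log differently):

```typescript
// Retry-on-transient sketch: retryable error codes and 5xx responses get
// exponential backoff; everything else is re-thrown immediately.
const RETRYABLE_CODES = new Set([
  "ECONNABORTED", "ECONNRESET", "ETIMEDOUT",
  "ECONNREFUSED", "ERR_NETWORK", "ENETUNREACH",
]);

function isTransient(err: any): boolean {
  const status = err?.response?.status;
  return RETRYABLE_CODES.has(err?.code) || (status >= 500 && status <= 599);
}

async function withCwRetrySketch<T>(
  fn: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 1_000, label = "" } = {},
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // 4xx and other non-transient failures surface immediately
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1); // 1s -> 2s -> 4s
      console.warn(`[withCwRetry${label ? ":" + label : ""}] attempt ${attempt} failed, retrying in ${delay}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```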
### Usage

```ts
import { withCwRetry } from "./withCwRetry";

const response = await withCwRetry(
  () => connectWiseApi.get(`/company/companies/${id}`),
  { label: `fetchCompany#${id}`, maxAttempts: 3, baseDelayMs: 1_500 },
);
```

Non-transient errors (404, 400, etc.) are re-thrown immediately without retry.

---
## CW API Logger

**File:** `src/modules/cw-utils/cwApiLogger.ts`

Axios interceptor that logs every CW API call to a JSONL file. Logging is **opt-in** — set the `LOG_CW_API` environment variable to enable it. Each process start creates a new timestamped file in the `cw-api-logs/` directory (e.g., `cw-api-logs/2026-03-02T14-30-05.123Z.jsonl`).

### Enabling logging

```bash
# Via the dev:log shorthand script
bun run dev:log

# Or manually with any command
LOG_CW_API=1 bun run dev
```
### Log entry fields

| Field        | Type              | Description                         |
| ------------ | ----------------- | ----------------------------------- |
| `timestamp`  | string (ISO-8601) | When the request completed          |
| `method`     | string            | HTTP method                         |
| `url`        | string            | Request URL (relative or absolute)  |
| `baseURL`    | string            | Axios baseURL                       |
| `status`     | number \| null    | HTTP status (null on network error) |
| `durationMs` | number            | Wall-clock time in milliseconds     |
| `error`      | string \| null    | Error code + message, if any        |
| `timeout`    | number            | Configured timeout in ms            |
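For a quick look without the full analyzer, a log file with those fields can be summarized in a few lines (a sketch; field names come from the table above, and the percentile method is a simple nearest-rank approximation):

```typescript
interface CwLogEntry {
  timestamp: string;
  method: string;
  url: string;
  status: number | null;
  durationMs: number;
  error: string | null;
}

// Nearest-rank percentile over an ascending-sorted array
function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

function summarize(jsonl: string) {
  const entries: CwLogEntry[] = jsonl
    .split("\n")
    .filter((line) => line.trim())
    .map((line) => JSON.parse(line));
  const durations = entries.map((e) => e.durationMs).sort((a, b) => a - b);
  return {
    calls: entries.length,
    errors: entries.filter((e) => e.error !== null).length,
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
  };
}

// Usage against a real log (path illustrative):
//   import { readFileSync } from "node:fs";
//   console.log(summarize(readFileSync("cw-api-logs/<timestamp>.jsonl", "utf8")));
```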
### Analysis

Run the analyzer against the most recent log file:

```bash
bun run utils:analyze_cw
```

Or point it at a specific file:

```bash
python3 debug-scripts/analyze-cw-calls.py cw-api-logs/2026-03-02T14-30-05.123Z.jsonl
```

This executes `debug-scripts/analyze-cw-calls.py`, which produces:
- Overview (total calls, error rate, time span)
- Duration statistics (min, max, mean, p50, p90, p95, p99, distribution histogram)
- Error breakdown by type and endpoint
- Top 20 slowest calls
- Per-endpoint stats (count, errors, mean, p50, p95, max, total time)
- Timeline (per-minute throughput and errors)
- Concurrency hotspot detection
- Summary with recommendations

To clear all logs:

```bash
rm -rf cw-api-logs/
```

---
## Cache Invalidation

Mutation endpoints invalidate the relevant cache keys so the next read fetches fresh data from CW:

| Mutation                       | Cache invalidated                                                |
| ------------------------------ | ---------------------------------------------------------------- |
| Create/update/delete note      | `opp:notes:{cwOpportunityId}` via `invalidateNotesCache()`       |
| Create/update/delete contact   | `opp:contacts:{cwOpportunityId}` via `invalidateContactsCache()` |
| Add/update/resequence products | `opp:products:{cwOpportunityId}` via `invalidateProductsCache()` |
| Refresh opportunity            | All keys for that opportunity (via re-fetch)                     |
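The shape of these helpers is simple; a sketch of one (illustrative, not the actual implementation) shows why deleting the key is sufficient:

```typescript
// Deleting the key is enough: the next read misses and re-fetches from
// CW, and the background refresh re-warms it on the following cycle.
async function invalidateNotesCacheSketch(
  redis: { del: (key: string) => Promise<number> },
  cwOpportunityId: number,
): Promise<void> {
  await redis.del(`opp:notes:${cwOpportunityId}`);
}
```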
---

## ConnectWise API Configuration

The shared Axios instance (`connectWiseApi`) is configured in `src/constants.ts`:

| Setting   | Value                                                | Purpose                             |
| --------- | ---------------------------------------------------- | ----------------------------------- |
| `baseURL` | `https://ttscw.totaltech.net/v4_6_release/apis/3.0/` | CW API base                         |
| `timeout` | 30,000 ms (30s)                                      | Per-request timeout                 |
| Logger    | `attachCwApiLogger()`                                | Writes JSONL logs to `cw-api-logs/` |
## Architecture diagram

```
src/index.ts
 │
 ├─ setInterval(refreshOpportunityCache, 20m)
 │
 └─► src/modules/cache/opportunityCache.ts
      │
      ├─ prisma.opportunity.findMany(orderBy: cwLastUpdated DESC)
      ├─ redis.pipeline().exists(...)          ← batch key check
      │
      ├─ Build thunk list (lazy functions)
      │
      └─ Execute thunks with CONCURRENCY=6, DELAY=250ms
           │
           ├─► fetchAndCacheOppCwData()     ─► opportunityCw.fetch()
           ├─► fetchAndCacheActivities()    ─► activityCw.fetchByOpportunityDirect()
           ├─► fetchAndCacheNotes()         ─► opportunityCw.fetchNotes()
           ├─► fetchAndCacheContacts()      ─► opportunityCw.fetchContacts()
           ├─► fetchAndCacheProducts()      ─► opportunityCw.fetchProducts() + fetchProcurementProducts()
           ├─► fetchAndCacheCompanyCwData() ─► fetchCwCompanyById() + contacts
           └─► fetchAndCacheSite()          ─► fetchCompanySite() (lazy only)
                │
                └─► connectWiseApi.get(...) ← withCwRetry + cwApiLogger interceptors
                     │
                     └─► Redis SET with computed TTL
```

---
## File reference

| File                                                             | Purpose                                                       |
| ---------------------------------------------------------------- | ------------------------------------------------------------- |
| `src/modules/cache/opportunityCache.ts`                          | Cache read/write helpers, background refresh logic            |
| `src/modules/algorithms/computeCacheTTL.ts`                      | Primary adaptive TTL algorithm                                |
| `src/modules/algorithms/computeSubResourceCacheTTL.ts`           | Sub-resource (notes, contacts) TTL algorithm                  |
| `src/modules/algorithms/computeProductsCacheTTL.ts`              | Products TTL algorithm                                        |
| `src/modules/cw-utils/withCwRetry.ts`                            | Retry wrapper with exponential backoff                        |
| `src/modules/cw-utils/cwApiLogger.ts`                            | Axios interceptor for JSONL call logging                      |
| `src/modules/cw-utils/fetchCompany.ts`                           | Company fetch with retry                                      |
| `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts` | Adjustment listener for targeted catalog-item cache + DB sync |
| `src/constants.ts`                                               | CW Axios instance config (timeout, logger)                    |
| `src/index.ts`                                                   | Refresh interval registration                                 |
| `debug-scripts/analyze-cw-calls.py`                              | CW API call analysis script                                   |