# Caching Architecture
This document describes the caching layer used in the Optima API, covering the Redis-backed opportunity cache, TTL algorithms, background refresh mechanics, retry logic, and debugging tools.
## Overview
The API caches expensive ConnectWise (CW) API responses in Redis to reduce latency and avoid CW rate limits. The primary cache layer is the opportunity cache (src/modules/cache/opportunityCache.ts), which is proactively warmed by workers on a background interval.
The API also maintains a Redis-backed sales member metrics cache (src/modules/cache/salesOpportunityMetricsCache.ts) refreshed every 5 minutes. It precomputes per-member dashboard/reporting figures (pipeline revenue, won/lost counts, win rate, avg days to close, and related metrics) for fast reads from /v1/sales/opportunities/metrics.
### Key design principles
- Adaptive TTLs — cache durations are computed dynamically based on how "hot" an opportunity is (recently updated = shorter TTL = fresher data).
- Background refresh — a 20-minute interval runs a unified opportunity refresh pass (collector-first full sync + cache warm across active and archived opportunities).
- Bounded concurrency — CW API calls are throttled via thunk-based batching to prevent overwhelming the upstream API.
- Graceful degradation — transient CW errors (timeouts, network failures) are caught, logged, and retried on the next cycle rather than crashing the process.
- Priority ordering — most recently updated opportunities are refreshed first so active deals get fresh data before stale ones.
## What is cached
Each opportunity can have up to 7 cached payloads in Redis:

| Cache Key Pattern | Data | Source |
|---|---|---|
| `opp:cw-data:{cwOpportunityId}` | Raw CW opportunity response | `GET /sales/opportunities/:id` |
| `opp:activities:{cwOpportunityId}` | CW activities array | `GET /sales/activities?conditions=opportunity/id=:id` |
| `opp:notes:{cwOpportunityId}` | CW notes array | `GET /sales/opportunities/:id/notes` |
| `opp:contacts:{cwOpportunityId}` | CW contacts array | `GET /sales/opportunities/:id/contacts` |
| `opp:products:{cwOpportunityId}` | Forecast + procurement products blob | `GET /sales/opportunities/:id/forecast` + `GET /procurement/products` |
| `opp:company-cw:{cw_CompanyId}` | Hydrated company + contacts blob | `GET /company/companies/:id` + contacts endpoints |
| `opp:site:{cwCompanyId}:{cwSiteId}` | Company site data | `GET /company/companies/:id/sites/:siteId` |
Inventory-adjustment-driven catalog sync adds a targeted product cache:

| Cache Key Pattern | Data | Source |
|---|---|---|
| `catalog:item:cw:{cwId}` | Full CW catalog item + computed onHand + DB row snapshot | `GET /procurement/adjustments` + `GET /procurement/catalog/:id` + catalog inventory endpoint |
Sales opportunity metrics caching adds member-focused keys:

| Cache Key Pattern | Data | Source |
|---|---|---|
| `sales:metrics:members:all` | Envelope of all active-member metrics | Precomputed from active CW members + assigned opportunities + products cache/CW fetch |
| `sales:metrics:member:{cwIdentifier}` | One member's computed metrics snapshot | Same as above |
| `sales:metrics:oppRevenue:{cwOppId}` | Per-opportunity computed revenue blob | Metrics refresh lookups (products cache-first, then manager/controller fallback) |
## TTL Algorithms

Three algorithms compute cache TTLs. All share the same input signals:

- `closedFlag` — whether the opportunity is closed
- `closedDate` — when it was closed
- `expectedCloseDate` — projected close date (forward-looking signal)
- `lastUpdated` — last CW modification date (backward-looking signal)
### Primary TTL (`computeCacheTTL`)

File: `src/modules/algorithms/computeCacheTTL.ts`

Used for: opportunity CW data, activities, company CW data.
| # | Condition | TTL | Human |
|---|---|---|---|
| 1a | Closed > 30 days ago | `null` | Do not cache |
| 1b | Closed within 30 days | 900,000 ms | 15 minutes |
| 2 | `expectedCloseDate` or `lastUpdated` within 5 days | 30,000 ms | 30 seconds |
| 3 | `expectedCloseDate` or `lastUpdated` within 14 days | 60,000 ms | 60 seconds |
| 4 | Everything else | 900,000 ms | 15 minutes |
Rules are evaluated top-to-bottom; first match wins.
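The rule table above can be sketched as a single first-match-wins function. This is an illustrative reconstruction, not the real implementation (which lives in `src/modules/algorithms/computeCacheTTL.ts`); the `TtlSignals` type and `withinDays` helper are hypothetical names:

```typescript
// Sketch of the primary adaptive TTL rules; hypothetical types/helpers.
interface TtlSignals {
  closedFlag: boolean;
  closedDate: Date | null;        // when the opportunity closed
  expectedCloseDate: Date | null; // forward-looking signal
  lastUpdated: Date | null;       // backward-looking signal
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Returns a TTL in ms, or null for "do not cache". Rules match top-to-bottom. */
function computeCacheTTL(s: TtlSignals, now = Date.now()): number | null {
  if (s.closedFlag) {
    // 1a: closed more than 30 days ago (or no close date) → do not cache
    if (!s.closedDate || now - s.closedDate.getTime() > 30 * DAY_MS) return null;
    return 900_000; // 1b: closed within 30 days → 15 minutes
  }
  const withinDays = (d: Date | null, days: number) =>
    d !== null && Math.abs(now - d.getTime()) <= days * DAY_MS;
  if (withinDays(s.expectedCloseDate, 5) || withinDays(s.lastUpdated, 5)) {
    return 30_000; // 2: hot opportunity → 30 seconds
  }
  if (withinDays(s.expectedCloseDate, 14) || withinDays(s.lastUpdated, 14)) {
    return 60_000; // 3: warm opportunity → 60 seconds
  }
  return 900_000; // 4: everything else → 15 minutes
}
```

The two sub-resource variants follow the same shape with different constants.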
### Sub-Resource TTL (`computeSubResourceCacheTTL`)

File: `src/modules/algorithms/computeSubResourceCacheTTL.ts`

Used for: notes, contacts.
| # | Condition | TTL | Human |
|---|---|---|---|
| 1a | Closed > 30 days ago | `null` | Do not cache |
| 1b | Closed within 30 days | 300,000 ms | 5 minutes |
| 2 | Within 5 days | 60,000 ms | 60 seconds |
| 3 | Within 14 days | 120,000 ms | 2 minutes |
| 4 | Everything else | 300,000 ms | 5 minutes |
### Products TTL (`computeProductsCacheTTL`)

File: `src/modules/algorithms/computeProductsCacheTTL.ts`

Used for: forecast + procurement products.
| # | Condition | TTL | Human |
|---|---|---|---|
| 1 | Status is Won/Lost/Pending Won/Pending Lost | `null` | No cache |
| 2 | Main cache TTL is `null` | `null` | No cache |
| 3 | `lastUpdated` within 3 days | 15,000 ms | 15 seconds |
| 4 | Everything else | 1,200,000 ms | 20 minutes |
Products on terminal-status opportunities are never proactively cached. Non-hot products use a lazy on-demand cache — they're fetched when requested and cached for 20 minutes.
### Site TTL
Sites use a fixed TTL of 20 minutes (1,200,000 ms). Site/address data rarely changes. Sites are not proactively warmed by the background refresh — they are populated lazily on the first detail-view request.
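The lazy read-through pattern for sites can be sketched as follows. This is a sketch under assumptions: the `RedisLike` shape, the `fetchCompanySite` parameter, and the `getCompanySite` name are hypothetical (the real helpers live in `src/modules/cache/opportunityCache.ts`); only the key pattern and 20-minute TTL come from the text above:

```typescript
const SITE_TTL_MS = 1_200_000; // fixed 20-minute TTL; site data rarely changes

type RedisLike = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, mode: "PX", ttlMs: number): Promise<unknown>;
};

// Lazy read-through: populated on the first detail-view request, never warmed.
async function getCompanySite(
  redis: RedisLike,
  fetchCompanySite: (companyId: number, siteId: number) => Promise<any>, // hypothetical CW fetcher
  cwCompanyId: number,
  cwSiteId: number,
): Promise<any> {
  const key = `opp:site:${cwCompanyId}:${cwSiteId}`;
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit); // cache hit: no CW call
  const site = await fetchCompanySite(cwCompanyId, cwSiteId); // lazy first fetch
  await redis.set(key, JSON.stringify(site), "PX", SITE_TTL_MS);
  return site;
}
```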
## Background Refresh

Worker path: `enqueueActiveOpportunityRefreshJob()` in `src/index.ts` → `src/workert.ts` → `refreshActiveOpportunitiesWorker()` in `src/modules/workers/cache/refreshActiveOpportunities.ts`

Interval: Every 20 minutes, triggered from `src/index.ts`.
### Refresh cycle
- Full sync first — worker runs `refreshOpportunities()` with a collector-first strategy (`fetchOpportunities` collector call, CW fallback) to keep DB opportunity records current.
- Collector cache seeding — when collector data is used, Redis keys for opportunity CW data, activities, notes, contacts, and products are seeded directly from the collector payload before cache warm planning.
- Query DB — fetch all opportunities ordered by `cwLastUpdated DESC`.
- Compute TTL per opportunity — adaptive TTL for active/recent opportunities, and `TTL_ARCHIVED_MS` (24h) for archived opportunities where the adaptive TTL resolves to `null`.
- Batch EXISTS check — use a single Redis pipeline to check which cache keys already exist (5 EXISTS commands per opportunity: oppCwData, activities, notes, contacts, products).
- Build thunk list — for each opportunity with missing keys, push a thunk (lazy function) into the task list. No HTTP requests fire at this point.
- Execute with bounded concurrency — process thunks through a worker pool (`ACTIVE_REFRESH_CONCURRENCY`, default 12) with progress logging (`ACTIVE_REFRESH_PROGRESS_EVERY`, default 50).
- Emit events — `cache:opportunities:refresh:started` and `cache:opportunities:refresh:completed` events are emitted for the event debugger.
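The batched EXISTS check described above can be sketched against an ioredis-style pipeline. This is a minimal sketch, assuming hypothetical names (`findMissingKeys`, `KEY_KINDS`, the narrowed `Pipeline` type); the real worker is `refreshActiveOpportunities.ts`:

```typescript
// Batch-check which cache keys exist using one pipelined round trip.
type Pipeline = {
  exists(key: string): Pipeline;
  exec(): Promise<[unknown, number][] | null>;
};

// One key kind per cached payload checked by the worker (5 per opportunity).
const KEY_KINDS = ["cw-data", "activities", "notes", "contacts", "products"] as const;

async function findMissingKeys(
  redis: { pipeline(): Pipeline },
  cwOpportunityIds: number[],
): Promise<Map<number, string[]>> {
  const pipe = redis.pipeline();
  const keys: { id: number; key: string }[] = [];
  for (const id of cwOpportunityIds) {
    for (const kind of KEY_KINDS) {
      const key = `opp:${kind}:${id}`;
      keys.push({ id, key });
      pipe.exists(key); // queued locally; nothing sent yet
    }
  }
  const results = (await pipe.exec()) ?? []; // one round trip for all EXISTS
  const missing = new Map<number, string[]>();
  results.forEach(([, exists], i) => {
    if (exists === 0) {
      const { id, key } = keys[i];
      missing.set(id, [...(missing.get(id) ?? []), key]);
    }
  });
  return missing;
}
```

Only opportunities that appear in the returned map need thunks built for them.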
### Inventory-adjustment listener cycle

Function: `listenInventoryAdjustments()` in `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts`

Interval: Every 60 seconds, triggered from `src/index.ts`.
- Fetch `GET /procurement/adjustments?pageSize=1000`.
- Build a normalized snapshot of tracked inventory rows (`cwCatalogId`, `onHand`, `inventory`) per adjustment.
- Compare to the previous snapshot; extract only changed product IDs.
- For each changed product ID, fetch the fresh CW catalog item + current on-hand.
- Upsert `CatalogItem` in Postgres and write Redis key `catalog:item:cw:{cwId}` with a 20-minute TTL.
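The snapshot comparison above amounts to a per-product diff. A minimal sketch, assuming hypothetical names (`InventoryState`, `diffSnapshots`) and `Map`-keyed snapshots:

```typescript
// Per-product state captured in each 60-second snapshot (hypothetical shape).
interface InventoryState {
  onHand: number;
  inventory: string;
}

// Returns only the cwCatalogIds whose state is new or changed since last cycle.
function diffSnapshots(
  prev: Map<number, InventoryState>,
  next: Map<number, InventoryState>,
): number[] {
  const changed: number[] = [];
  for (const [cwCatalogId, state] of next) {
    const old = prev.get(cwCatalogId);
    if (!old || old.onHand !== state.onHand || old.inventory !== state.inventory) {
      changed.push(cwCatalogId); // new product or modified state
    }
  }
  return changed;
}
```

Unchanged products never trigger a CW fetch, which is what keeps the 60-second cycle cheap.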
Guardrails to prevent request storms:

- Diffing is computed at the product state level (grouped by `cwCatalogId`), not raw adjustment-row churn.
- Per-cycle syncs are capped (`CW_ADJUSTMENT_SYNC_MAX_PER_CYCLE`, default `50`).
- Product resync cooldown is enforced (`CW_ADJUSTMENT_SYNC_COOLDOWN_MS`, default `600000` ms / 10 min).
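The cap and cooldown guardrails can be sketched as a filter over the changed-ID list. `selectSyncTargets` and the `lastSyncedAt` map are hypothetical names; the defaults mirror the env vars listed above:

```typescript
// Defaults mirroring CW_ADJUSTMENT_SYNC_MAX_PER_CYCLE / CW_ADJUSTMENT_SYNC_COOLDOWN_MS.
const MAX_PER_CYCLE = 50;
const COOLDOWN_MS = 600_000; // 10 minutes

// Drop products still inside their cooldown window, then cap the batch size.
function selectSyncTargets(
  changedIds: number[],
  lastSyncedAt: Map<number, number>, // cwCatalogId → epoch ms of last sync
  now = Date.now(),
): number[] {
  return changedIds
    .filter((id) => now - (lastSyncedAt.get(id) ?? 0) >= COOLDOWN_MS) // cooldown
    .slice(0, MAX_PER_CYCLE); // per-cycle cap
}
```

Anything filtered out simply waits for a later cycle, which is safe because the next diff will flag it again if its state is still unsynced.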
This avoids full-catalog sweeps for small inventory movements and updates only the products implicated by adjustments.
### Full procurement catalog refresh

Function: `refreshCatalog()` in `src/modules/cw-utils/procurement/refreshCatalog.ts`

Interval: Every 30 minutes, triggered from `src/index.ts`.
The full catalog cache/DB sync uses the same slow-parallel thunk strategy as opportunity cache refreshes:
- Build arrays of thunk tasks (`() => Promise<void>`) for CW item fetches, inventory fetches, and DB upserts.
- Execute with bounded concurrency (`CONCURRENCY=6`).
- Pause between batches (`BATCH_DELAY_MS=250`) to avoid CW burst pressure.
- Log task failures and retry naturally on the next cycle.
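The slow-parallel loop behind those steps can be sketched as batches of invoked thunks with a pause between them. `runSlowParallel` is a hypothetical name for illustration, not the actual function in `refreshCatalog.ts`:

```typescript
type Thunk = () => Promise<void>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run thunks in CONCURRENCY-sized batches with a delay between batches.
async function runSlowParallel(tasks: Thunk[], concurrency = 6, batchDelayMs = 250): Promise<void> {
  for (let i = 0; i < tasks.length; i += concurrency) {
    const batch = tasks.slice(i, i + concurrency);
    // Invoking fn() is the moment the underlying HTTP request actually starts.
    const results = await Promise.allSettled(batch.map((fn) => fn()));
    for (const r of results) {
      if (r.status === "rejected") {
        // Log and move on; the next 30-minute cycle retries naturally.
        console.error("catalog task failed; will retry next cycle", r.reason);
      }
    }
    if (i + concurrency < tasks.length) await sleep(batchDelayMs); // ease CW burst pressure
  }
}
```

`Promise.allSettled` (rather than `Promise.all`) is what lets one failed fetch leave the rest of the batch untouched.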
This keeps the full-catalog refresh conservative while the inventory-adjustment listener handles near-real-time targeted updates.
### Full inventory sweep fallback

`refreshInventory()` remains as a safety net but is intentionally infrequent:

- Runs every 6 hours from `src/index.ts` (no startup-time full sweep).
- Uses the same slow-parallel pattern (`CONCURRENCY=6`, `BATCH_DELAY_MS=250`) to avoid burst traffic.
Most on-hand freshness now comes from the 60-second adjustment listener plus 30-minute full catalog refresh.
### Concurrency control
The thunk pattern is critical. Previously, tasks were pushed as already-executing promises (refreshTasks.push(fetchAndCache(...))), which meant all HTTP requests fired simultaneously regardless of the batching loop. The fix was changing the array type from Promise<void>[] to (() => Promise<void>)[] so requests only start when explicitly invoked: batch.map((fn) => fn()).
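The eager-vs-lazy difference is easy to miss, so here is a minimal illustration (a self-contained sketch with a toy `fetchAndCache`, not the real worker code):

```typescript
let started = 0;
const fetchAndCache = (id: number): Promise<number> => {
  started++; // a real implementation would fire an HTTP request here
  return Promise.resolve(id);
};

// Before (bug): pushing fetchAndCache(id) starts every request immediately,
// so the batching loop downstream had nothing left to throttle.
// const refreshTasks: Promise<number>[] = ids.map((id) => fetchAndCache(id));

// After (fix): a thunk list defers all work to the batching loop.
const refreshTasks: (() => Promise<number>)[] = [1, 2, 3].map((id) => () => fetchAndCache(id));
// `started` is still 0 here; requests begin only when a batch calls fn():
const firstBatch = refreshTasks.slice(0, 2).map((fn) => fn());
```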
### Current tuning

| Parameter | Value | Effect |
|---|---|---|
| `ACTIVE_REFRESH_CONCURRENCY` | 12 | Max simultaneous CW API requests |
| `ACTIVE_REFRESH_PROGRESS_EVERY` | 50 | Task completion cadence for progress logs |
| Refresh interval | 20 minutes | How often the unified sweep runs |
At these settings, a full sweep of ~500 expired keys completes in ~1-2 minutes with zero CW errors and ~230ms median latency.
### Archived opportunities in unified refresh

Archived opportunities are those for which `computeCacheTTL` returns `null` — specifically opportunities with `closedFlag = true` and a `closedDate` older than 30 days (or `closedDate = null`).

In the unified 20-minute refresh pass, these rows are no longer skipped. Instead, cache writes use `TTL_ARCHIVED_MS` (86,400,000 ms = 24 hours) while still using the same missing-key EXISTS strategy as active opportunities.
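The write-time fallback reduces to one line; `resolveWriteTTL` is a hypothetical name sketching the behavior described above:

```typescript
const TTL_ARCHIVED_MS = 86_400_000; // 24 hours

// Archived rows (adaptive TTL = null) are written anyway, with the long TTL.
function resolveWriteTTL(adaptiveTtlMs: number | null): number {
  return adaptiveTtlMs ?? TTL_ARCHIVED_MS;
}
```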
### Sales metrics refresh job

Function: `refreshSalesOpportunityMetricsCache()` in `src/modules/cache/salesOpportunityMetricsCache.ts`

Interval: Every 5 minutes, triggered from `src/index.ts`.

Startup behavior: On app startup, the refresh is invoked once with `forceColdLoad=true`, which clears metrics-owned Redis keys and bypasses metrics/product cache reuse for that initial rebuild. Subsequent interval runs use the normal warm path.
Refresh flow:
- Fetch all active CW members (`inactiveFlag=false`). Source: local `CwMember` table (kept in sync by the existing members refresh job).
- Query DB opportunities assigned to those members (primary or secondary rep), scoped to open opportunities plus YTD-closed opportunities.
- For each opportunity, compute revenue cache-first from `sales:metrics:oppRevenue:{cwOppId}`, then `opp:products:{cwOpportunityId}`, and fall back through the manager/controller path (`opportunities.fetchRecord(...).fetchProducts()`) on a miss.
- Aggregate member metrics (pipeline revenue, won/lost MTD+YTD counts, avg days to close, weighted pipeline, win/loss rates, and related KPIs).
- Write per-opportunity revenue blobs plus all-member and per-member snapshots to Redis with a 10-minute TTL.
Safety controls:
- Single-flight lock prevents overlapping refresh runs if a prior run is still in progress.
- Per-opportunity timeout guard ensures slow CW product lookups degrade to zero-revenue fallback instead of stalling the full refresh.
- Force-cold-load mode clears `sales:metrics:*` runtime state owned by the metrics cache before rebuilding startup data.
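The first two safety controls can be sketched in a few lines. `refreshOnce` and `revenueWithTimeout` are hypothetical names illustrating a single-flight lock and a timeout guard with zero-revenue fallback, not the actual code in `salesOpportunityMetricsCache.ts`:

```typescript
let refreshInFlight = false;

/** Single-flight: runs doRefresh unless a prior run is still in progress. */
async function refreshOnce(doRefresh: () => Promise<void>): Promise<boolean> {
  if (refreshInFlight) return false; // overlapping interval tick → skip
  refreshInFlight = true;
  try {
    await doRefresh();
    return true;
  } finally {
    refreshInFlight = false; // always release, even on failure
  }
}

/** Resolves to 0 (zero-revenue fallback) if the CW lookup outlives the timeout. */
function revenueWithTimeout(lookup: Promise<number>, timeoutMs: number): Promise<number> {
  const fallback = new Promise<number>((resolve) => setTimeout(() => resolve(0), timeoutMs));
  return Promise.race([lookup, fallback]);
}
```

The timeout guard trades accuracy (one opportunity reads as zero revenue for a cycle) for liveness (the whole 5-minute refresh never stalls on one slow lookup).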
This cache-first model prioritizes metrics-owned opportunity revenue keys first, then opportunity product cache entries, and only reaches CW when needed.
## Retry Logic (`withCwRetry`)

File: `src/modules/cw-utils/withCwRetry.ts`

Wraps CW API calls with exponential backoff retry on transient errors.
### Retryable errors

- `ECONNABORTED` (timeout)
- `ECONNRESET`
- `ETIMEDOUT`
- `ECONNREFUSED`
- `ERR_NETWORK`
- `ENETUNREACH`
- HTTP 5xx server errors
### Default configuration

| Parameter | Default | Description |
|---|---|---|
| `maxAttempts` | 3 | Total attempts including the first |
| `baseDelayMs` | 1,000 | Delay before first retry (doubles each retry: 1s → 2s → 4s) |
| `label` | — | Optional tag for log messages |
### Usage

```typescript
import { withCwRetry } from "./withCwRetry";

const response = await withCwRetry(
  () => connectWiseApi.get(`/company/companies/${id}`),
  { label: `fetchCompany#${id}`, maxAttempts: 3, baseDelayMs: 1_500 },
);
```
Non-transient errors (404, 400, etc.) are re-thrown immediately without retry.
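For orientation, the wrapper's behavior can be reconstructed from the tables above. This is a sketch consistent with the documented defaults and error classification, not a copy of `withCwRetry.ts` (the Axios-style `err.response.status` check is an assumption):

```typescript
// Transient error codes retried with exponential backoff (per the list above).
const RETRYABLE = new Set([
  "ECONNABORTED", "ECONNRESET", "ETIMEDOUT",
  "ECONNREFUSED", "ERR_NETWORK", "ENETUNREACH",
]);

interface RetryOpts { maxAttempts?: number; baseDelayMs?: number; label?: string }

async function withCwRetry<T>(fn: () => Promise<T>, opts: RetryOpts = {}): Promise<T> {
  const { maxAttempts = 3, baseDelayMs = 1_000, label = "cw" } = opts;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const status: number | undefined = err?.response?.status; // Axios-style shape (assumption)
      const transient = RETRYABLE.has(err?.code) || (status !== undefined && status >= 500);
      if (!transient || attempt >= maxAttempts) throw err; // 404/400 etc. re-throw immediately
      const delay = baseDelayMs * 2 ** (attempt - 1); // 1s → 2s → 4s
      console.warn(`[${label}] transient error (${err?.code ?? status}); retrying in ${delay}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```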
## CW API Logger

File: `src/modules/cw-utils/cwApiLogger.ts`

Axios interceptor that logs every CW API call to a JSONL file. Logging is opt-in — set the `LOG_CW_API` environment variable to enable it. Each process start creates a new timestamped file in the `cw-api-logs/` directory (e.g., `cw-api-logs/2026-03-02T14-30-05.123Z.jsonl`).
### Enabling logging

```shell
# Via the dev:log shorthand script
bun run dev:log

# Or manually with any command
LOG_CW_API=1 bun run dev
```
### Log entry fields

| Field | Type | Description |
|---|---|---|
| `timestamp` | string (ISO-8601) | When the request completed |
| `method` | string | HTTP method |
| `url` | string | Request URL (relative or absolute) |
| `baseURL` | string | Axios baseURL |
| `status` | number \| null | HTTP status (null on network error) |
| `durationMs` | number | Wall-clock time in milliseconds |
| `error` | string \| null | Error code + message, if any |
| `timeout` | number | Configured timeout in ms |
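For reference, an illustrative log line with these fields might look like the following (all values are hypothetical, not taken from a real log):

```json
{"timestamp":"2026-03-02T14:30:05.123Z","method":"GET","url":"/sales/opportunities/4821","baseURL":"https://ttscw.totaltech.net/v4_6_release/apis/3.0/","status":200,"durationMs":231,"error":null,"timeout":30000}
```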
### Analysis

Run the analyzer against the most recent log file:

```shell
bun run utils:analyze_cw
```

Or specify a particular file:

```shell
python3 debug-scripts/analyze-cw-calls.py cw-api-logs/2026-03-02T14-30-05.123Z.jsonl
```
This executes `debug-scripts/analyze-cw-calls.py`, which produces:
- Overview (total calls, error rate, time span)
- Duration statistics (min, max, mean, p50, p90, p95, p99, distribution histogram)
- Error breakdown by type and endpoint
- Top 20 slowest calls
- Per-endpoint stats (count, errors, mean, p50, p95, max, total time)
- Timeline (per-minute throughput and errors)
- Concurrency hotspot detection
- Summary with recommendations
To clear all logs:

```shell
rm -rf cw-api-logs/
```
## Cache Invalidation
Mutation endpoints invalidate the relevant cache keys so the next read fetches fresh data from CW:
| Mutation | Cache invalidated |
|---|---|
| Create/update/delete note | `opp:notes:{cwOpportunityId}` via `invalidateNotesCache()` |
| Create/update/delete contact | `opp:contacts:{cwOpportunityId}` via `invalidateContactsCache()` |
| Add/update/resequence products | `opp:products:{cwOpportunityId}` via `invalidateProductsCache()` |
| Refresh opportunity | All keys for that opportunity (via re-fetch) |
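The invalidation helpers in the table reduce to a keyed DEL. A sketch under assumptions (the `redis` parameter shape is hypothetical; only the helper name and key pattern come from the table above):

```typescript
// Delete-on-write invalidation: the next read misses and re-fetches from CW.
async function invalidateNotesCache(
  redis: { del(...keys: string[]): Promise<number> },
  cwOpportunityId: number,
): Promise<number> {
  return redis.del(`opp:notes:${cwOpportunityId}`);
}
```

Deleting rather than rewriting the key keeps mutation endpoints fast and lets the normal read path (with its adaptive TTL) repopulate the cache.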
## ConnectWise API Configuration

The shared Axios instance (`connectWiseApi`) is configured in `src/constants.ts`:
| Setting | Value | Purpose |
|---|---|---|
| `baseURL` | `https://ttscw.totaltech.net/v4_6_release/apis/3.0/` | CW API base |
| `timeout` | 30,000 ms (30s) | Per-request timeout |
| Logger | `attachCwApiLogger()` | Writes to `cw-api-calls.jsonl` |
## Architecture diagram

```
src/index.ts
│
├─ setInterval(enqueueActiveOpportunityRefreshJob, 20m)
│
└─► src/workert.ts (REFRESH_ACTIVE_OPPORTUNITIES worker)
    │
    ├─ refreshOpportunities()              // collector-first full DB sync
    │
    └─► refreshActiveOpportunitiesWorker()
        │
        ├─ prisma.opportunity.findMany(orderBy: cwLastUpdated DESC)
        ├─ redis.pipeline().exists(...)    ← batch key check
        │
        ├─ Build thunk list (lazy functions)
        │
        └─ Execute thunks with ACTIVE_REFRESH_CONCURRENCY
            │
            ├─► fetchAndCacheOppCwData()     ─► opportunityCw.fetch()
            ├─► fetchAndCacheActivities()    ─► activityCw.fetchByOpportunityDirect()
            ├─► fetchAndCacheNotes()         ─► opportunityCw.fetchNotes()
            ├─► fetchAndCacheContacts()      ─► opportunityCw.fetchContacts()
            ├─► fetchAndCacheProducts()      ─► opportunityCw.fetchProducts() + fetchProcurementProducts()
            ├─► fetchAndCacheCompanyCwData() ─► fetchCwCompanyById() + contacts
            └─► fetchAndCacheSite()          ─► fetchCompanySite() (lazy only)
                │
                └─► connectWiseApi.get(...)  ← withCwRetry + cwApiLogger interceptors
                    │
                    └─► Redis SET with computed TTL
```
## File reference

| File | Purpose |
|---|---|
| `src/modules/cache/opportunityCache.ts` | Cache read/write helpers and key utilities |
| `src/modules/workers/cache/refreshActiveOpportunities.ts` | Unified opportunity cache refresh worker (active + archived) |
| `src/workert.ts` | Queue wiring and collector-first full refresh orchestration |
| `src/modules/algorithms/computeCacheTTL.ts` | Primary adaptive TTL algorithm |
| `src/modules/algorithms/computeSubResourceCacheTTL.ts` | Sub-resource (notes, contacts) TTL algorithm |
| `src/modules/algorithms/computeProductsCacheTTL.ts` | Products TTL algorithm |
| `src/modules/cw-utils/withCwRetry.ts` | Retry wrapper with exponential backoff |
| `src/modules/cw-utils/cwApiLogger.ts` | Axios interceptor for JSONL call logging |
| `src/modules/cw-utils/fetchCompany.ts` | Company fetch with retry |
| `src/modules/cw-utils/procurement/listenInventoryAdjustments.ts` | Adjustment listener for targeted catalog-item cache + DB sync |
| `src/modules/cache/salesOpportunityMetricsCache.ts` | 5-minute active-member opportunity metrics cache |
| `src/constants.ts` | CW Axios instance config (timeout, logger) |
| `src/index.ts` | Refresh interval registration and worker job enqueueing |
| `debug-scripts/analyze-cw-calls.py` | CW API call analysis script |