Warranty management software

ClaimKit

Warranty claims on autopilot

ClaimKit is warranty management software that centralizes claims and repair tickets from every channel into one live queue for operations and support leaders at DTC brands, appliance retailers, and repair shops (50–5,000 claims/month). A magic inbox reads receipts and serials from emails/PDFs, auto-creates cases, checks eligibility, and starts SLA timers, cutting average resolution time by 42% and driving lost tickets to near zero.

Product Details


Vision & Mission

Vision
Power every product business to make warranty resolution effortless and invisible, turning support into loyalty and profitable growth.
Long Term Goal
By 2029, power 10,000 product businesses to resolve 50 million warranty claims annually, eliminate missed SLAs, cut resolution times 40%, and lift customer NPS by 15 points.
Impact
For operations and support leaders at DTC brands, retailers, and repair shops handling 50–5,000 claims monthly, ClaimKit cuts average resolution time by 42%, drives lost tickets to near zero, lifts SLA attainment to 93%, saves teams 8–12 hours weekly, and raises warranty CSAT from 3.6 to 4.4.

Problem & Solution

Problem Statement
Operations and support leads at DTC brands, appliance retailers, and repair shops handling 50–5,000 claims monthly juggle emails, PDFs, forms, and portals, causing lost tickets and missed SLAs. Generic help desks can’t parse receipts or check eligibility, and manufacturer portals fragment the process.
Solution Overview
ClaimKit centralizes warranty claims from every channel into one live queue, eliminating scattered emails and portals. A magic inbox reads receipts and serial numbers from emails/PDFs, auto-creates cases, runs instant warranty eligibility checks, and starts SLA timers—preventing lost tickets and missed deadlines.

Details & Audience

Description
ClaimKit centralizes warranty claims and repair tickets from every channel into one live queue, automating intake, eligibility checks, and SLA timers. Built for operations managers and support leads at DTC brands, appliance retailers, and repair shops handling 50–5,000 claims monthly. It ends lost emails, duplicate entries, and missed SLAs; a magic inbox reads receipts, serials, and warranty terms from emails and PDFs, auto-creates cases, and flags eligibility instantly.
Target Audience
Operations and support leaders (ages 30–50) at DTC brands, retailers, and repair shops who are obsessed with automation and determined to eliminate missed SLAs.
Inspiration
At a neighborhood appliance counter, a clerk hunched over a spreadsheet, pecking serials from crumpled receipts while a family shifted between hope and impatience. Two phone calls later, their claim disappeared into a manufacturer portal. The manager rubbed his temples, admitting they spent more time chasing paper than fixing problems. In that gap, the idea for ClaimKit crystallized: read the documents automatically, decide eligibility instantly, and keep every claim visible end-to-end.

User Personas

Detailed profiles of the target users who would benefit most from this product.

Product Features

Key capabilities that make this product valuable to its target users.

Receipt Forensics

AI-powered document inspection that analyzes PDFs, images, and email invoices for edits, font/style inconsistencies, layered graphics, metadata anomalies, and mismatched barcodes. Suspicious regions are highlighted for quick review with a confidence score. Benefit: stops doctored receipts at intake, reduces manual eyeballing for agents, and strengthens audit defensibility.

Requirements

Document Normalization & OCR Pipeline
"As a claims operations leader, I want all receipt formats normalized and text-extracted so that forensic checks run consistently regardless of source."
Description

Ingest PDFs, images (JPEG, PNG, HEIC), and email invoices (body and attachments), normalize them to a standard representation, and perform high-accuracy OCR with multi-language support. Correct skew, de-noise, detect page boundaries, and preserve layout structure, fonts, and vector elements to enable downstream forensic analysis. Expose a normalized document model (text, layout, font map, raster layers) for detectors, invoke automatically at claim intake, and handle corrupted or encrypted files with explicit error states within SLA.
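The normalized document model described above can be sketched as plain data types. The field names follow the keys listed in the acceptance criteria (pages[], textBlocks[], fontMap, rasterLayers[]), rendered here in Python snake_case — a minimal illustration of the shape detectors would consume, not ClaimKit's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class TextBlock:
    text: str
    bounding_box: tuple   # (x, y, w, h) in points
    page_index: int
    font_id: str
    confidence: float

@dataclass
class RasterLayer:
    image_dpi: int
    color_space: str
    bit_depth: int
    z_index: int

@dataclass
class Page:
    text_blocks: list = field(default_factory=list)
    reading_order: list = field(default_factory=list)   # block indices in visual order
    font_map: dict = field(default_factory=dict)        # font_id -> font attributes
    raster_layers: list = field(default_factory=list)

@dataclass
class NormalizedDocument:
    document_id: str
    pages: list = field(default_factory=list)
    # Pipeline states from the criteria: received -> normalizing -> ocr
    # -> normalized -> complete | error
    status: str = "normalizing"
```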

Acceptance Criteria
Auto-Invocation at Claim Intake
Given a new claim is created from an inbound email with an invoice in the email body or as a supported attachment, When the claim is saved to the intake queue, Then the normalization and OCR pipeline starts automatically within 5 seconds and records a processing session ID. Given a file is uploaded via API or agent portal, When the upload completes, Then the pipeline starts within 5 seconds and the claim shows status "normalizing". Given processing begins, When each stage completes, Then the claim timeline records state transitions: received -> normalizing -> ocr -> normalized -> complete or error with UTC timestamps.
Supported File Types and Email Assembly
Given input files of type PDF, JPEG, PNG, HEIC, and email messages (parsed HTML body plus attachments), When ingested, Then they are accepted without manual conversion and added to a single document stream for the claim. Given an email with both body content and attachments, When normalized, Then the body renders as page 1 followed by attachments in their original order with page indices starting at 1. Given a vector PDF, When normalized, Then selectable text is extracted from vectors before raster OCR and the original page count is preserved.
Skew Correction, De-noise, and Page Boundary Detection
Given images with rotation between -15 and +15 degrees, When normalized, Then residual skew is <= 1.0 degree at P95. Given noisy scans with Gaussian-like noise up to sigma 20 (8-bit), When normalized, Then OCR character error rate degradation is <= 0.5% absolute versus clean baseline at P95. Given images with excess borders or backgrounds, When normalized, Then page boundaries are detected and cropped with < 2% content loss and no truncated text bounding boxes at P95.
High-Accuracy OCR with Multi-Language Support
Given printed receipts in English, Spanish, French, and German, When OCR runs, Then per-page character error rate is <= 1.5% (median) and <= 3.0% (P95). Given pages containing mixed languages among the supported set, When OCR runs, Then primary language is auto-detected correctly on >= 95% of pages. Given numeric strings (0-9) in receipts, When OCR runs, Then digit character error rate is <= 0.5% at P95.
Normalized Document Model with Layout, Fonts, and Raster Layers
Given any supported document, When normalization completes, Then the output includes a model with keys: pages[], pages[].textBlocks[], pages[].layout.readingOrder[], pages[].fontMap[], pages[].rasterLayers[], and a top-level documentId. Given a page, When inspected, Then each textBlock includes text, boundingBox {x,y,w,h in points}, pageIndex, fontId, and confidence; and each rasterLayers[] entry includes imageDpi, colorSpace, bitDepth, and zIndex. Given the reading order, When validated, Then it preserves column-wise top-to-bottom, left-to-right ordering with Kendall tau distance <= 0.1 versus visual order on the validation set.
Explicit Error States for Corrupted or Encrypted Files
Given an encrypted PDF without a provided password, When processed, Then the pipeline sets status=error with code=FILE_ENCRYPTED and a human-readable message within 10 seconds and does not auto-retry. Given a corrupted or malformed file, When processed, Then the pipeline sets status=error with code=FILE_CORRUPTED within 10 seconds and routes the claim to a manual review queue. Given an engine timeout or crash during normalization or OCR, When detected, Then the pipeline sets status=error with a specific code (OCR_TIMEOUT or NORMALIZATION_FAILED), cleans partial artifacts, and limits retries to 1 attempt.
Performance and Throughput Under Load
Given a batch of 1,000 mixed documents (median 1 page; 95th percentile 5 pages) arriving over 30 minutes, When processed, Then 95% complete within 60 seconds of intake and 99% within 180 seconds with zero data loss. Given sustained intake at 10 documents per second, When processed, Then the system maintains average CPU utilization < 80% and applies queue backpressure without rejecting requests for at least 60 minutes. Given a 5-page 300 DPI PDF, When processed individually, Then end-to-end normalization plus OCR completes within 10 seconds at P95.
Visual Tamper Detection Engine
"As a fraud analyst, I want automatic detection of altered receipts so that doctored documents are flagged before agent review."
Description

Analyze documents for signs of manipulation including font/style inconsistencies, copy-move/splice regions, layered graphics, recompression artifacts, and noise/edge irregularities. Combine techniques such as error level analysis, resampling detection, and clone detection to produce region-level annotations with reason codes and confidence scores. Support scans and screenshots across merchant templates, execute asynchronously at intake, and scale horizontally without breaching intake SLAs.
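One technique named above, clone (copy-move) detection, can be illustrated with a deliberately naive sketch: hash fixed-size pixel blocks and report any block pattern that appears more than once. A production detector would use overlapping windows and noise-robust features rather than exact pixel matches; this toy version on a raw grayscale grid only shows the idea, and `find_duplicate_blocks` is a hypothetical helper, not a ClaimKit API.

```python
from collections import defaultdict

def find_duplicate_blocks(pixels, block=4):
    """Naive copy-move detector: hash non-overlapping block x block pixel
    tiles and return groups of (x, y) positions whose contents repeat."""
    h, w = len(pixels), len(pixels[0])
    seen = defaultdict(list)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            key = tuple(pixels[y + dy][x + dx]
                        for dy in range(block) for dx in range(block))
            seen[key].append((x, y))
    # Any pattern occurring at two or more positions is a clone candidate.
    return [locs for locs in seen.values() if len(locs) > 1]
```

On real scans the duplicated region would be reported as a region-level annotation with reason code CLONE_DETECTION, as the criteria below require.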

Acceptance Criteria
Detects Copy-Move/Splice Manipulations on Receipt Images
Given a tampered receipt image with known copy-move or spliced regions and a ground-truth mask When the engine processes the document asynchronously at intake Then it returns at least one region annotation where reason_codes includes CLONE_DETECTION or SPLICE_DETECTION and confidence >= 0.80 And the predicted region mask Intersection-over-Union with the ground-truth mask is >= 0.60 And the job transitions through statuses queued -> processing -> completed and persists results to the associated case within 60 seconds end-to-end And the annotated regions are retrievable via the forensics API for the case with overlay coordinates normalized to [0,1]
Flags Font/Style Inconsistencies in PDF Invoices
Given a PDF invoice where the total amount field has been replaced using a different font face/size or kerning When the engine ingests and analyzes the PDF Then it produces a region-level annotation at word or line granularity with reason_codes including FONT_STYLE_MISMATCH and confidence >= 0.75 And evidence includes the detected font attributes (font name, size) for both the suspect and neighboring text And the engine optionally reports RESAMPLING_DETECTED or RECOMPRESSION_ARTIFACTS when applicable And on a clean control PDF from the same merchant template, no annotation exceeds confidence 0.50 for FONT_STYLE_MISMATCH
Reports Layered Graphics and Metadata Anomalies
Given a document image or PDF containing pasted graphical elements and metadata indicating editing after capture When the engine analyzes layers and file metadata Then it emits annotations with reason_codes including LAYERED_GRAPHICS_DETECTED and/or METADATA_INCONSISTENCY with confidence >= 0.70 And evidence includes relevant metadata fields (e.g., Software tag, ModifyDate after CreateDate) And for a set of 50 smartphone camera scans with no edits, zero annotations exceed confidence 0.60 for LAYERED_GRAPHICS_DETECTED or METADATA_INCONSISTENCY
Detects Mismatched Barcodes vs Printed Text
Given a receipt that contains a machine-readable barcode encoding a serial/SKU that differs from the printed serial/SKU in text When the engine processes the document Then it decodes the barcode and OCRs the relevant text and emits annotations with reason_codes including BARCODE_TEXT_MISMATCH and confidence >= 0.80 And both the barcode region and the mismatched text region are annotated with bounding polygons And for receipts where barcode and text match exactly, no BARCODE_TEXT_MISMATCH annotation exceeds confidence 0.50
Outputs Region-Level Annotations with Reason Codes
Given any processed document When the engine completes analysis Then the result payload validates against the defined JSON schema for annotations with fields: id, page, geometry (polygon points normalized), reason_codes[], confidence, methods[], evidence{} And all reason_codes used are members of the controlled vocabulary: [CLONE_DETECTION, SPLICE_DETECTION, ELA_RESIDUALS, RESAMPLING_DETECTED, RECOMPRESSION_ARTIFACTS, NOISE_EDGE_IRREGULARITY, FONT_STYLE_MISMATCH, LAYERED_GRAPHICS_DETECTED, METADATA_INCONSISTENCY, BARCODE_TEXT_MISMATCH, REGION_INPAINTING] And at least 90% of processed documents produce one or more annotations or an explicit "no_findings" result, without schema validation errors
Asynchronous Execution Meets Intake SLA Under Load
Given a sustained intake of 500 documents per hour with bursts up to 1,000/hour and average file size <= 10 MB When the system scales workers from 2 to 10 instances Then p95 end-to-end time from document reception to result persisted is <= 45 seconds and per-document analysis p95 is <= 8 seconds without breaching intake SLAs And throughput increases proportionally (within 15% of linear scaling) with instance count And transient failures are retried with exponential backoff up to 3 attempts and the final error rate is < 0.5%
Handles Scans and Screenshots Across Merchant Templates
Given an evaluation corpus of 20 merchant templates with both scans and screenshots When the engine is evaluated at a decision threshold of 0.70 confidence Then on a clean set of 300 genuine documents, the false positive rate for any high-severity reason_code is <= 5% And on a tampered set of 200 documents covering font edits, splices, and barcode mismatches, the true positive rate is >= 85% and macro F1 >= 0.80 And results are consistent across input types, with no more than 10% relative performance drop between scans and screenshots
Metadata Anomaly Analysis
"As a compliance manager, I want metadata anomalies surfaced so that we can defend decisions and spot synthetic documents."
Description

Extract PDF/XMP/EXIF metadata and email headers, validate creation/modified timestamps, software producers, device models, and time zones against claimed purchase details and merchant norms. Verify SPF/DKIM where applicable and flag anomalies such as forward-dated files, stripped metadata, or tool/version mismatches with human-readable explanations and links to the exact evidence.
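The forward-dated timestamp rule in the criteria below (a CreationDate 12 or more hours after the claimed purchase yields META_TIME_FORWARD at severity "high", confidence >= 0.80) reduces to a small comparison. This sketch assumes the timestamps are already parsed into datetimes; `check_forward_dated` is an illustrative name, not a real ClaimKit function.

```python
from datetime import datetime, timedelta

def check_forward_dated(creation_date, claimed_purchase, threshold_hours=12):
    """Flag META_TIME_FORWARD when a file's CreationDate postdates the
    claimed purchase by at least the configured threshold."""
    delta = creation_date - claimed_purchase
    if delta >= timedelta(hours=threshold_hours):
        hours, rem = divmod(int(delta.total_seconds()), 3600)
        return {
            "code": "META_TIME_FORWARD",
            "severity": "high",
            "source": "pdf",
            "field": "CreationDate",
            "confidence": 0.80,
            "explanation": (f"CreationDate {creation_date.isoformat()} is "
                            f"{hours}h{rem // 60:02d}m after claimed purchase "
                            f"{claimed_purchase.isoformat()}"),
        }
    return None  # within tolerance: no anomaly record
```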

Acceptance Criteria
Forward-Dated PDF Timestamp Detected
Given a PDF receipt with claimed_purchase_datetime and extracted metadata fields CreationDate and ModDate When Metadata Anomaly Analysis runs Then if CreationDate > claimed_purchase_datetime by >= 12 hours, create anomaly code "META_TIME_FORWARD" with severity "high" And the anomaly explanation includes both timestamps and the computed difference in hours and minutes And evidence.links contains a resolvable link to the "/Info/CreationDate" field (HTTP 200) And the anomaly record includes source="pdf", field="CreationDate", and confidence >= 0.80
EXIF Device Model Mismatch Flagged
Given an image receipt (JPEG/PNG) with EXIF Make and Model fields present and merchant_norms.device_models_allowed defined When Metadata Anomaly Analysis runs Then if exif.Model not in merchant_norms.device_models_allowed for the merchant, create anomaly code "EXIF_DEVICE_MISMATCH" with severity "medium" And the explanation states actual exif.Make + exif.Model and the expected allowed patterns And evidence.links includes a resolvable link to the "exif.Model" field (HTTP 200) And the anomaly record includes source="image", field="exif.Model", and confidence >= 0.75 And if the file lacks EXIF support or EXIF is absent, no "EXIF_DEVICE_MISMATCH" is created
Email SPF/DKIM Verification and Header Consistency
Given an email invoice with raw headers (Received-SPF, Authentication-Results, DKIM-Signature, From, Return-Path) When SPF and DKIM checks are performed Then spf_status and dkim_status are recorded as one of {PASS, FAIL, NONE, TEMPERROR} And if spf_status=FAIL or dkim_status=FAIL, create anomaly code "EMAIL_AUTH_FAIL" with the failing header line quoted in the explanation And if From domain != DKIM d= domain or From domain != Return-Path domain, create anomaly code "EMAIL_DOMAIN_MISMATCH" And evidence.links include resolvable anchors to the exact header lines (HTTP 200) And each created anomaly has confidence >= 0.80
Missing or Stripped Metadata Handling
Given a file type that supports metadata (PDF or JPEG) and merchant_norms indicate typical presence of core metadata When extraction finds XMP, EXIF, and PDF Info dictionaries missing or empty Then create anomaly code "METADATA_STRIPPED" with an explanation listing which schemas are missing (e.g., "EXIF: none; XMP: none; Info: empty") And evidence.links includes at least one resolvable link per missing schema to the field path or extraction report (HTTP 200) And the anomaly record includes confidence >= 0.70 And for formats without the relevant schema (e.g., PNG without EXIF), do not create "METADATA_STRIPPED"
Time Zone Inconsistency Against Merchant Norms
Given claimed_purchase_timezone, merchant_default_timezones, and extracted timezone offsets from CreationDate and/or Email Date When the absolute offset difference between the metadata timezone and any allowed merchant timezone is > 3 hours Then create anomaly code "TIMEZONE_MISMATCH" with an explanation including the offsets and sources compared And evidence.links include resolvable pointers to the specific fields used (e.g., CreationDate TZ, Email Date) (HTTP 200) And the anomaly record includes confidence >= 0.75
Producer/Tool Version Mismatch Identification
Given a PDF receipt with /Info Producer and Creator fields and merchant_norms.producers_allowed patterns When Producer or Creator does not match allowed patterns or versions Then create anomaly code "PRODUCER_MISMATCH" with an explanation stating actual values and expected patterns And if XMP History or other metadata indicates editing by image editors (e.g., Adobe Photoshop, GIMP), also create anomaly code "EDITING_TOOL_DETECTED" And evidence.links include resolvable pointers to "/Info/Producer", "/Info/Creator", and any xmp:History entries (HTTP 200) And each created anomaly has confidence >= 0.80
Barcode & Serial Cross-Validation
"As a support agent, I want barcodes and serials auto-validated against claims so that mismatches are caught without manual checking."
Description

Detect and decode 1D/2D barcodes (e.g., Code128, EAN, QR) within receipts, reconcile decoded values with visible text and the claim’s serial/PO numbers, and validate formatting and check digits. Cross-check decoded identifiers against ClaimKit’s product and warranty records to confirm eligibility and flag mismatches or tamper indicators with precise, actionable messages.
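Check-digit validation for EAN-13 is fully determined by the symbology: the first 12 digits are weighted 1,3,1,3,..., and the 13th digit must bring the weighted sum to a multiple of 10. The normalization shown alongside mirrors the comparison rules in the criteria below (case-insensitive, separators ignored); both function names are illustrative sketches, not ClaimKit's API.

```python
def ean13_check_digit_ok(code):
    """Validate an EAN-13 value: weights 1,3 alternate over the first 12
    digits and the 13th digit is the modulo-10 check digit."""
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]

def normalize_identifier(value):
    """Normalization used before comparing a decoded barcode against OCR
    text or claim fields: case-insensitive, separators stripped."""
    return "".join(ch for ch in value.upper() if ch.isalnum())
```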

Acceptance Criteria
Detect 1D/2D Barcodes in Receipts
Given a receipt document (PDF, image, or email) containing one or more barcodes of type Code128, EAN-13/UPC-A, or QR When Receipt Forensics processes the document Then each barcode is detected and its symbology type, decoded value, page number, and bounding box (pixels and normalized) are returned And detection confidence for clean, non-obstructed barcodes is ≥ 0.90 And documents with no barcodes return an empty barcode list (no false positives)
Validate Barcode Format and Check Digits
Given a decoded barcode value and its symbology When barcode validation runs Then EAN-13 and UPC-A values must have correct check digits; otherwise an issue is recorded with issue_code='invalid_check_digit' including expected and found digits And values intended to represent serial or PO must match configured format patterns; otherwise an issue is recorded with issue_code='unexpected_format' including the offending value and expected pattern name
Reconcile Barcode Values with Visible Text
Given OCR-extracted visible text from the receipt including serial and/or PO candidates When a barcode decodes to a serial or PO identifier Then a normalized comparison (case-insensitive; whitespace, dashes, and separators ignored) is performed against the nearest text candidate on the same page And on match, the barcode is marked reconciled with references to both regions And on mismatch, an issue is recorded with issue_code='text_barcode_mismatch' including both values and their coordinates
Cross-Validate Decoded Identifiers Against Claim Data
Given an incoming claim with serial and/or PO numbers When cross-validation runs Then decoded barcode serial and PO values must match the claim's corresponding fields after normalization And mismatches are flagged with issue_code='claim_mismatch' specifying field name(s), decoded value(s), and claim value(s) And if the claim lacks a corresponding field, the result includes issue_code='no_claim_value' without failing overall processing
Verify Product and Warranty Eligibility
Given decoded serial number, PO number, and purchase date extracted from the receipt (if available) When querying ClaimKit product and warranty records Then the serial must exist and be associated with the SKU on the receipt; otherwise issue_code='serial_not_found' or 'serial_sku_mismatch' is recorded And warranty eligibility is computed relative to claim creation date per policy; if not eligible, issue_code='out_of_warranty' is returned with policy name, start date, end date, and days out of warranty And if eligible, eligibility=true is returned with policy details
Handle Multiple and Conflicting Barcodes
Given a receipt containing multiple barcodes When decoding and classification run Then each decoded value is labeled with identifier_type ∈ {serial, po, sku, unknown} using configured regexes and proximity to nearby text labels And if multiple barcodes map to the same identifier_type with differing values, the system selects a primary based on highest detection confidence and nearest labeled text, and records issue_code='conflicting_barcodes' listing competing values and selection rationale And the selected value is propagated to downstream comparisons while alternates are retained for audit
Emit Actionable Flags and Highlights
Given any validation failure or mismatch during cross-validation When the result is generated Then the system returns a structured list of issues with fields {issue_code, severity ∈ {info, warning, error}, message, related_regions[], related_fields[], confidence} And each issue includes at least one related_region bounding box for the involved barcode and/or text to support UI highlighting And messages are specific and actionable, naming exact fields and values and suggesting next steps (e.g., verify with customer or route to fraud review)
Suspicious Region Highlighting UI
"As an agent, I want highlighted regions and reasons in the case viewer so that I can make fast, confident decisions."
Description

Render visual overlays in the ClaimKit case viewer to highlight suspicious regions with tooltips showing anomaly type, evidence, and confidence. Provide zoom/pan, layer toggling, and quick actions to approve, escalate, or request resubmission, while preserving original document fidelity and supporting accessibility and performance requirements for large receipts.

Acceptance Criteria
Tooltip Details on Suspicious Regions
Given a case viewer displaying a document with at least one suspicious region, When the user hovers the pointer over a region or focuses it via keyboard, Then a tooltip appears within 150ms and displays anomaly type, evidence summary (<= 140 characters), and confidence percentage to 1 decimal (e.g., 87.3%), And the tooltip is positioned adjacent to the region without covering more than 25% of the region; if overlap would exceed 25%, it auto-repositions, And overlapping regions can be cycled via Tab/Shift+Tab with an "n of m" indicator in the tooltip, And the tooltip dismisses on pointer leave, Escape, or focus loss within 100ms, And all tooltip text and numbers render in the user's locale.
Zoom and Pan Interaction on Large Receipts
Given a document up to 12,000 x 12,000 pixels or a PDF page up to A3 at 600 DPI, When the user zooms between 25% and 400% using controls (+/- buttons), mouse wheel + Ctrl/Cmd, or pinch, Then the underlying document and all suspicious region overlays remain aligned with error <= 1 CSS pixel at any zoom level, And panning with mouse drag, trackpad, or arrow keys maintains alignment with no visible tearing or jitter, And input latency is < 50ms for pan and < 100ms for initial zoom, And Reset (0 key) returns to fit-to-width within 100ms.
Overlay Layer Toggling Without Altering Original
Given overlays are visible, When the user toggles "Show Overlays" off, Then the original document is displayed with no overlays and image pixels remain unmodified, And when the user clicks "Download Original," the file downloaded matches the stored source checksum, And when the user toggles specific anomaly types on/off or adjusts overlay opacity (20%–80%), Then only the selected types render and opacity changes apply within 100ms.
Quick Actions from Region Selection
Given at least one suspicious region is present, When the user selects a region, Then quick actions Approve, Escalate, and Request Resubmission are visible and enabled if the user has permission, And invoking any action opens a confirmation with optional note, And on confirm, the system updates the case (status and region disposition) and writes an audit log entry with user id, timestamp, action, region id, and note, And success feedback appears within 500ms; on error, a non-blocking error message is shown and no state changes persist, And multiple selected regions can be acted on in one operation and all affected regions reflect the same disposition.
Accessibility and Keyboard Navigation
Given the case viewer is used without a mouse, When the user navigates via keyboard and screen reader, Then all suspicious regions are reachable in a logical order, have visible focus, and expose accessible names including anomaly type and confidence, And tooltips are announced on focus and are dismissible with Escape, And overlay and tooltip colors meet WCAG 2.2 AA contrast (>= 4.5:1) including in high-contrast mode, And keyboard shortcuts exist for Zoom In (+), Zoom Out (-), Reset (0), Next/Previous region (Tab/Shift+Tab), and Quick Action (Enter/Space), And testing confirms operability on NVDA (Windows), JAWS (Windows), and VoiceOver (macOS) with no critical blockers.
Performance on High-Resolution Multi-Page Documents
Given a 30-page PDF totaling up to 100MB with overlays on 50% of pages, When the user opens the case, Then the first visible page renders within 2,000ms and its overlays within 300ms after the page appears, And subsequent pages render within 1,000ms as the user navigates, And scrolling uses lazy loading and maintains >= 30 FPS on a reference device (4-core CPU, 8GB RAM), And peak memory remains <= 400MB during navigation.
Multi-Format Document Support and Fidelity
Given inputs in PDF (vector or raster), JPEG, PNG, TIFF (multi-page), and HTML email invoice, When the document is displayed in the viewer, Then orientation from EXIF/PDF is honored, dimensions are correct, and text in vector PDFs remains crisp at 200% zoom, And transparent backgrounds in PNGs/TIFFs are preserved, And multi-page formats expose page navigation and page-specific overlays, And if a format is unsupported, a clear error message is shown and the original file is available for download.
Confidence Scoring & Auto-Routing
"As an operations manager, I want configurable thresholds that auto-route suspicious receipts so that we reduce manual workload while staying within SLAs."
Description

Aggregate detector outputs into a single document confidence score using weighted rules and thresholds configurable per tenant and workflow. Drive routing actions: auto-approve low risk, queue medium risk for review, and auto-hold high risk, updating SLA timers, queues, and notifications accordingly with full auditability of the decision logic.
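The weighted aggregation and threshold routing can be sketched in a few lines. The neutral default for missing detectors, the 4-decimal score precision, and the boundary handling (L and H fall into the higher-risk bracket) follow the acceptance criteria; the function and detector names are placeholders, not ClaimKit's real interface.

```python
def score_document(detector_outputs, weights, neutral=0.5):
    """Weighted average of detector outputs into one score in [0, 1].
    Detectors missing from the outputs fall back to a neutral value."""
    total_weight = sum(weights.values())
    weighted = sum(w * detector_outputs.get(name, neutral)
                   for name, w in weights.items())
    return round(weighted / total_weight, 4)  # 4-decimal precision per spec

def route(score, low, high):
    """Threshold routing; boundary values land in the higher-risk bracket."""
    if score < low:
        return "auto_approve"
    if score < high:
        return "queue_for_review"
    return "auto_hold"
```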

Acceptance Criteria
Aggregate Detectors into Single Confidence Score
Given a tenant and workflow with configured detector weights and normalization rules When the system receives detector outputs for a document Then it computes a single confidence score in the range 0.0–1.0 using the configured weights And missing detector outputs default to a neutral value per configuration without causing errors And the same inputs and configuration produce the same score deterministically on reprocessing And the computed score is stored with precision to 4 decimal places
Threshold-Based Auto-Routing Actions
Rule: For a tenant/workflow with thresholds Low=L and High=H where 0.0 <= L < H <= 1.0:
- If score < L, then route action = Auto-Approve
- If L <= score < H, then route action = Queue for Review
- If score >= H, then route action = Auto-Hold
And only one route action is applied per document And boundary values L and H are included in the higher risk bracket as specified above And the resulting case status and destination queue match the route action
SLA Timer Updates by Route
Given route action = Auto-Approve When the case is created Then Review SLA timer does not start and Fulfillment SLA timer starts at decision time Given route action = Queue for Review When the case is created Then Review SLA timer starts at decision time and Fulfillment SLA timer does not start Given route action = Auto-Hold When the case is created Then Review SLA timer pauses/stops and Investigation SLA timer starts at decision time
Per-Tenant and Per-Workflow Configurability
Given Tenant A Workflow X and Tenant B Workflow Y have distinct weight and threshold configs When documents are processed for each Then each document uses only its tenant+workflow configuration And changing Tenant A Workflow X thresholds updates routing for new decisions in that scope only And Tenant B cannot view or edit Tenant A configurations
Auditability of Scoring and Routing Decision
Given a routing decision is made When viewing the case audit log Then the log contains timestamp, actor=system, config version, raw detector outputs, normalized values, weights, computed score, thresholds (L,H), selected route, and evaluation rationale And the audit record is immutable and exportable as JSON And replays using the logged snapshot reproduce the same score and route
Notifications by Routing Outcome
Given routing thresholds are configured with notification recipients per outcome When a document is Auto-Approved Then recipients receive a notification including document id, score, action, queue, and link to audit within 60 seconds When a document is Queued for Review Then the assigned review team is notified with SLA due time and priority within 60 seconds When a document is Auto-Held Then fraud/investigation recipients are notified with required next steps within 60 seconds
Failure and Fallback Handling
Given the scoring service or required detector outputs are unavailable beyond a configured timeout When a document is ingested Then the system routes the case to Manual Review Intake queue with route outcome=Fallback And starts the Review SLA timer And sends a failure notification to the tenant’s ops contacts within 60 seconds And the audit log records error details and retry attempts without double-routing the same case
Audit Log & Evidence Export
"As a risk and compliance lead, I want a defensible evidence package for each decision so that audits and disputes can be handled swiftly."
Description

Persist immutable forensic results, inputs, model/rule versions, timestamps, and agent actions. Provide UI and API to export an evidence package that includes highlighted regions, decoded metadata, detector rationales, and a decision summary, conforming to retention and PII redaction policies to strengthen audit and dispute defensibility.
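Immutability of the kind described is commonly implemented with hash chaining: each audit record stores the previous record's hash, so altering any earlier entry invalidates every later one on replay. A minimal sketch assuming JSON-serializable records; the function names are illustrative, not ClaimKit's actual audit API.

```python
import hashlib
import json

def append_audit_record(chain, record):
    """Append-only audit log: each entry hashes the previous entry's hash
    together with its own canonicalized payload."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    chain.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
    return chain

def verify_chain(chain):
    """Recompute every link; returns False if any record was altered."""
    prev = "0" * 64
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        if entry["prev_hash"] != prev:
            return False
        if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```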

Acceptance Criteria
Immutable Forensic Record on Intake
Given a new claim document is ingested via email or PDF upload When Receipt Forensics completes analysis Then the system writes an append-only audit record containing case ID, source channel, normalized inputs (file hash, file type, size), detector outputs, model and rule version IDs, start and end timestamps, and processing node ID And the record is assigned a content hash and chained to the prior audit record for that case And any update creates a new appended version while preserving the previous record unchanged And the audit record is queryable within 3 seconds of analysis completion
UI Evidence Package Export with PII Redaction
Given a user with Evidence Export permission views a case with forensic results When the user clicks Export Evidence and selects External (PII-Redacted) Then a downloadable package is generated within 10 seconds containing decision summary (PDF), highlighted-region overlays, decoded metadata report (JSON), detector rationales, and audit trail excerpt And PII fields configured in tenant policy (names, emails, phones, full addresses, payment PAN except BIN and last4) are irreversibly redacted or masked And the export header includes retention policy reference, generation timestamp, and export requestor ID And the UI displays a success notification and records an Export action in the audit log
API Evidence Export with Auth and Expiring Link
Given a service account with scope evidence.export and a valid case ID When it calls POST /v1/evidence-exports with payload { caseId, profile: "external" } Then the API returns 202 with an export job ID And within 60 seconds, GET /v1/evidence-exports/{jobId} returns 200 with status=ready and a pre-signed URL that expires in 15 minutes And downloading the URL yields artifacts equivalent to the UI export and matches the manifest checksum And the export action is recorded in the audit log with requester client ID and IP address
Retention and Deletion Policy Enforcement
Given tenant retention is set to 24 months and PII purge after 12 months When a case ages past 12 months but before 24 months Then evidence exports omit or mask PII per policy while preserving forensic signals and hashes And when a case ages past 24 months Then audit logs and forensic artifacts are cryptographically tombstoned and manifests persist only as hash stubs that cannot be exported And attempts to export after retention expiry return 410 Gone with policy code RETENTION_EXPIRED and are logged
Agent Action Traceability and Non-Repudiation
Given an agent reviews a forensic result When the agent marks the document as Fraudulent and adds a reason note Then an audit entry records user ID, role, UTC timestamp, action type, reason, previous state, new state, and case version And the entry includes the agent SSO session ID and request fingerprint And the evidence export includes a human-readable summary of these actions and a machine-readable JSON trail
Integrity Manifest and Signature Verification
Given an evidence package is generated When a validator computes SHA-256 checksums of each file in the package Then they match the checksums listed in manifest.json And the package includes a detached signature (Ed25519) and public key fingerprint And verifying the signature succeeds; otherwise the export is marked invalid and blocked from download And re-exporting the same case and version produces identical manifest hashes
Overlay Fidelity for Highlighted Suspicious Regions
Given a document with N suspicious regions highlighted in the UI When an evidence package is exported Then the package contains overlay files whose region count equals N And each region's coordinates and page indices in the export match the audit record within ±1 pixel at 300 DPI And rendering the overlay on the exported PDF aligns boxes to the same positions as in the UI And if pages are rotated or scaled in export, transforms are applied so alignment error remains ≤ 2 pixels
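The checksum half of the "Integrity Manifest and Signature Verification" criterion above can be sketched as follows, assuming the package is modeled as a name-to-bytes map. The detached Ed25519 signature over manifest.json itself requires a crypto library and is deliberately out of scope here.

```python
import hashlib

def file_sha256(data: bytes) -> str:
    """SHA-256 hex digest of one artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(files: dict) -> dict:
    """Map each artifact name to its checksum, as manifest.json would list them."""
    return {name: file_sha256(data) for name, data in files.items()}

def verify_package(files: dict, manifest: dict) -> list:
    """Return names of artifacts that are missing or fail their checksum.
    An empty list means the package is intact; a non-empty list should mark
    the export invalid and block the download, per the criterion above."""
    bad = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None or file_sha256(data) != expected:
            bad.append(name)
    return sorted(bad)
```

Since `build_manifest` is deterministic over file bytes, re-exporting the same case and version reproduces identical manifest hashes, which is exactly the reproducibility property the criterion demands.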

Serial Graph

Network-level detection that links serial appearances across customers, channels, stores, and time to spot reuse, velocity spikes, and geographic impossibilities. It auto-flags rings and repeat offenders, blocks duplicate submissions, and suggests merges when claims are legitimate duplicates. Benefit: prevents serial laundering and shields SLAs from abuse without extra queue monitoring.

Requirements

Serial Normalization & Fingerprinting
"As an operations analyst, I want serials standardized and fingerprinted across sources so that the system can reliably link and compare claims without false matches."
Description

Implement a normalization pipeline that standardizes serial numbers from all intake channels (email/PDF parsing, API, UI) into a canonical format and creates a robust fingerprint for matching. Handle OEM-specific patterns, check digits, common OCR/typing errors, whitespace and delimiter variance, and international character sets. Produce confidence scores and reason codes for each normalization/match to support explainability and review. Persist mappings from raw input to canonical form for auditability. Expose streaming and batch modes, integrate directly after ClaimKit’s magic inbox, and ensure near–real-time throughput to feed downstream graph/linking with minimal latency.
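A minimal sketch of the normalization and fingerprinting steps described above. The confusable map here is illustrative only; real mappings are OEM-rule-driven, and ambiguous inputs should produce multiple scored candidates rather than a single silent substitution.

```python
import hashlib
import unicodedata

# Illustrative OCR-confusable map (0/O, 1/I/L, 5/S); real rules are per-OEM.
CONFUSABLES = str.maketrans({"O": "0", "I": "1", "L": "1", "S": "5"})

def normalize_serial(raw: str) -> tuple:
    """Return (canonical_serial, reason_codes) for one raw input."""
    reasons = []
    nfkc = unicodedata.normalize("NFKC", raw)  # fold Unicode compatibility forms
    if nfkc != raw:
        reasons.append("TRANSLITERATED_UNICODE")
    folded = nfkc.upper()
    stripped = "".join(ch for ch in folded if ch.isalnum())  # drop - / . and spaces
    if stripped != folded:
        reasons.append("DELIMITER_REMOVED")
    canonical = stripped.translate(CONFUSABLES)
    if canonical != stripped:
        reasons.append("CONFUSABLE_MAPPED")
    return canonical, reasons

def fingerprint(canonical: str) -> str:
    """Deterministic: every variant that normalizes to the same canonical
    serial yields the same fingerprint, across channels and repeated runs."""
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The reason codes double as the audit trail of transformation steps, which is what lets a reviewer explain why two visually different inputs resolved to one fingerprint.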

Acceptance Criteria
Normalize OCR-Extracted Serials From Email/PDF Intake
Given emails/PDFs parsed by the magic inbox containing serials with whitespace, delimiters (- / .), and common OCR confusables (0/O, 1/I/l, 5/S) When normalization runs Then the system outputs a canonical serial per OEM rules with confidence >= 0.90 for unambiguous cases And delimiter/whitespace variance and single-character confusables are corrected And reason codes from the approved catalog are emitted for each transformation step And on a labeled test set of 10,000 samples, accuracy mapping to the expected canonical serial is >= 95% And for ambiguous cases, multiple candidates with confidences are returned and the selected canonical has confidence < 0.90 with reason code AMBIGUOUS_CANDIDATES
Validate OEM-Specific Patterns and Check Digits
Given OEM rule definitions including patterns and check-digit algorithms When a serial is ingested via API, UI, or email/PDF Then the serial is validated against the OEM rules and any check digit is computed and verified And invalid check digits are flagged with reason code INVALID_CHECK_DIGIT and confidence 0.0 And partial/masked serials are tagged PARTIAL_SERIAL with missing segments identified and no fingerprint generated And valid serials include reason codes OEM_RULE_APPLIED and VALIDATED_CHECK_DIGIT where applicable And on a conformance suite of at least 5,000 cases per OEM, validation precision and recall are each >= 99.5%
Handle International Character Sets and Lookalike Characters
Given inputs containing Unicode characters, accents, and script lookalikes (e.g., Cyrillic vs Latin) When normalization runs Then Unicode normalization and confusable-character mapping are applied to the OEM’s canonical character set And the exact raw input is preserved and a reversible mapping of character changes is recorded And reason codes such as TRANSLITERATED_UNICODE and HOMOGLYPH_SUBSTITUTION are emitted when applied And all visually equivalent forms produce identical canonical serials and fingerprints And audit verifies 100% of changed characters and positions are recorded
Generate Fingerprints With Confidence and Reason Codes
Given any canonical serial produced by the pipeline When a fingerprint is generated Then the fingerprint is deterministic and identical across channels and repeated runs And no collisions are observed across a corpus of 1,000,000 distinct canonical serials And serial variants that normalize to the same canonical produce the same fingerprint And each result includes a confidence score (0.00–1.00) and at least one reason code explaining the normalization/match And the reason code list is versioned and validated; unknown codes are rejected by tests
Persist Raw-to-Canonical Mapping and Audit Trail
Given any ingested serial When normalization completes Then the system persists raw input, canonical serial, fingerprint, channel/source identifiers, normalization steps, reason codes, confidence, timestamp, and rule version And mappings are immutable and queryable via an audit API by source ID or fingerprint with P95 latency <= 200 ms and availability >= 99.9% And re-normalization after rule updates creates a new versioned mapping linked to the prior version; previous records are retained And audit export produces a complete CSV/JSON with row-level provenance suitable for external review
Streaming Mode Throughput and Latency After Magic Inbox
Given the streaming pipeline is enabled immediately after the magic inbox When serials are parsed at a sustained rate of 25 serials/sec with bursts up to 100 serials/sec for 5 minutes Then P95 latency from parse-complete to fingerprint-emitted is <= 750 ms and P99 <= 1500 ms And delivery to the downstream Serial Graph topic/queue is at-least-once with idempotent keys (fingerprint) and zero data loss And health metrics and alerts exist for latency, error rate, and backlog, with error rate <= 0.5% including retries
Batch Mode Backfill Processing and Idempotency
Given a historical dataset of 500,000 serials When batch normalization runs Then throughput is >= 3,000 serials/min on the standard batch worker configuration And the job supports checkpointing and resume without creating duplicate mappings (idempotent on raw input ID) And a completion report includes counts by outcome (normalized, partial, invalid), top reason codes, and error samples And re-running the same batch produces identical canonical outputs and fingerprints And P95 memory and CPU utilization remain below 75% of allocated resources
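Because streaming delivery above is at-least-once, the downstream consumer must tolerate redeliveries. A minimal sketch of that idempotent consumption, keyed on the fingerprint plus a hypothetical `source_id` field (the spec names the fingerprint as the idempotency key; pairing it with the source message ID keeps distinct events for the same serial from being dropped):

```python
def make_idempotent_consumer(handler):
    """Wrap a handler so redelivered messages are applied exactly once.
    `seen` is in-memory for illustration; production would use a durable store."""
    seen = set()
    def consume(message: dict) -> str:
        key = (message["fingerprint"], message.get("source_id"))
        if key in seen:
            return "duplicate_skipped"
        seen.add(key)
        handler(message)
        return "processed"
    return consume
```

The same key discipline gives the batch backfill its "re-running the same batch produces identical outputs" property.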
Graph Construction & Entity Resolution Engine
"As a fraud lead, I want a real-time serial graph that connects claims, customers, channels, and locations so that I can see relationships and act on patterns quickly."
Description

Build a multi-tenant serial graph that links normalized serial fingerprints to related entities: claims, customers, devices, orders, stores, channels, addresses, emails/phones, IPs, and geolocations. Support real-time upserts with <2s p95 latency, edge timestamps, and source provenance for each relationship. Provide deterministic and probabilistic entity resolution to consolidate duplicate entities while preserving history (SCD). Enforce tenant isolation with optional controlled sharing for trusted partners. Offer query interfaces and internal APIs to fetch a serial’s neighborhood and history for UI and policy evaluation. Ensure HA, durability, and backfill jobs to incorporate historical data.
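The SCD-style edge versioning and provenance enforcement can be sketched as an in-memory upsert. This is illustrative only; field names follow the acceptance criteria below, and the list stands in for the graph store.

```python
from datetime import datetime, timezone

REQUIRED_EDGE_FIELDS = {"type", "observedAt", "sourceSystem", "ingestionMethod", "sourceId"}

def upsert_edge(edges: list, edge_id: str, fields: dict) -> dict:
    """SCD Type 2 upsert: reject incomplete provenance, close the live version
    (set validTo), and append a new version instead of overwriting history."""
    missing = REQUIRED_EDGE_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"missing required edge fields: {sorted(missing)}")
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    live = next((e for e in edges if e["id"] == edge_id and e["validTo"] is None), None)
    if live is not None:
        live["validTo"] = now  # prior version preserved, merely closed
    version = {"id": edge_id, "validFrom": now, "validTo": None, **fields}
    edges.append(version)
    return version
```

Rejecting writes with missing provenance up front is what makes the "HTTP 400 on missing fields" criterion enforceable at the storage boundary rather than by convention.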

Acceptance Criteria
Real-Time Upsert Performance and Read-After-Write Visibility
Given a workload of 5 tenants each sending 200 upserts per minute with valid serial fingerprints and related entities When processing sustained traffic for 30 minutes Then end-to-end p95 upsert latency is ≤ 2 seconds and p99 is ≤ 3.5 seconds, with success rate ≥ 99.9% And when querying the same serial’s neighborhood immediately after a successful upsert Then the new nodes/edges are readable within 2 seconds (read-after-write visibility)
Edge Timestamps and Source Provenance Enforcement
Given creation of any relationship (edge) between entities When the edge is persisted Then edge fields id, type, createdAt, observedAt, sourceSystem, ingestionMethod, sourceId are non-null and stored with millisecond precision And if any required field is missing or invalid Then the write is rejected with HTTP 400 and a descriptive error code And when an edge is updated Then a new SCD version is created with validFrom/validTo, preserving the prior version
Deterministic Entity Resolution With SCD Preservation
Given two customer entities in the same tenant that exactly match on normalized email and normalized phone per rule-set v1 When deterministic resolution runs Then the records are merged into a single golden entity with a stable entityId And prior records are retained as SCD Type 2 with validTo populated and provenance of the merge recorded (ruleId, timestamp, actor) And all inbound/outbound edges are re-pointed to the golden entity without loss of timestamps or source metadata And given the same match across different tenants Then no merge occurs and an audit event states cross-tenant merge prevented
Probabilistic Entity Resolution Thresholds and Safeguards
Given two device entities with a computed match score ≥ 0.92 and no hard conflicts (e.g., tenantId mismatch, mutually exclusive attributes) When probabilistic resolution runs Then the entities auto-merge and the merge record stores score, modelVersion, features used And given a score of at least 0.80 but below the 0.92 auto-merge threshold Then no auto-merge occurs and the pair is emitted to a review queue with rationale And given labeled validation data of at least 1,000 matched pairs per entity type When evaluating the model offline weekly Then precision ≥ 99.5% and recall ≥ 95.0% are met; otherwise auto-merge is disabled and an alert is raised
Tenant Isolation and Controlled Partner Sharing
Given two tenants A and B with no sharing agreements When querying the serial graph from tenant A for a serial that also exists in tenant B Then zero nodes/edges from tenant B are returned or inferable And given a sharing policy allowing A<->B to share only edge types [serial→device, serial→claim] with field whitelist [edge.type, observedAt, sourceSystem] When the same query is executed Then only the allowed edge types and fields are returned, with all other fields redacted And attempts to write across-tenant merges or edges without an explicit policy Then are rejected with HTTP 403 and audited
Neighborhood and History Query API Contract and SLO
Given GET /graph/serial/{fingerprint}?depth=2&from=2024-01-01&to=2025-01-01&entityTypes=device,claim&pageSize=500 When the serial exists and the neighborhood contains ≤ 1,000 edges Then the response includes nodes and edges with type, ids, timestamps (createdAt, observedAt), and provenance fields, is paginated with a stable cursor, and is ordered deterministically by observedAt desc And p95 response time ≤ 1.5 seconds and p99 ≤ 3.0 seconds And given an invalid fingerprint or out-of-range date Then the API returns HTTP 400 with error codes FK_INVALID or DATE_RANGE_INVALID
HA, Durability, and Backfill Without Operational Regression
Given a single node failure during steady-state traffic (200 upserts/min/tenant, 5 tenants) When failover occurs Then RTO ≤ 60 seconds, write success rate during the event ≥ 99.5%, and no acknowledged writes are lost (RPO = 0) And given a scheduled historical backfill of 10 million records When the job runs concurrently with live traffic Then p95 live upsert latency does not exceed baseline by more than +500 ms and duplicate records do not create duplicate edges (idempotency guaranteed) And the backfill is resumable with checkpoints and exposes progress via metrics accurate to within 5%
Velocity & Geo-Improbability Detection
"As a risk analyst, I want automatic detection of serial reuse velocity and geographic impossibilities so that abusive behavior is flagged before it impacts SLAs."
Description

Implement detectors that compute reuse velocity per serial across rolling windows (e.g., 24h/7d/30d) and identify geographic impossibilities by estimating travel speed between consecutive claim locations or service events. Incorporate store coordinates, shipping addresses, and timestamps; respect configurable policy windows and allow OEM-specific thresholds. Generate explainable alerts with evidence (counts, last-seen locations, required speed) and suppress known legitimate scenarios (authorized multi-touch repairs, warranty transfers) via rules and allowlists. Emit risk scores and tags consumed by intake policies and the reviewer console.
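The two detectors described above reduce to rolling-window counts and a great-circle speed check. A sketch follows; the 900 km/h cap and the window lengths are assumed defaults standing in for the configurable, OEM-specific thresholds.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def velocity_counts(event_times, now, window_hours=(24, 168, 720)):
    """Event counts per rolling window ending at `now` (24h/7d/30d defaults)."""
    return {f"{h}h": sum(1 for t in event_times
                         if timedelta(0) <= now - t <= timedelta(hours=h))
            for h in window_hours}

def geo_improbable(e1, e2, max_speed_kmh=900.0):
    """True if the implied average speed between two consecutive located events
    (distance / time_delta) exceeds the policy cap. Events are (time, lat, lon)."""
    (t1, lat1, lon1), (t2, lat2, lon2) = e1, e2
    dist_km = haversine_km(lat1, lon1, lat2, lon2)
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        return dist_km > 0  # identical timestamps at different places
    return dist_km / hours > max_speed_kmh
```

Per the criteria below, events lacking resolvable coordinates or timestamps would be filtered out before this check ever runs, and the inputs (distance, time delta, required speed) would be attached to the alert as evidence.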

Acceptance Criteria
Network-Level Aggregation Across Customers and Channels
Given a serial S has claims/events across multiple customers, channels, and stores within the tenant When computing reuse velocity and geo sequences Then the system aggregates all non-suppressed events for S across customers, channels, and stores into a single ordered timeline by normalized timestamp And duplicate events with identical source_id and payload received within 60 seconds are deduplicated And voided or cancelled claims are excluded from velocity and geo calculations
24h/7d/30d Velocity Flagging for Serial Reuse
Given configured rolling windows {24h, 7d, 30d} and OEM-specific thresholds per window And a new claim for serial S is ingested at time t When the counts of S’s events within each rolling window ending at t are computed Then for each window where count > threshold, a velocity alert is created with tag "velocity_breach_<window>" And the alert includes window length, count, threshold, and contributing claim/event IDs And evaluations are idempotent: reprocessing the same input does not create duplicate alerts
Geo-Improbable Travel Speed Detection
Given two consecutive events E1 and E2 for serial S with resolvable coordinates and timestamps When the implied average speed between E1 and E2 (distance/time_delta) exceeds the configured maximum_speed for the applicable policy/OEM Then a geo_improbable alert is created with calculated distance, time_delta, required_speed, coordinates, location labels, and threshold used And if either event lacks resolvable coordinates or timestamps, the geo check is skipped and no geo_improbable alert is created
Configurable Policy Windows and OEM Threshold Overrides
Given default velocity windows and thresholds exist and OEM-specific overrides may be configured When evaluating a claim for OEM X Then the system uses OEM X’s configured windows and thresholds if present; otherwise it uses the defaults And updates to windows/thresholds are applied to all new evaluations without requiring a code deploy And the configuration version used is recorded on each alert for traceability
Explainable Alert Payload for Detected Anomalies
Given a velocity or geo_improbable detection is triggered When the alert is emitted Then the payload includes: serial_id, anomaly_type, evaluation_time, policy_id, config_version, and evidence containing counts per window (for velocity), contributing_event_ids, last_seen_locations with timestamps, distance and required_speed (for geo), and thresholds used And the alert is visible in the Reviewer Console and retrievable via API with a documented, consistent schema And all evidence IDs resolve to existing entities in the system
Legitimate Scenario Suppression via Rules and Allowlists
Given rules exist for authorized multi-touch repairs and warranty transfers, and allowlists for serials/customers/partners When an event sequence matches a suppression rule or an allowlist entry within its active period Then velocity and geo_improbable alerts are not created for the matched sequence And the evaluation records a suppression log entry with rule_id or allowlist_entry_id, reason_code, and scope And suppression only prevents new alerts; it does not retroactively delete previously created alerts
Risk Scores and Tags Emission to Intake and Review Surfaces
Given an anomaly is detected for a claim/case (velocity or geo_improbable) When the claim is evaluated by intake policies Then a numeric risk_score and tags array including the anomaly tags (e.g., "velocity_breach_24h", "geo_improbable") are attached to the claim context And intake policies can reference risk_score and tags to auto-route, require review, or block submission per configuration And the Reviewer Console displays the risk badge and tags and updates them upon re-evaluation when new events arrive
Real-time Duplicate Submission Blocker
"As a support agent, I want duplicate submissions with the same serial to be blocked at intake so that I don’t waste time triaging redundant tickets."
Description

At claim creation, perform a low-latency lookup on the serial fingerprint to detect open or recently closed claims within configurable policy windows. Return a blocking decision (block/soft-warn/allow) with user-facing rationale and support overrides based on role or allowlists. Provide idempotency to prevent duplicate cases from the same message or API call. Fail safely with graceful degradation to warnings if the detector is unavailable. Log all decisions with correlation IDs for traceability, and ensure that blocked duplicates do not start or impact SLA timers.
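The blocking decision described above reduces to window checks over prior claims carrying the same serial fingerprint. A sketch with assumed default windows (the spec makes these per-merchant policy settings):

```python
from datetime import datetime, timedelta

def duplicate_decision(prior_claims, now, open_window_days=90, closed_window_days=30):
    """Return (decision_type, reason_code) for a new submission, given prior
    claims with the same serial fingerprint. Window defaults are assumptions."""
    for claim in prior_claims:
        if (claim["status"] == "open"
                and now - claim["created_at"] <= timedelta(days=open_window_days)):
            return "block", "DUPLICATE_OPEN"
    for claim in prior_claims:
        closed_at = claim.get("closed_at")
        if (claim["status"] == "closed" and closed_at
                and now - closed_at <= timedelta(days=closed_window_days)):
            return "soft-warn", "DUPLICATE_RECENT_CLOSED"
    return "allow", None
```

On a "block" decision the caller creates no claim and starts no SLA timers; the graceful-degradation criterion below effectively replaces this function's output with a "warn" when the lookup itself is unavailable.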

Acceptance Criteria
Block Open-Duplicate Within Policy Window
Given a new claim is submitted with serial fingerprint S And an existing open claim with fingerprint S exists within the merchant’s openWindow policy When the duplicate detection lookup runs Then decision.type = "block" and decision.reasonCode = "DUPLICATE_OPEN" And a user-facing rationale message is returned with the matched claim reference And the claim is not created and no SLA timers are started or modified And p95 decision latency <= 150 ms and p99 <= 300 ms And a structured audit log is written with correlationId, requestId, serialHash, decision, reasonCode, matchedClaimIds, policyVersion, and durationMs
Soft-Warn on Recently Closed Duplicate
Given a new claim is submitted with serial fingerprint S And a claim with fingerprint S was closed within the merchant’s closedWindow policy When the duplicate detection lookup runs Then decision.type = "soft-warn" and decision.reasonCode = "DUPLICATE_RECENT_CLOSED" And the client displays a warning with a link to the prior claim and remaining window And the new claim is created and SLA timers start normally And a structured audit log is written with correlationId, matchedClaimIds, policyVersion, and decision details
Role- or Allowlist-Based Override of Block Decision
Given the detector returns decision.type = "block" for serial fingerprint S And the acting user has an override-permitted role OR S is present on the merchant’s allowlist When the user provides a mandatory override justification of at least 10 characters and confirms Then the system creates the claim with decision.type = "allow" and override=true And override metadata (userId, role, justification, timestamp, priorDecision, allowlistSource) is recorded And SLA timers start from the created claim’s timestamp And an audit log and metrics event (override_count) are emitted with correlationId
Idempotent Submission From Same Message or API Call
Given two or more submissions reference the same idempotency key or messageId within 24 hours Or their normalized payload hash matches according to the idempotency strategy When the submissions are processed (including concurrent requests) Then only one claim record exists and subsequent responses return HTTP 200 with the original claimId And no duplicate SLA timers are created And audit logs indicate idempotencyHit=true with correlationId and original claimId
Graceful Degradation When Detector Is Unavailable
Given the duplicate detector times out (>=120 ms) or returns a 5xx error When a new claim is submitted Then decision.type = "warn" and decision.reasonCode = "DETECTOR_UNAVAILABLE" And the claim is created and SLA timers start normally And no blocking occurs due to detector failure And an error log with correlationId is recorded and a detector_unavailable counter metric is incremented
Comprehensive Decision Logging and Traceability
Given any decision path (block, soft-warn, allow, override) When the decision is produced Then a structured, PII-scrubbed log entry is written containing correlationId, requestId, serialHash, decision.type, reasonCode, matchedClaimIds (if any), policyVersion, detectorLatencyMs, and actor (if applicable) And logs are queryable by correlationId within 60 seconds of the event And decision logs are retained for at least 365 days per data policy
Configurable Policy Windows Applied per Merchant
Given a merchant sets openWindowDays and closedWindowDays in policy settings When the detector evaluates a submission Then the policy values in effect are those of the merchant (falling back to platform defaults if unset) And changes to policy take effect within 5 minutes of update And window boundaries are inclusive (<= openWindowDays for open claims, <= closedWindowDays for recently closed) And the applied policyVersion is attached to the decision and audit log
Ring Detection & Repeat Offender Profiling
"As a fraud investigator, I want the system to identify rings and repeat offenders so that I can prioritize investigations and reduce loss."
Description

Detect coordinated fraud by clustering entities (customers, addresses, emails/phones, payment and IP signals) linked via shared or sequential serial appearances across channels and stores. Maintain rolling risk profiles for entities and groups with trend indicators, and auto-flag suspected rings with severity levels and features that explain why they were flagged. Provide configurable thresholds and feedback loops to learn from reviewer outcomes, reducing false positives over time. Surface outputs as tags and risk scores to intake rules and the reviewer console.
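Grouping entities that share signals is classically done with union-find (disjoint sets). The sketch below clusters claims on any shared serial/email/phone/address/IP value; a production system would add edge weights, time windows, and decay rather than treating every shared signal as a hard link.

```python
class UnionFind:
    """Disjoint-set forest with path halving."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def cluster_claims(claims):
    """Union claims that share any signal; returns {root_id: [claim_ids]}.
    Each resulting group is a ring candidate for scoring and review."""
    uf = UnionFind()
    owner = {}  # (signal_type, value) -> first claim seen carrying it
    for claim in claims:
        cid = claim["id"]
        uf.find(cid)  # register singletons too
        for key in ("serial", "email", "phone", "address", "ip"):
            val = claim.get(key)
            if not val:
                continue
            sig = (key, val)
            if sig in owner:
                uf.union(owner[sig], cid)
            else:
                owner[sig] = cid
    clusters = {}
    for claim in claims:
        clusters.setdefault(uf.find(claim["id"]), []).append(claim["id"])
    return clusters
```

Union-find makes the incremental case cheap: a newly ingested claim only touches its own signals, which is what makes the 5-second P95 clustering target in the criteria below plausible.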

Acceptance Criteria
Real-time ring clustering on claim intake
Given a new claim is ingested via any channel When the claim contains a serial and at least one entity signal (email, phone, address, payment fingerprint, or IP) Then the system must create or update the associated entity graph cluster within 5 seconds at P95 and 10 seconds at P99 And the cluster must include edges for shared serial appearances within a rolling 180-day window and sequential reuse by any linked entity And the claim record must be tagged with the resolved cluster (ring_id) and cluster size at time of ingestion
Cross-channel serial linkage and anomaly detection
Given serial S appears in claims across multiple customers and channels When S is used in >3 claims within 7 days across ≥2 distinct channels or stores Then mark a velocity_spike=true feature with count and window And when two claims for S occur within 24 hours and the reported locations are >500 miles apart Then mark geographic_impossibility=true with calculated distance and timestamps And both features must be persisted to the ring feature store and attached to all impacted claims
Risk scoring and severity tiering
Given any entity or cluster has updated features When the risk engine computes a risk score Then produce a score in [0,100] with an explanation-ready feature contribution vector And map the score to severity tiers with default thresholds: Low 0–39, Medium 40–69, High 70–84, Critical 85–100 And allow tenant-level configuration of thresholds with validation (monotonic tiers, numerical ranges) and effective-dated versions And re-score must occur within 60 seconds of any new evidence attach event
Auto-flag actions and reviewer surfacing
Given a cluster’s severity is High or Critical When severity transitions across the High threshold Then create a suspected_ring flag with ring_id, severity, score, and top features And surface tags (ring_suspected, ring_id, severity) and numeric risk_score to intake rules within 1 second of scoring And in the reviewer console, display the ring card listing linked entities, claims, channels, stores, and last-30-day activity And for Critical severity, block new duplicate submissions for the same serial for 30 days unless the submission includes matching order_id and proof_of_purchase, in which case suggest merge instead of blocking
Explainable ring evidence package
Given a claim or cluster is flagged When a reviewer opens the case Then show at least 3 top contributing features with values and per-feature impact percentages And include a human-readable rationale sentence referencing serial reuse counts, velocity windows, and geo distance where applicable And provide downloadable JSON evidence including feature list, timestamps, linked entities, and data sources
Reviewer feedback ingestion and model adaptation
Given a reviewer sets an outcome on a flagged item (Confirmed Fraud, Legitimate, or Inconclusive) When the outcome is submitted Then persist the label with reviewer_id, timestamp, scope (claim/entity/cluster), and ring_id And queue a re-score of the affected cluster within 10 minutes And update adaptive weights or threshold profile version and record a change log entry with before/after values And expose per-tenant metrics endpoints reporting last-30-day precision@High, false_positive_rate@High, and label coverage
Audit trail, versioning, and reproducibility
Given any risk computation or threshold change occurs When an auditor requests historical state for a claim at a past timestamp Then return the risk score, severity, model_version, threshold_profile_id, and feature vector used at that time And ensure re-running the scorer with the same model_version and inputs reproduces the score within ±1 point And retain scoring and decision logs for at least 400 days And provide an export of ring activity (creates, updates, blocks, merges) with immutable event IDs
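The severity tiering and its "monotonic tiers" validation from the criteria above can be sketched as follows, with the default bounds taken from the spec (Low 0–39, Medium 40–69, High 70–84, Critical 85–100):

```python
DEFAULT_TIERS = [("Low", 0), ("Medium", 40), ("High", 70), ("Critical", 85)]

def validate_tiers(tiers):
    """Tenant overrides must start at 0, stay in [0, 100], and be strictly
    increasing (the 'monotonic tiers' rule)."""
    bounds = [b for _, b in tiers]
    if bounds[0] != 0 or any(not 0 <= b <= 100 for b in bounds):
        raise ValueError("tier bounds must lie in [0, 100] and start at 0")
    if any(b2 <= b1 for b1, b2 in zip(bounds, bounds[1:])):
        raise ValueError("tier bounds must be strictly increasing")
    return tiers

def severity(score, tiers=DEFAULT_TIERS):
    """Map a risk score in [0, 100] to the highest tier whose lower bound it meets."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    name = tiers[0][0]
    for tier_name, lower in tiers:
        if score >= lower:
            name = tier_name
    return name
```

Expressing tiers as lower bounds (rather than ranges) removes the possibility of gaps or overlaps, so validation only has to check monotonicity.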
Merge Suggestion Engine
"As a case manager, I want merge suggestions for legitimate duplicates so that I can consolidate cases and keep histories accurate with minimal effort."
Description

Suggest merges for legitimately duplicated claims by generating candidates using serial fingerprint, order/receipt identifiers, customer identity signals, and time proximity. Assign confidence scores and reason codes (e.g., channel resubmission, reopened case) and offer one-click, idempotent merges that preserve full histories and attachments. Prevent cross-customer merges unless confidence exceeds policy thresholds. Provide undo capability and an audit trail of pre/post-merge states.
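Candidate scoring with reason codes might look like the sketch below. The signal weights are illustrative assumptions, not the engine's actual attributions; the reason codes follow the taxonomy named in the acceptance criteria.

```python
# Illustrative weights; the real engine derives attributions from matched signals.
SIGNAL_WEIGHTS = [
    ("serial_fingerprint_match", 0.45),
    ("order_id_match", 0.30),
    ("customer_identity_match", 0.15),
    ("within_48h", 0.10),
]

def score_candidate(features: dict) -> tuple:
    """Return (confidence, reason_codes) for a candidate claim pair."""
    confidence = round(sum(w for name, w in SIGNAL_WEIGHTS if features.get(name)), 2)
    reasons = []
    if (features.get("serial_fingerprint_match") and features.get("order_id_match")
            and features.get("different_channels")):
        reasons.append("channel_resubmission")
    if features.get("customer_identity_match"):
        reasons.append("same_customer_duplicate")
    if features.get("within_48h"):
        reasons.append("time_proximity_duplicate")
    return confidence, reasons

def rank_candidates(scored, floor=0.30):
    """Sort descending by confidence; drop anything below the reporting floor,
    matching the 'no candidate < 0.30 is returned' criterion."""
    return sorted((c for c in scored if c[0] >= floor), key=lambda c: -c[0])
```

The per-signal weights also serve as the feature attributions the criteria require to be surfaced under "Why suggested?".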

Acceptance Criteria
High-Confidence Candidate from Serial and Order Match Across Channels
Given claim A and claim B have identical serial fingerprint, identical order/receipt ID, and are created within 48 hours on different intake channels When the Merge Suggestion Engine runs Then claim B appears as the top merge candidate for claim A with confidence >= 0.90 and reason_code = channel_resubmission And the candidate list is sorted by descending confidence And no candidate with confidence < 0.30 is returned
Cross-Customer Merge Policy Threshold Enforcement
Given claim A (customer_id = X) and claim B (customer_id = Y, Y != X) produce a calculated confidence of 0.94 And the cross_customer_merge_threshold policy is set to 0.98 When the Merge Suggestion Engine evaluates candidates Then claim B is not suggested as a merge candidate for claim A And a direct merge attempt via API is rejected with status = 403 and error_code = cross_customer_policy_violation And an audit event records the blocked attempt including threshold and calculated confidence
One-Click Idempotent Merge
Given a suggested candidate between claim A and claim B exists When an agent clicks Merge once or retries within 5 minutes using the same idempotency_key Then exactly one merged claim is created with a stable merged_claim_id And subsequent retries return status = 200 and operation = no_op And both original claim IDs are marked superseded and redirect to merged_claim_id
History and Attachment Preservation on Merge
Given claim A and claim B are merged Then the merged claim contains the union of all timeline events from A and B with original authors and timestamps preserved And all attachments from A and B are present, de-duplicated by content hash, with original filenames and upload timestamps preserved And for conflicting single-value fields, the value from the earliest-created claim is retained and a conflict entry is added to merged_claim.metadata.conflicts with both originals
Undo Merge Restores Originals
Given claim A and claim B were merged into claim M less than 30 days ago and claim M is not Closed When an agent clicks Undo Merge on M Then claim A and claim B are restored with their original IDs, timelines, attachments, and field values And claim M is closed with status = merge_reverted and contains links to A and B And the operation is idempotent; repeating Undo returns status = 200 and operation = no_op
Audit Trail Captures Pre/Post States for Merge
Given any merge operation completes Then an immutable audit event is written containing before_state for both source claims and after_state for the merged claim And the event includes initiator_id, initiator_channel, timestamp, confidence, reason_codes[], policy_overrides[], and idempotency_key And the audit record is retrievable via Audit API and UI by admin roles and shows a computed diff of key fields
Reason Codes and Confidence Attribution in Suggestions
Given a suggested merge candidate is generated Then the candidate includes reason_codes[] from the approved taxonomy (e.g., channel_resubmission, reopened_case, same_customer_duplicate, time_proximity_duplicate) and a human_readable_explanation referencing the matched signals And the confidence score is a float between 0 and 1 with two-decimal precision, and feature_attributions collectively account for at least 90% of the score And selecting Why suggested? reveals the explanation and attributions with server processing time <= 500 ms
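The "attributions collectively account for at least 90% of the score" requirement implies a greedy selection over per-feature weights. A sketch under that assumption (names hypothetical):

```python
def top_attributions(attributions: dict[str, float],
                     coverage: float = 0.90) -> list[tuple[str, float]]:
    """Return features in descending weight order until they cover at
    least `coverage` of the total attribution mass."""
    total = sum(attributions.values())
    picked, running = [], 0.0
    for name, weight in sorted(attributions.items(), key=lambda kv: -kv[1]):
        picked.append((name, weight))
        running += weight
        if running >= coverage * total:
            break
    return picked
```

The "Why suggested?" view would render exactly this list alongside the reason codes.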
Reviewer Console & Auditability
"As a QA reviewer, I want a console that shows evidence, lets me override or confirm flags, and records my actions so that decisions are transparent and auditable."
Description

Embed a Serial Graph panel in the ClaimKit case view that displays the serial timeline, related claims, map of locations, velocity metrics, and ring membership indicators. Enable actions to confirm/override flags, approve or block submissions, merge candidates, add notes, and apply tags, with role-based access controls. Record a complete audit trail including evidence snapshots, user, timestamp, and rationale, and support export for compliance. Ensure the console updates in real time as new graph edges or detector results arrive and does not interfere with SLA timers for non-actionable items.

Acceptance Criteria
Serial Graph Panel Rendering in Case View
Given a case with a valid serial, When the case view loads, Then the Serial Graph panel renders within 2 seconds and displays: serial timeline, related claims count with links, a map with the last 10 geolocations, velocity metrics for 7/30/90 days, and ring membership indicators when applicable. Given no related data for any widget, When the panel loads, Then each widget shows an explicit "No data" state without console errors or failed network requests. Given the Serial Graph feature flag is enabled, When a permitted user opens a case, Then the panel is visible; Given the feature flag is disabled, Then the panel does not render and no network calls are made to graph services.
Role-Based Access Controls for Reviewer Actions
Given a user with role Reviewer or Admin, When viewing the panel, Then action controls for Confirm Flag, Override Flag, Approve, Block, Merge, Add Note, and Apply Tag are enabled. Given a user with role Viewer, When viewing the panel, Then all action controls are hidden or disabled and tooltips indicate insufficient permissions. Given an unauthorized API request to perform any reviewer action, When executed, Then the server returns HTTP 403 and no state change is persisted.
Flag Handling and Submission Decisions
Given a claim auto-flagged by the Serial Graph, When a Reviewer clicks Confirm Flag and enters a rationale of at least 10 characters, Then the claim flag status updates to Confirmed, and an audit record is written with user, timestamp, rationale, and detector snapshot. Given a claim auto-flagged by the Serial Graph, When a Reviewer clicks Override Flag and enters a rationale of at least 10 characters, Then the flag is cleared, any downstream blocks are lifted, and an audit record is written. Given a pending submission, When a Reviewer selects Approve, Then the submission state changes to Approved within 1 second, SLA timers continue or start as configured, and an audit record is written. Given a pending submission, When a Reviewer selects Block and enters a rationale of at least 10 characters, Then the submission state changes to Blocked, SLA timers are not started or are canceled for that submission, and an audit record is written. Given any of these actions attempted without required rationale, Then the operation is rejected with a validation error and no state change occurs.
Duplicate Blocking and Merge Suggestions
Given a new submission shares a serial with an open claim and matches duplicate rules, When it is created, Then the system blocks it with HTTP 409 Duplicate and the UI shows the reason plus links to the related open claim(s). Given two claims are suggested as legitimate duplicates by the Serial Graph, When a Reviewer confirms the merge, Then the claims merge within 2 seconds, preserving all evidence, notes, and tags, and the resulting claim retains the earliest SLA start time. Given a suggested merge is canceled by the Reviewer, Then no changes are made to either claim.
Notes and Tags with Immediate Visibility
Given a Reviewer adds a note and applies one or more tags, When saved, Then the note and tags appear in the case activity feed within 1 second and the case becomes discoverable via tag filters. Given note or tag updates, When persisted, Then the audit log records the user, ISO 8601 UTC timestamp, and before/after values. Given input exceeding 5,000 characters or containing disallowed HTML, When submitted, Then the system sanitizes and truncates per policy and records the sanitized content; the UI warns the user of truncation.
Comprehensive Audit Trail and Export
Given any reviewer action (confirm/override flag, approve/block, merge, note/tag), When completed, Then an immutable audit entry is stored with: action type, user ID and role, ISO 8601 UTC timestamp, rationale text, before/after diffs, and evidence snapshots (timeline state, map image, detector scores). Given an Admin requests an audit export for a case, When triggered, Then a downloadable ZIP containing CSV and JSON audit entries plus PNG evidence snapshots is generated within 30 seconds, with a SHA-256 checksum provided. Given an attempt to modify an existing audit record, When executed, Then the system rejects the change and logs a security event without altering the original record.
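The export contract above (ZIP of CSV + JSON audit entries with a SHA-256 checksum) can be sketched with the standard library; the file names and entry shape are assumptions, not the shipped format:

```python
import hashlib
import io
import json
import zipfile

def build_audit_export(entries: list[dict]) -> tuple[bytes, str]:
    """Bundle audit entries as JSON and CSV inside a ZIP; return the
    archive bytes plus a SHA-256 checksum for download verification."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("audit.json", json.dumps(entries, indent=2))
        columns = list(entries[0].keys())
        rows = [",".join(str(e[c]) for c in columns) for e in entries]
        zf.writestr("audit.csv", "\n".join([",".join(columns), *rows]))
    data = buf.getvalue()
    return data, hashlib.sha256(data).hexdigest()
```

PNG evidence snapshots would be added as further `zf.writestr` calls; they are omitted here to keep the sketch minimal.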
Real-Time Updates and SLA Non-Interference
Given new graph edges or detector results arrive while a case view is open, When received, Then the Serial Graph panel updates within 5 seconds via real-time transport without a full page reload and highlights changed fields for at least 3 seconds. Given a non-actionable update (e.g., increase in related claims without new flags), When processed, Then SLA timers for the current case do not pause, reset, or otherwise change. Given a duplicate submission is auto-blocked, When block is applied, Then SLA timers are not started for that submission and the blocked state is reflected in the related case links. Given a temporary loss of connectivity, When real-time updates fail, Then the UI shows a "Reconnecting" state and a Last Updated timestamp; upon reconnection, missed updates are fetched and applied within 10 seconds.

Fraud Score

A real-time, explainable risk score per claim that blends serial validity, receipt integrity, seller reputation, device/IP patterns, purchase-channel signals, and date plausibility. Thresholds drive auto-approve/deny or route-to-review actions. Benefit: consistent, scalable decisions that cut handling time and reduce bias while keeping good customers moving.

Requirements

Real-time Fraud Scoring Service
"As an operations leader, I want each claim scored in real time so that high-risk cases are intercepted while legitimate customers move through without delays."
Description

A stateless, horizontally scalable service that computes a fraud risk score from 0–100 for each claim in under 200ms at creation and on significant updates. It blends serial validity, receipt integrity, seller reputation, device/IP patterns, purchase-channel signals, and date plausibility using a weighted model with versioning. The service exposes synchronous API and event-driven interfaces, returns score, confidence, model version, and latency, and writes results to the case record. It must be resilient to partial feature availability, applying graceful degradation and retries without blocking case creation. Rate limits and tenancy isolation ensure consistent performance across brands.

Acceptance Criteria
Real-Time Scoring on Claim Creation
Given a valid claim payload with tenant_id and claim_id When POST /fraud-score is invoked during claim creation Then the service responds with HTTP 200 and completes within 200ms P99 over 10k requests And the response body contains score (0–100 integer), confidence (0.0–1.0), model_version (semver), latency_ms, correlation_id And the score and metadata are written to the case record within 50ms of response And latency_ms reflects end-to-end processing time within ±5ms of observed trace timing
Re-Scoring on Significant Claim Update
Given a claim with an existing fraud score and an update that changes at least one significant field (serial_number, receipt_pdf, purchase_date, seller_id, ip_address, billing_address) When the update is received via API or event Then the service recomputes the score and responds/emits within 200ms P99 And the new score, confidence, model_version, latency_ms are persisted on the case And the previous score is retained in an immutable audit trail with timestamp and source of change And a fraud_scored event is published containing claim_id, previous_score, new_score, confidence, model_version, latency_ms
Graceful Degradation with Partial Feature Outage
Given one or more feature providers (e.g., serial_validation, seller_reputation, device_ip) are unavailable or exceed a 100ms per-call timeout When a scoring request is processed Then the service completes without blocking claim creation and returns within the 200ms overall budget P99 And missing features are replaced with configured default priors And the response flags degraded=true and lists degraded_features with reason codes (timeout, error, unavailable) And confidence is adjusted according to configured weights for missing features And each missing feature is retried up to 2 times with exponential backoff without exceeding the 200ms budget
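A minimal sketch of the degradation path above: each provider is tried, failures fall back to configured default priors, and the response flags which features are degraded and why. Provider names and priors are illustrative; real timeouts and retries are elided:

```python
# Hypothetical default priors used when a provider is unavailable.
DEFAULT_PRIORS = {"serial_validation": 0.5, "seller_reputation": 0.5, "device_ip": 0.5}

def gather_features(providers: dict) -> dict:
    """Call each feature provider; on failure substitute the default
    prior and record the feature with a reason code."""
    features, degraded = {}, []
    for name, fetch in providers.items():
        try:
            features[name] = fetch()
        except Exception as exc:  # stands in for timeout/error/unavailable
            features[name] = DEFAULT_PRIORS[name]
            degraded.append({"feature": name, "reason": type(exc).__name__})
    return {"features": features,
            "degraded": bool(degraded),
            "degraded_features": degraded}
```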
Per-Tenant Rate Limiting and Isolation
Given tenant A has a configured limit of 100 RPS and tenant B has 20 RPS When concurrent scoring requests from tenant A spike to 200 RPS while tenant B sustains 5 RPS Then over-limit requests for tenant A receive HTTP 429 with a Retry-After header and are not queued beyond 50ms And tenant B maintains P99 latency ≤200ms and zero 429s during tenant A's spike And no response or logs for any tenant contain identifiers or data from another tenant And resource utilization is partitioned so that tenant A cannot degrade tenant B's throughput below configured limits
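One way to realize per-tenant isolation is a token bucket per tenant, as in this simplified sketch (it rejects over-limit requests immediately with a Retry-After hint rather than modeling the 50 ms queue bound; class and method names are hypothetical):

```python
import time

class TenantRateLimiter:
    """Per-tenant token bucket: one tenant exhausting its bucket never
    consumes another tenant's capacity."""

    def __init__(self, limits_rps: dict[str, int]):
        self.limits = limits_rps
        self.buckets = {t: {"tokens": float(l), "last": time.monotonic()}
                        for t, l in limits_rps.items()}

    def allow(self, tenant: str) -> tuple[bool, float]:
        bucket, limit = self.buckets[tenant], self.limits[tenant]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the limit.
        bucket["tokens"] = min(limit, bucket["tokens"] + (now - bucket["last"]) * limit)
        bucket["last"] = now
        if bucket["tokens"] >= 1.0:
            bucket["tokens"] -= 1.0
            return True, 0.0
        # Rejected: caller maps this to HTTP 429 with Retry-After.
        return False, (1.0 - bucket["tokens"]) / limit
```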
Synchronous API Contract and Validation
Given a request with missing or invalid fields (e.g., tenant_id missing, claim_id invalid format, purchase_date in the future beyond allowed skew) When POST /fraud-score is called Then the service returns HTTP 400 with machine-readable error codes per invalid field and no score is computed And for a valid request the response schema strictly matches the contract: {score:int 0–100, confidence:float 0.0–1.0, model_version:string semver, latency_ms:int, correlation_id:string} And content-type is application/json; charset=utf-8 and responses include Cache-Control: no-store And P95 response size is ≤2KB
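The response contract above is concrete enough to validate mechanically. A sketch of that validation (the helper name is hypothetical; a production service would use a schema library against the published OpenAPI contract):

```python
import re

def validate_score_response(body: dict) -> list[str]:
    """Return machine-readable errors for any field violating the
    contract: score int 0-100, confidence float 0.0-1.0, semver
    model_version, plus required latency_ms and correlation_id."""
    errors = []
    score = body.get("score")
    if not (isinstance(score, int) and 0 <= score <= 100):
        errors.append("score must be an integer 0-100")
    conf = body.get("confidence")
    if not (isinstance(conf, float) and 0.0 <= conf <= 1.0):
        errors.append("confidence must be a float 0.0-1.0")
    if not re.fullmatch(r"\d+\.\d+\.\d+", str(body.get("model_version", ""))):
        errors.append("model_version must be semver")
    for key in ("latency_ms", "correlation_id"):
        if key not in body:
            errors.append(f"{key} is required")
    return errors
```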
Event-Driven Interface: ClaimCreated to FraudScored
Given a valid ClaimCreated event is received on the bus with tenant_id and claim_id When the event is consumed Then the service computes the fraud score and publishes a FraudScored event within 200ms P99 of receipt And the FraudScored event includes claim_id, tenant_id, score, confidence, model_version, latency_ms, correlation_id And message delivery is at-least-once with idempotency ensured via claim_id + event_id deduplication so downstream receives exactly one effective update And failures in publishing are retried with exponential backoff up to a 2s ceiling without duplicating persisted case data
Statelessness and Horizontal Scalability
Given the service runs N stateless instances behind a load balancer without sticky sessions When traffic ramps from 50 RPS to 1,000 RPS over 5 minutes Then the service scales horizontally to maintain P99 latency ≤200ms and error rate <0.5% And no instance stores request or session state beyond the request lifecycle; restarts do not affect correctness And adding or removing instances does not interrupt in-flight requests nor change scores for identical inputs And CPU utilization per instance remains ≤70% at steady 1,000 RPS with headroom for spikes
Signal Ingestion and Feature Store
"As a data engineer, I want reliable, consistent features for each claim so that the scoring service can operate accurately and at low latency."
Description

A managed pipeline and feature store that consolidates fraud-relevant signals from the magic inbox, email/PDF parsers, serial databases, seller reputation feeds, device/IP enrichment, and order systems. It ensures idempotent updates keyed by claim and customer, enforces schema and data quality checks, and computes normalized features with time windows. Backfills historical claims for modeling and sandbox simulations and exposes low-latency online features and batch exports. Supports PII minimization with tokenization and per-tenant data partitioning.

Acceptance Criteria
Multi-Source Signal Ingestion and Schema Enforcement
Given configured sources (magic inbox, email/PDF parsers, serial databases, seller reputation feeds, device/IP enrichment, order systems) When the streaming/batch ingestion pipeline is running Then 99% of valid records are landed in the raw zone within 5 minutes of source availability per source And all records conform to the registered versioned schema for their source and version And records failing schema or mandatory field validation are rejected to a quarantine with machine-readable error codes and source lineage IDs And a source-to-claim/customer correlation ID is attached to every accepted record
Idempotent Updates by Claim and Customer
Given duplicate or replayed events for the same claim_id and customer_id across any source When the pipeline processes these events Then the feature store holds exactly one current record per {tenant_id, claim_id} and per {tenant_id, customer_id} And upserts are idempotent across retries and backfills using a deterministic idempotency key And last-write-wins is determined by event_time then ingestion_time And the deduplication decision and idempotency key are persisted to the audit log
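A sketch of the two mechanisms above, with hypothetical names: a deterministic idempotency key (canonical JSON so field order never changes the hash) and a last-write-wins upsert ordered by event_time, then ingestion_time:

```python
import hashlib
import json

def idempotency_key(tenant_id: str, claim_id: str, payload: dict) -> str:
    """Identical upserts (retries, backfills) hash to the same key."""
    canon = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{tenant_id}|{claim_id}|{canon}".encode()).hexdigest()

def upsert(store: dict, record: dict) -> None:
    """Keep exactly one current record per (tenant_id, claim_id);
    last-write-wins by event_time with ingestion_time as tiebreak."""
    key = (record["tenant_id"], record["claim_id"])
    current = store.get(key)
    if current is None or (record["event_time"], record["ingestion_time"]) >= (
            current["event_time"], current["ingestion_time"]):
        store[key] = record
```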
Data Quality Validation and Alerting
Given incoming payloads from any source When data quality rules execute Then required fields (as per schema registry) have a 0% null rate or the record is quarantined with reason codes And numeric, date, and enum fields pass range and domain checks defined per field And referential checks (e.g., serial validity against serial DB) pass or the record is flagged with dq_rule_id And if more than 1% of records per source fail DQ within a 15-minute window, an alert is sent to the on-call channel and the source is auto-throttled And DQ metrics (passed, failed, quarantined) and sample error records are published to the monitoring dashboard each run
Time-Windowed Feature Computation and Normalization
Given validated raw signals When the feature computation job runs Then features defined in the registry are computed for 1h, 24h, and 30d windows (or as configured) with window boundaries based on event_time And normalization parameters (e.g., mean/std or min/max) are loaded from the model registry and applied consistently And if normalization parameters are missing or stale, the job fails safely and emits an alert without publishing partial features And offline vs online feature values match within max(1e-6, 0.5%) for a 10k record sample per tenant daily And each published feature is tagged with feature_version, window, and event_time watermark
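A toy version of the windowed computation above, for a simple count feature with event_time-bounded windows (the real job would run per feature-registry definition and apply registry-loaded normalization):

```python
from datetime import datetime, timedelta

def windowed_counts(events: list[datetime], now: datetime) -> dict[str, int]:
    """Event counts over 1h/24h/30d windows, bounded by event_time."""
    windows = {"1h": timedelta(hours=1),
               "24h": timedelta(hours=24),
               "30d": timedelta(days=30)}
    return {name: sum(1 for t in events if now - span <= t <= now)
            for name, span in windows.items()}
```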
Historical Backfill for Modeling and Sandbox
Given a requested date range and tenant scope When a backfill is initiated Then 100% of eligible claims in range are processed with row-count reconciliation within ±0.1% against source systems per day And throughput is at least 1,000,000 events per hour without impacting online read SLA And the backfill is idempotent and restartable from checkpoints without duplicating outputs And outputs are written to a versioned sandbox snapshot with manifest and data dictionary And completion, duration, and discrepancy metrics are recorded and surfaced in the backfill report
Low-Latency Online Serving and Scheduled Batch Exports
Given features have been computed and upserted When online reads occur via the feature API Then p95 read latency is ≤150 ms and p99 ≤300 ms over a rolling 24h window with availability ≥99.9% And propagation from accepted raw signal to online feature availability is ≤60 seconds p95 And daily batch exports are delivered by 02:00 UTC to tenant-specific destinations with schema matching the registry and an event_time watermark And exports include completeness checksums and are retriable without duplication
PII Minimization and Per-Tenant Data Partitioning
Given PII fields are present in incoming signals When data is stored in the feature store Then PII is tokenized or hashed using approved algorithms; no plaintext PII persists in feature storage And mappings of tokens to raw PII (if any) are stored only in a separate secure vault with access restricted by role and tenant And all tables/buckets are physically and logically partitioned by tenant_id, and cross-tenant queries are denied by the authorization layer And periodic scans detect 0 occurrences of plaintext PII in feature storage, with violations blocking the pipeline and alerting security And right-to-erasure requests remove or re-tokenize affected records within 24 hours end-to-end
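One common tokenization approach consistent with the criteria above is a keyed HMAC: tokens are stable within a tenant (so features can still join on them) but useless without the per-tenant key held in the vault. A sketch, assuming HMAC-SHA-256 is an approved algorithm:

```python
import hashlib
import hmac

def tokenize_pii(value: str, tenant_key: bytes) -> str:
    """Deterministic per-tenant token for a PII value; no plaintext PII
    is persisted in feature storage. Lowercasing normalizes joins."""
    return hmac.new(tenant_key, value.lower().encode(), hashlib.sha256).hexdigest()
```

Because the key is per tenant, identical emails in two tenants produce different tokens, which also reinforces the partitioning requirement.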
Threshold Policies and Auto-Routing
"As a support manager, I want customizable thresholds that drive automatic decisions so that we scale consistent outcomes and reduce handling time."
Description

Configurable risk thresholds that map scores to actions: auto-approve, route-to-manual-review, or auto-deny with reason codes. Policies support per-tenant settings, channel overrides, and SLA-aware routing that starts or pauses timers accordingly. Includes a safe “monitor-only” mode to simulate policy impacts before enforcement and a kill switch to disable automation if anomalies are detected. Integrates with queues, notifications, and case state transitions without requiring agent intervention.

Acceptance Criteria
Auto-Approve Threshold Routes Eligible Claims
Given tenant T has a threshold policy that sets action Auto-Approve for scores >= 80 When a claim C for tenant T with Fraud Score 85 is evaluated Then C is transitioned to state "Approved (Automated)" And C is not placed in any manual review queue And the fulfillment SLA timer starts within 2 seconds of the decision And an "Auto-Approved" notification is dispatched to the configured channel And the decision is persisted with policy_version, score, evaluation_timestamp, and actor "system" And end-to-end decision latency is <= 500 ms at p95 under normal load
Manual-Review Threshold Routes to Channel-Specific Queue
Given tenant T has a global policy that sets action Manual Review for scores between 50 and 79 And a channel override exists for channel "Marketplace" mapping Manual Review to queue "Fraud Review - Marketplace" When a Marketplace claim with Fraud Score 72 is evaluated Then the claim state is set to "Pending Review (Automated)" And the claim is enqueued exactly once to "Fraud Review - Marketplace" (idempotent on re-evaluation) And the Review SLA timer starts within 2 seconds And the fulfillment SLA timer is paused or not started And a review notification is sent to the review team And decision latency is <= 700 ms at p95 under normal load
Auto-Deny Threshold Applies Reason Codes and Pauses SLA
Given tenant T has a threshold policy that sets action Auto-Deny for scores < 50 When a claim with Fraud Score 35 and contributing signals [SERIAL_INVALID, RECEIPT_TAMPERED] is evaluated Then the claim state is set to "Denied (Automated)" And the denial reason_codes include SERIAL_INVALID and RECEIPT_TAMPERED And no manual review queue is assigned And no review or fulfillment SLA timers are started (and any running fulfillment timers are paused) And a denial notification using template "AutoDeny" is queued to the customer communication channel And the decision is persisted with policy_version, score, reasons, and evaluation_timestamp And decision latency is <= 500 ms at p95 under normal load
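The three routing criteria above share one score-to-action mapping. A sketch of that core (the threshold field names `approve_at` and `deny_below` are hypothetical; queue routing, reason codes, and notifications are elided):

```python
def evaluate_policy(score: int, thresholds: dict) -> dict:
    """Map a fraud score to an action per the tenant policy:
    >= approve_at auto-approves, < deny_below auto-denies,
    everything in between routes to manual review."""
    if score >= thresholds["approve_at"]:
        return {"action": "auto_approve", "start_sla": "fulfillment"}
    if score < thresholds["deny_below"]:
        return {"action": "auto_deny", "start_sla": None}
    return {"action": "manual_review", "start_sla": "review"}
```

Monitor-only mode would call the same function but record the result as `simulated_action` instead of applying it, which keeps the simulated and enforced paths provably identical.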
Per-Tenant Policies Apply Without Cross-Tenant Leakage
Given tenant A policy maps scores 70–100 to Auto-Approve and tenant B policy maps scores 70–100 to Manual Review When two claims with Fraud Score 70 are evaluated, one from tenant A and one from tenant B Then the tenant A claim is transitioned to "Approved (Automated)" with fulfillment SLA started And the tenant B claim is transitioned to "Pending Review (Automated)" and routed to the configured review queue with review SLA started And each evaluation reads only its tenant’s policy and configuration And audit logs for both decisions include tenant_id, policy_version, and action And no cross-tenant configuration read or write occurs (verified via logs and configuration access checks)
Monitor-Only Mode Simulates Decisions Without Enforcement
Given tenant T has policy mode set to MonitorOnly When any claim is evaluated Then the system computes the would-be action (Auto-Approve, Manual Review, or Auto-Deny) but does not change case state, queues, or SLA timers And the simulated_action, score, policy_version, and override sources are recorded in the audit log and visible in the UI And no notifications are sent And a daily report of simulated impacts (counts per action, SLA deltas) can be exported via API And toggling MonitorOnly off causes subsequent evaluations to enforce actions without retroactively changing prior claims
Kill Switch Disables Automation and Falls Back Safely
Given the automation kill switch is enabled for tenant T When new claims arrive or existing claims are re-evaluated Then fraud policy evaluation and automated actions are bypassed And claims remain in or enter the default state "Open" without automated queue routing And no SLA timers are started or paused by fraud policy logic And an admin alert is created indicating automation is disabled And audit entries record the kill_switch state for each bypassed evaluation And when the kill switch is disabled, normal automation resumes for subsequent evaluations without reprocessing past claims automatically
Audit Log Captures Full Decision Context and Is Queryable
Given any policy evaluation (enforced or monitor-only) completes When the decision record is retrieved via API or UI Then it contains claim_id, tenant_id, policy_version, evaluation_timestamp, score, evaluated_thresholds, final_action or simulated_action, override_sources (e.g., channel), reason_codes (if any), SLA_timer_changes, queue_target (if any), notification_ids (if any), evaluation_latency_ms, and an idempotency_key And the record is immutable and tamper-evident And it is available within 2 seconds of the decision and retained for at least 30 days And queries can filter by tenant_id, action, channel, and date range and return results within 2 seconds for up to 10k records
Explainability and Reason Codes
"As a support agent, I want clear reasons behind a fraud score so that I can confidently communicate decisions and resolve disputes faster."
Description

Transparent explanations that accompany each score, highlighting top contributing signals, their directions, and magnitudes, with human-readable reason codes suitable for agent review and customer communication. Provides a concise “why” summary in the case view, a structured payload in the API, and links to underlying signal values for auditability. Supports localization, redaction of sensitive inputs, and a stable taxonomy of reason codes for reporting and appeals.

Acceptance Criteria
Case View Why Summary and Top Contributors
Given a claim with a computed fraud score is opened in the case view When the agent loads the case Then a Why summary is visible adjacent to the score and SLA timer Then the summary lists the top 3-5 contributing signals sorted by absolute impact magnitude with direction indicators (risk_up|risk_down) Then each listed contributor displays: human-readable title, impact magnitude in score points (one decimal), and a tooltip with the raw and normalized signal values Then the listed contributors together account for at least 80% of total explanation magnitude or include a link to view all contributors Then the Why block renders within 200 ms P95 after the case view data is loaded
API Explanation Payload and Schema Guarantees
Given an authorized client requests the explanation via GET /claims/{id} with include=fraud_explanation or GET /claims/{id}/fraud-explanation When the claim has a computed score Then the response contains fraud_explanation with fields: score, model_version, generated_at, reason_codes[], top_contributors[], signals[] Then each reason_codes[] item includes: code (stable, UPPER_SNAKE_CASE), title, customer_safe_title, description, customer_safe_description, category, severity, appealable, locale_keys Then each top_contributors[] item includes: code, magnitude (float), direction (risk_up|risk_down), signal_value, normalized_value, source_link, snapshot_id Then the response adheres to the published OpenAPI schema and validates with no warnings Then P95 latency for the explanation endpoint is <= 300 ms at 50 RPS in staging
Auditability Links to Underlying Signal Values
Given an agent clicks View details on a listed contributor in the case view When the modal opens Then it shows data source, captured timestamp, and a preview or pointer to the underlying artifact (receipt snippet, serial check result, IP/device report) Then Open source_link opens a pre-filtered log or artifact viewer scoped to the claim_id and signal_id Then access control enforces that only users with role Audit can view unredacted artifacts; others see redacted values with a lock icon Then every artifact link includes an immutable snapshot_id ensuring the same content is retrieved over time Then each open action is recorded in the audit log with user_id, claim_id, signal_code, timestamp, and outcome
Localization for Agent and Customer-Facing Explanations
Given the agent changes language to es-ES in the UI When viewing the Why summary Then all titles and descriptions render in Spanish; missing keys fall back to en-US Then variable placeholders render with locale-specific formatting (dates, numbers, currency) and pluralization rules Then each reason code provides both agent_title/description and customer_safe_title/description; customer-safe strings exclude internal jargon and sensitive fields Then outbound customer communications use only customer-safe strings matched to the customer's locale Then localization coverage across supported locales is >= 95% as measured by l10n completeness reports
Redaction and Sensitive Data Handling
Given a contributor references PII (email, phone, address, payment token) When explanations are displayed in UI or returned via API Then PII is masked per policy: email local part partially masked, phone shows last 2 digits, addresses limited to city and region, payment tokens last 4 only Then users with scope pii:read or role Audit may view unredacted values via an explicit reveal action; all reveals are audited Then exported logs, webhooks, and data warehouse syncs receive only redacted values unless pii:read export is explicitly configured Then automated tests assert that no unmasked PII appears in a random sample of 500 explanations (0 defects threshold)
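The masking rules above (partially masked email local part, last 2 digits of phone) can be sketched directly; exact masking widths are policy choices, so treat these as one plausible reading:

```python
def mask_email(email: str) -> str:
    """Keep the first character of the local part, mask the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}{'*' * max(len(local) - 1, 1)}@{domain}"

def mask_phone(phone: str) -> str:
    """Show only the last 2 digits; everything else is masked."""
    digits = [c for c in phone if c.isdigit()]
    return "*" * (len(digits) - 2) + "".join(digits[-2:])
```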
Stable Reason Code Taxonomy and Versioning
Given taxonomy version v1.x is active When new reason codes are added Then new codes are unique, never reuse retired codes, and include version_added metadata Then changes to titles or descriptions are allowed without altering code semantics; changes to semantics require a new code and deprecating the old code with deprecation_date and alias_of Then existing API consumers continue to function without changes; a changelog is published and linked in the payload as taxonomy_changelog_url Then reporting can aggregate by code and by category across versions; deprecated codes remain in reports until end_of_support
Deterministic and Consistent Explanations
Given the same claim_id and model_version When the explanation is requested 10 times over 1 hour Then top_contributors order and magnitudes are identical within +/- 0.1 points across all requests Then rounding policy is consistent (one decimal place) between UI and API Then explanation payload includes model_version and explanation_method; changes in either are logged and surfaced via a non-blocking UI badge
Admin Configuration and Sandbox
"As a risk analyst, I want a safe place to tune policies and test changes so that I can improve catch rates without disrupting legitimate customers."
Description

A secure UI where authorized users manage thresholds, weight overrides within guardrails, allow/deny lists (sellers, IPs, serial ranges), and channel-specific policies. Includes a sandbox to run simulations against historical claims, compare score distributions across model versions, and preview decision impacts before publishing changes. Provides role-based access control, change previews, and a versioned, auditable publish workflow with rollback.

Acceptance Criteria
Role-Based Access Control for Fraud Config Admin
Given an authenticated user with Admin role, When they access Admin > Fraud Config, Then they can view, edit, run simulations, and submit for publish. Given an authenticated user with Editor role, When they access Admin > Fraud Config, Then they can view, edit, and run simulations but cannot publish. Given an authenticated user with Viewer role, When they access Admin > Fraud Config, Then they can view and export previews but cannot edit, simulate, or publish. Given a user without Admin/Editor/Viewer roles, When they attempt to access via UI or API, Then they receive 403 and no configuration data is returned. Given any permission denial, When it occurs, Then the event is logged with user ID, role, endpoint, timestamp, and reason. Given SSO/IdP role mapping, When a role is updated in the IdP, Then access rights take effect on next login without manual sync.
Thresholds and Weight Overrides with Guardrails and Live Preview
Given predefined guardrails per parameter, When a value outside the allowed range is entered, Then the UI blocks save and shows inline validation with the permitted range. Given valid values within guardrails, When the admin saves as Draft, Then the configuration version increments and is not active until published. Given a Draft, When the admin runs Preview on a 90-day sample, Then the system computes scores and decision outcomes and displays counts by action (auto-approve, review, deny). Given routing thresholds change action boundaries, When Preview completes, Then the UI highlights impacted ranges and shows the delta versus current production. Given a Draft with changes, When the admin attempts to submit without a Preview run in the last 24 hours, Then submission is blocked and a prompt to run Preview is shown. Given numeric weight overrides, When normalization is required, Then the system applies the configured normalization policy and displays effective weights used in scoring.
Allow/Deny Lists Management (Sellers, IPs, Serials)
Given the Allow/Deny Lists screen, When an admin creates an entry for a seller domain, IP/CIDR, or serial/serial range, Then the entry is validated for format, uniqueness, and conflicts before saving. Given bulk import, When a CSV template with up to 10,000 rows is uploaded, Then the system validates, reports row-level errors, and performs an all-or-nothing transactional import on confirmation. Given overlapping serial ranges or duplicate IPs, When detected, Then the UI prompts to merge or prioritize and prevents ambiguous entries from being saved. Given an entry with effective start/end dates and reason, When saved, Then the change is recorded in the audit log with actor, timestamp, and justification. Given an entry toggle to Active or Inactive, When deactivated and later published, Then it stops affecting scoring and can be reactivated in a future version. Given a Preview run with modified lists, When executed, Then impacted historical claims and open case counts are displayed prior to submission.
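The IP/CIDR validation and overlap detection above maps cleanly onto the standard library's `ipaddress` module; a sketch with a hypothetical helper name:

```python
import ipaddress

def validate_ip_entry(entry: str, existing: list[str]) -> list[str]:
    """Reject malformed IP/CIDR entries and report overlaps with
    existing entries so ambiguous list state is never saved."""
    try:
        net = ipaddress.ip_network(entry, strict=False)
    except ValueError:
        return [f"invalid IP/CIDR: {entry}"]
    errors = []
    for other in existing:
        if net.overlaps(ipaddress.ip_network(other, strict=False)):
            errors.append(f"{entry} overlaps existing entry {other}")
    return errors
```

Serial-range overlap checks would follow the same pattern with interval comparison instead of `overlaps`.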
Channel-Specific Policy Configuration and Conflict Resolution
Given a defined set of purchase channels (Web, Marketplace, In-Store, Phone), When creating a policy, Then the admin can scope thresholds and rules to one or more channels. Given overlapping global and channel-specific rules, When both apply, Then precedence is deterministic: channel-specific overrides global; ties are blocked with an error until resolved. Given a claim with channel metadata, When evaluated, Then the policy engine applies the correct channel scope and logs the evaluated rule path for the claim. Given a policy marked Inactive for a channel, When published, Then claims from that channel fall back to global rules. Given a channel-scoped Preview, When run, Then the UI shows per-channel score distributions and action rates plus a total aggregate.
Sandbox Simulation on Historical Claims with Safe Isolation
Given selection of a date range, filters, and a model/config version, When the admin runs a simulation, Then results are computed without writing to production entities and are labeled Simulation Only. Given a dataset up to 50,000 claims, When the simulation is executed, Then summary metrics and distributions are returned within 5 minutes at the 95th percentile. Given a simulation run ID, When revisited, Then the results are reproducible and downloadable as CSV with per-claim scores and actions. Given a simulation, When filters (channel, seller, device type) are applied, Then all charts and counts update consistently and totals reconcile. Given an in-progress simulation, When the user navigates away, Then the job continues and completion is notified via in-app notification and email.
Model Version Comparison and Score Distribution Analysis
Given two selected model versions and the same dataset, When Compare is executed, Then the UI displays overlaid score histograms and a tabular delta of key metrics (mean, median, standard deviation, KS statistic). Given a set of decision thresholds, When applied to both versions, Then the comparison shows changes in auto-approve, review, and deny rates and the delta for each threshold. Given labels available on historical claims, When provided, Then AUC/ROC and confusion matrices are computed per version; otherwise these metrics are hidden. Given the comparison view, When exported, Then a CSV or PDF summary with parameters, metrics, and timestamp is generated and matches the on-screen data.
Versioned Publish Workflow with Approval, Audit, and Rollback
Given a Draft with changes, When submitted for publish, Then a diff versus current production is shown summarizing changes to thresholds, weights, lists, and policies. Given approval rules requiring two distinct approvers excluding the author, When approvals are collected, Then Publish becomes enabled; otherwise Publish remains disabled. Given Publish, When executed by a user with Publisher role, Then the new configuration becomes active within 2 minutes and all subsequent scoring uses it. Given a published version, When Rollback is invoked, Then the system reverts to the selected prior version, records a rollback event, and notifies configured subscribers. Given any publish or rollback, When complete, Then the audit log captures who, what, when, why, and the exact diff; older versions remain read-only and retrievable.
Monitoring, Drift Detection, and Audit Logging
"As a compliance officer, I want full audit trails and drift alerts so that we can prove fairness, investigate disputes, and maintain model performance over time."
Description

End-to-end observability with dashboards for score distributions, decision rates, false positive/negative proxies, latency, and throughput, plus alerts on anomalies. Implements feature and data drift detection with thresholds that trigger notifications and optional auto-reversion to a stable model. Every scored decision is immutably logged with inputs, score, model/policy versions, and actor, and is exportable to BI and compliance systems with retention controls.

Acceptance Criteria
Score Distribution and Decision Dashboard
Given the Fraud Score service is running and claims are being scored When at least 100 claims are processed in the last 15 minutes Then the dashboard displays score distribution (0–100) as histogram and quantiles with data freshness under 60 seconds And the dashboard displays decision rates (auto-approve, auto-deny, route-to-review) by channel, seller, and tenant And users can filter by date range, channel, seller, policy version, and tenant, with results returned in under 3 seconds p95 And 13 months of history is retained and visible
FP/FN Proxy Monitoring and Alerts
Given auto-decision outcomes and subsequent human reviews are recorded When a claim is overturned from auto-deny to approve Then it is marked as an FP-proxy event And when an auto-approved claim is later flagged fraudulent within 30 days it is marked as an FN-proxy event And the dashboard shows daily and 7-day rolling FP-proxy and FN-proxy rates by channel and seller And an alert is sent to Slack and Email if FP-proxy > 2% or FN-proxy > 0.5% for 2 consecutive hours, delivered within 2 minutes of detection And the alert payload includes the window, rates, counts, top 5 contributing features by importance, and a link to the dashboard
Latency and Throughput Observability with SLO Breach Alerts
Given scoring requests are received via API When measuring end-to-end time from request receipt to decision emitted Then the dashboard shows p50, p95, and p99 latency and requests-per-second globally and per tenant And trace samples link latency to model inference and IO spans for root-cause analysis And an alert fires if p95 latency exceeds 300 ms for 5 of 10 consecutive minutes or throughput drops by >30% from 7-day baseline, delivered within 2 minutes And a missing-data alert fires if no metrics are received for 60 seconds while traffic is nonzero
Feature and Data Drift Detection with Auto-Reversion
Given a reference window of the last 30 days on the active model and a current window of the last 24 hours When Population Stability Index for any of the top 20 features exceeds 0.2 or KL divergence of score distribution exceeds 0.1 Then drift status is set to Alert and a notification is sent with impacted features, magnitudes, and sample slices And if drift persists for 30 minutes and auto-reversion is enabled, the system reverts to the last stable model/policy version and records a change event And a cooldown of 2 hours prevents repeated reversions unless manually overridden by an admin And all drift calculations and actions are logged with versioned code artifact hashes
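The Population Stability Index check above can be sketched in a few lines. This is a minimal illustration, assuming equal-frequency bins cut on the reference window; binning strategy and the epsilon floor for empty bins are implementation choices, not part of the spec.

```python
import math
from bisect import bisect_right

def population_stability_index(reference, current, bins=10):
    """PSI between a 30-day reference window and a 24-hour current window.

    Sketch only: equal-frequency bins are cut on the reference sample,
    and empty bins are floored to avoid log(0). The criteria above set
    the alert threshold at PSI > 0.2 for any top-20 feature.
    """
    ref = sorted(reference)
    # Interior cut points at the reference quantiles (equal-frequency bins).
    cuts = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(cuts, x)] += 1
        return [max(c / len(sample), 1e-6) for c in counts]  # floor empties

    return sum((c - r) * math.log(c / r)
               for r, c in zip(fractions(ref), fractions(current)))
```

An unshifted feature yields PSI of 0, while a large distribution shift pushes PSI well past the 0.2 alert threshold.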
Immutable Audit Logging per Scored Decision
Given any claim is scored or its decision is changed When the event is written to the audit log Then the record includes timestamp (UTC), claim ID, tenant, score, decision, thresholds, model version, policy version, actor (system or user ID), request ID, source channel, device/IP fingerprint, and a SHA-256 hash of normalized inputs And records are stored in WORM mode for the retention period and linked via hash chaining to be tamper-evident And logs are queryable by claim ID, request ID, and time range with p95 query latency under 2 seconds for up to 10k records And updates are append-only; corrections create a new record referencing the prior record; deletions during retention are blocked
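The hash-chaining described above can be sketched as an append-only log where each record hashes its own normalized content plus the previous record's hash, so any mutation breaks every later link. Field names here are illustrative, not the product's actual schema.

```python
import hashlib
import json
import time
import uuid

def append_audit_record(chain: list, payload: dict) -> dict:
    """Append a tamper-evident record to an in-memory audit chain (sketch)."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {
        "event_id": str(uuid.uuid4()),
        "ts_utc": time.time(),
        "payload": payload,
        "prev_hash": prev_hash,
    }
    # Normalized (sorted-key, no-whitespace) JSON so hashing is deterministic.
    normalized = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["hash"] = hashlib.sha256(normalized.encode()).hexdigest()
    chain.append(record)
    return record

def verify_chain(chain: list) -> bool:
    """Recompute every link; returns False if any record was altered."""
    prev = "0" * 64
    for rec in chain:
        if rec["prev_hash"] != prev:
            return False
        body = {k: v for k, v in rec.items() if k != "hash"}
        normalized = json.dumps(body, sort_keys=True, separators=(",", ":"))
        if hashlib.sha256(normalized.encode()).hexdigest() != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

In production the chain would live in WORM storage; this in-memory list only illustrates the tamper-evidence property.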
Export to BI/Compliance with Retention Controls
Given export destinations are configured per tenant for S3 and Snowflake When streaming (near-real-time under 2 minutes) and daily batch exports are enabled Then all audit events and observability metrics are exported using documented schemas with schema version tags And PII masking rules, if configured, are applied to exports and logged And export failures are retried 3 times with exponential backoff; after 30 minutes cumulative failure an alert is sent And retention is configurable per tenant (e.g., 13 or 36 months); records past retention are purged within 24 hours and purge summaries are logged and exportable

Step-Up Proof

Dynamic verification that requests just-enough extra evidence (e.g., serial-plate photo with timestamp, packaging label, or redacted bank proof) when risk is moderate. Prompts are auto-generated and tracked inside the case. Benefit: rescues borderline legitimate claims, reduces back-and-forth for agents, and stops bad claims without adding heavy friction for legitimate customers.

Requirements

Adaptive Risk Scoring & Triggering
"As an operations lead, I want ClaimKit to automatically decide when extra proof is needed so that agents only step up borderline claims and legitimate customers experience minimal friction."
Description

Compute a real-time risk score for each incoming claim using receipt/serial extraction confidence, purchase eligibility checks, customer history, claim velocity, product category risk, channel, and anomaly heuristics. When risk is in a configurable “moderate” band, automatically trigger step-up verification and map the risk segment to a minimal evidence set (e.g., timestamped serial-plate photo, packaging label, or redacted bank proof). Provide admin policies with thresholds, exceptions, and brand-level overrides; include rule preview/backtest to estimate impact before publishing. Integrate with ClaimKit’s magic inbox so auto-created cases are evaluated without agent intervention and every trigger is explainable via logged factors. Expected outcome: fewer false declines, reduced agent back-and-forth, and lower fraud without broad friction.

Acceptance Criteria
Real-Time Risk Score Computation at Case Ingestion
Given an incoming claim with extracted receipt/serial signals, eligibility result, customer history, claim velocity, product category risk, channel, and anomaly heuristics When the claim is created or updated in ClaimKit Then a normalized risk score (0–100, one-decimal) is computed deterministically for the same inputs and policy version And the score is stored on the case with policy version, timestamp, and correlation ID And factor contributions (name, value, weight, contribution) are recorded for every factor used or marked as missing with default handling And p95 scoring latency is ≤300 ms and p50 ≤50 ms at 100 RPS in staging benchmarks And unit tests verify exact (bit-identical) score consistency across 10 identical re-evaluations
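A deterministic weighted score with recorded factor contributions could look like the sketch below. It assumes each factor signal is already normalized to [0, 1]; factor names, the weighting scheme, and default handling are illustrative, not the product's actual model.

```python
def score_claim(factors: dict, weights: dict, defaults: dict) -> dict:
    """Compute a normalized 0-100 risk score with per-factor contributions.

    Sketch only: missing factors fall back to a configured default and
    are marked as missing, matching the 'marked as missing with default
    handling' clause above. Same inputs always yield the same output.
    """
    total_weight = sum(weights.values())
    contributions = []
    score = 0.0
    for name, weight in weights.items():
        missing = name not in factors
        value = defaults[name] if missing else factors[name]
        contribution = (weight / total_weight) * value * 100
        score += contribution
        contributions.append({
            "name": name, "value": value, "weight": weight,
            "contribution": round(contribution, 1), "missing": missing,
        })
    return {"score": round(score, 1), "factors": contributions}
```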
Configurable Risk Bands, Thresholds, and Brand-Level Overrides
Given an admin defines risk bands with moderate_low and moderate_high When the policy is saved and published Then the moderate band is applied as moderate_low ≤ score < moderate_high And global thresholds are versioned and auditable, with effective-from timestamps And brand-, channel-, or SKU-level overrides supersede global thresholds where defined And exceptions can disable step-up for specified segments and are honored at evaluation time And changes do not affect live scoring until the policy is explicitly published And publishing emits an audit event with editor, diff, and preview summary
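The override precedence and half-open moderate band above can be sketched as follows. The SKU-then-brand-then-channel ordering among override levels is an assumption (the spec only says overrides supersede global thresholds), and the data shapes are illustrative.

```python
def resolve_thresholds(global_policy, overrides, brand=None, channel=None, sku=None):
    """Pick the effective thresholds for a claim (sketch).

    Overrides supersede global thresholds where defined; the precedence
    order among SKU, brand, and channel here is an assumption.
    """
    for level, value in (("sku", sku), ("brand", brand), ("channel", channel)):
        if value is not None and (level, value) in overrides:
            return overrides[(level, value)]
    return global_policy

def is_moderate(score, band):
    # Half-open band exactly as specified: moderate_low <= score < moderate_high.
    return band["moderate_low"] <= score < band["moderate_high"]
```

Note that a score equal to moderate_high falls outside the band, which is why the spec pins down the half-open comparison.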
Automatic Step-Up Triggering and Minimal Evidence Mapping
Given a case evaluates to a risk score in the configured moderate band When the evaluation completes Then one and only one open Step-Up task is auto-created without agent action And the task selects the minimal evidence set mapped to the risk reason: low OCR/serial confidence → timestamped serial-plate photo; eligibility mismatch → redacted bank proof; shipping anomaly → packaging/label photo And the customer-facing prompt and upload links are generated and attached to the case And an SLA timer specific to Step-Up is started and visible on the case And if the case exits the moderate band, the Step-Up task is auto-cancelled (when the score drops to Low) or routed to manual review (when it rises to High), with the reason logged And duplicate triggers within 10 minutes are suppressed
Magic Inbox Auto-Created Cases Are Scored and Acted On
Given an email/PDF ingested via Magic Inbox auto-creates a case When parsing completes and extraction confidences are available Then the risk score is computed before any agent opens the case And if the score is in the moderate band, the Step-Up task is created and prompts attached automatically And explainability logs include parsing confidences and anomaly flags used in the decision And re-parses or late-arriving data trigger a single re-evaluation with idempotent Step-Up state (no duplicate tasks) And failure to parse falls back to conservative defaults and is logged
Decision Explainability and Immutable Audit Logging
Given any risk evaluation and decision outcome When viewing the case decision details Then users can see the final score, band, decision (e.g., Trigger Step-Up), policy version, and full factor breakdown And each evaluation is written to an append-only audit log with timestamp, actor (system/admin), correlation ID, and environment And logs are retained for at least 365 days and exportable as CSV/JSON via UI And every change to thresholds, overrides, or mappings is captured with before/after values and publisher identity
Rule Preview and Backtest With Impact Estimates Before Publish
Given a draft policy with thresholds, overrides, and evidence mappings When the admin runs a preview against a selectable historical window (7–90 days) or sample size (up to 50k claims) Then the preview returns counts and rates for Low/Moderate/High, predicted Step-Up volume, and estimated change vs current policy And top 5 factor drivers for moderate decisions are displayed And previews complete within 60 seconds for ≤50k claims or continue asynchronously with progress and final notification And the system blocks publishing until a preview has been run on the draft in the last 24 hours and records the preview summary in the audit log
Dynamic Evidence Prompt Generator
"As a claimant, I want clear, tailored instructions for what proof to provide so that I can submit the right evidence on the first try."
Description

Generate context-aware, just-enough evidence requests tailored to the claim, brand, and risk segment. Prompts auto-fill model, serial, order ID, and due dates, and specify exact instructions for acceptable proofs (e.g., serial-plate photo with visible timestamp, shipping label showing name and address, or bank statement screenshot with sensitive fields redacted). Provide localized variants, tone controls, and examples to reduce confusion. Deliver prompts via email, SMS, and in-portal messaging with secure upload links, and record each prompt to the case timeline. Expected outcome: higher first-pass completion with less agent clarification.

Acceptance Criteria
Context-Aware Prompt Autofill and Instructions
Given a claim with brand, model, serial number, order ID, SLA policy, and moderate risk When a prompt is generated Then the prompt auto-fills model, serial, and order ID exactly matching case fields And the due date is computed from SLA policy and displayed in the claimant’s local time with timezone indicator and UTC ISO-8601 equivalent And the prompt specifies exactly one required proof type appropriate to moderate risk (e.g., “Photo of serial-plate with visible timestamp”) And the prompt lists at least 2 acceptable examples and at least 2 not-acceptable examples And the prompt enumerates allowed file types (JPG, PNG, PDF, HEIC), max size per file (<=25 MB), and max number of files (<=3) And when bank proof is requested, explicit redaction instructions are included (mask account number except last 4, hide unrelated transactions and balances)
Risk-Segmented 'Just-Enough' Evidence Selection
Given risk bands are defined as Low, Moderate, High, Very High And the case may already contain verified artifacts (e.g., serial photo verified) When the generator selects requested proofs Then Low requests 0 required proofs (optional clarification only) And Moderate requests 1 required proof And High requests 2 required proofs And Very High requests 3 required proofs or flags for manual review per policy And already-verified artifacts are never re-requested And requested proof types are context-relevant to channel and data gaps (e.g., shipping label if retailer order; bank statement only if order ID missing) And proof selection completes within 300 ms at p95
Multi-Channel Delivery with Secure Upload
Given claimant contact channels enabled by brand policy (email, SMS, in-portal) When the prompt is dispatched Then an individualized message is sent on each enabled channel within 60 seconds of generation with retry-once for transient failures And each message contains a single-use secure upload link bound to the case and requested artifact, expiring after 72 hours or upon first successful upload And uploads are accepted only over TLS 1.2+ for file types JPG, PNG, PDF, HEIC up to 25 MB each; files failing AV scan are rejected with a descriptive error And expired or reused links return HTTP 403 and can be refreshed via claimant self-serve without changing the due date And 7-day rolling delivery success rate is >=95% for email and >=98% for SMS to valid addresses/numbers
Localization and Tone Controls with Fallback
Given a claimant locale and a brand-configured tone (Friendly, Neutral, Formal) When the prompt is generated Then all copy, dates/times, and currency formatting are localized to the claimant locale And the selected tone is applied consistently to greeting, body, and closing using the correct template variant And placeholders (model, serial, order ID, due date) are injected with no missing or untranslated tokens And if the locale is unavailable, the system falls back to en-US, logs a warning with template ID and locale, and proceeds without blocking send
Case Timeline Logging and Event States
Given a prompt is generated and sent When it is recorded in the case timeline Then an entry is created within 500 ms containing template ID, risk band, locale, tone, channels, due date, correlation ID, and a content hash And subsequent events append statuses: sent, delivered, viewed, upload_started, upload_succeeded, upload_failed, completed, expired, canceled; each with UTC timestamp and actor And timeline entries are immutable; corrections append a new version referencing the prior entry And only authorized roles can view prompt content; all accesses are audited
First-Pass Completion and Agent Clarification Outcomes
Given a 14-day A/B test comparing Dynamic Evidence Prompt Generator (treatment) vs legacy prompts (control) on matched claim cohorts When outcomes are measured Then first-pass completion rate in treatment is >=15% higher than control with p<=0.05 And median time-to-first-upload does not increase by more than 10% And agent clarification messages per case decrease by >=25% And invalid/fraud rejection rate does not worsen by more than 2 percentage points And claimant CSAT for the prompt experience is >= baseline
Multi-Channel Evidence Capture & Parsing
"As a customer using my phone, I want to quickly send the requested photo or document from my preferred channel so that my claim isn’t delayed."
Description

Accept evidence from reply email attachments, secure mobile-friendly upload, SMS/MMS links, and agent-assisted uploads. Support JPG/PNG/HEIC images and PDFs up to defined size limits, with live validation for file type, legibility, and completeness. Extract EXIF timestamps and detect editing anomalies, OCR serial plates, parse shipping labels for name/address/postmark, and verify bank proof contains merchant/date/amount while confirming sensitive fields are redacted. Automatically associate submissions to the correct case, acknowledge receipt, and surface parsing results to the agent. Expected outcome: frictionless capture on any device and reliable automated checks that speed adjudication.

Acceptance Criteria
Email Reply Attachments Intake & Auto-Association
Given an open case with a unique inbound alias or thread token, When the claimant replies with supported attachments (JPG, PNG, HEIC, PDF) within the configured size limit, Then the system ingests the email within 60 seconds, validates file type/size, associates files to the correct case using thread id/case token and sender match, and sends an acknowledgement email within 2 minutes. Given attachments exceed size or are of unsupported type, When ingestion occurs, Then the system does not attach them, logs the reason, and responds with a secure upload link and instructions. Given an attachment duplicates a previously stored file, When a checksum match is detected, Then the system deduplicates by referencing the existing file and notes it in the case timeline.
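The checksum-based deduplication above can be sketched with a SHA-256 over the file bytes as the storage key, so a re-sent attachment becomes a reference to the existing file rather than a second copy. The class and return shape are illustrative.

```python
import hashlib

class EvidenceStore:
    """Deduplicate attachments by content checksum (sketch)."""

    def __init__(self):
        self._by_checksum = {}

    def add(self, case_id, filename, data: bytes):
        checksum = hashlib.sha256(data).hexdigest()
        if checksum in self._by_checksum:
            # Duplicate content: reference the existing file and note it
            # in the case timeline instead of storing the bytes again.
            return {"checksum": checksum, "duplicate": True,
                    "of": self._by_checksum[checksum]["filename"]}
        self._by_checksum[checksum] = {"case_id": case_id, "filename": filename}
        return {"checksum": checksum, "duplicate": False, "of": None}
```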
Secure Mobile-Friendly Upload Flow with Live Validation
Given the claimant opens a secure upload link on a mobile device, When selecting or capturing files, Then client-side validation blocks unsupported file types and files exceeding the configured size limit, computes a legibility score, and warns if below the minimum threshold; submissions below threshold are prevented and a retake prompt is shown. Given required evidence types are configured for the step-up prompt, When files are selected, Then completeness checks verify all required categories are attached before enabling Submit and clearly indicate any missing items. Given successful submission, When upload completes, Then an on-screen confirmation is displayed within 2 seconds, an acknowledgement is sent via the same channel, and the link expires after the configured TTL.
SMS/MMS Evidence Capture
Given the claimant receives an SMS containing a secure upload link, When they open it, Then the responsive upload flow is presented and all validations apply; SMS consent is recorded with timestamp. Given the claimant replies with an MMS image, When the system receives it, Then the file is validated for type/size, associated to the correct case via link/session token or verified phone number match, and a confirmation SMS is sent. Given an expired or invalid link, When access is attempted, Then the system denies access, records the event, and provides a mechanism to issue a fresh link upon authenticated request.
Agent-Assisted Upload via Console
Given an agent is viewing a case in the console, When they upload evidence on behalf of the claimant, Then identical file validations and parsing are executed, the uploader is recorded as the agent with timestamp and IP, and an optional note can be attached per file. Given multiple files are uploaded, When transfer is in progress, Then per-file progress is displayed, and any failures are reported with clear reasons and retry controls without affecting successful files. Given parsing completes, When the agent remains on the case, Then results and flags for each file are immediately visible without a manual refresh.
Image Forensics: EXIF Extraction and Edit Anomaly Detection
Given an uploaded image containing EXIF metadata, When processed, Then the system extracts timestamp, device model, and geolocation (if present), normalizes timestamp to the case timezone, computes a SHA-256 content hash, and stores all in immutable metadata. Given EXIF is missing, inconsistent, or out-of-expected range relative to claim events, When processed, Then the image is flagged with reason and confidence and, if policy requires, the user is prompted for a timestamped retake. Given editing artifacts exceed the configured anomaly threshold, When detected, Then the image is marked "Possibly Edited", automated acceptance is blocked, and the agent is notified with details of the anomalies.
Document Parsing and Verification (Serial Plates, Shipping Labels, Bank Proof)
Given a serial plate photo, When OCR runs, Then serial and model are extracted with ≥95% confidence or a retake is requested; extracted values are matched to product records and any mismatch is flagged. Given a shipping label image/PDF, When parsed, Then recipient name, address, carrier, and postmark/date are extracted with ≥90% confidence, the address is normalized, and a match against case records within configured tolerance is confirmed or flagged. Given a redacted bank proof image/PDF, When verified, Then presence of merchant name, transaction date, and amount is confirmed; full account/card numbers must be absent (only last 4 permissible); any unredacted sensitive fields trigger a flag and re-upload request.
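The "only last 4 permissible" redaction check above could be approximated with a heuristic over the OCR text. This is a deliberately simple sketch: the regex and pass/fail policy are assumptions, and a real validator would also confirm that merchant, date, and amount are present.

```python
import re

# Account/card-like digit patterns: 6+ consecutive digits, or 3+ groups
# of 4 digits separated by spaces or dashes (e.g. "4111 1111 1111 1111").
ACCOUNT_LIKE = re.compile(r"\d{6,}|\d{4}(?:[ -]\d{4}){2,}")

def bank_proof_redaction_ok(ocr_text: str) -> bool:
    """Heuristic: does the OCR'd bank proof expose at most the last 4 digits?

    Masked forms like '****1234', dates, and amounts pass; full
    account/card numbers fail and would trigger a re-upload request.
    """
    return ACCOUNT_LIKE.search(ocr_text) is None
```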
Agent UI Surfacing, Actions, and SLA Effects
Given a case receives new evidence, When parsing completes, Then the case timeline shows each file with thumbnail, detected type, extracted metadata, parsed fields, confidence scores, and flags within 30 seconds of upload. Given a file has flags (e.g., low legibility, edit anomaly, data mismatch), When an agent views it, Then they can Accept, Request Reupload, or Reject with a required reason; all actions are audit-logged with user, timestamp, and before/after state. Given evidence for a step meets all checks and is accepted, When the agent confirms, Then the case step auto-advances, SLA timers update accordingly, and a templated confirmation is queued to the claimant.
Case Timeline, SLA Linking, and Auto-Followups
"As a support manager, I want prompts and responses tracked against SLAs so that cases move predictably and exceptions are visible."
Description

Track each step-up request and response as structured events on the case timeline with actor, timestamp, due date, and status (requested, pending, received, approved, rejected). Link step-up states to SLA timers, pausing or branching according to policy while preserving auditability. Schedule configurable reminder cadences and escalation paths; auto-close non-responsive cases with standardized reason codes. Expose real-time status to agents and customers, and emit events to integrations/webhooks. Expected outcome: predictable throughput, fewer stalled cases, and auditable compliance with service targets.

Acceptance Criteria
Record Step-Up Events on Case Timeline
Given a case requires additional verification When a step-up request is created by the system or an agent Then the case timeline records an event with fields: stepUpId, evidenceType, actor, timestamp (UTC ISO 8601), dueDate, status=requested And when the customer submits evidence via any supported channel Then a received event is recorded with stepUpId, submitter identity, timestamp, fileCount, totalBytes, and channel And when an agent approves or rejects the evidence Then a review event is recorded with status=approved or status=rejected, reviewerId, timestamp, and optional reason And timeline events are displayed in chronological order and are filterable by type=Step-Up And invalid status transitions (e.g., approved -> pending) are blocked with a 409 error and no timeline entry is created
SLA Pausing and Branching by Step-Up State
Given an active case with SLA policies configured When any step-up on the case has status in {requested, pending} Then the Resolution SLA is paused and a pause entry is logged with reason="Step-Up Pending" And the First Response SLA pauses or continues according to policy flag firstResponse.pauseOnStepUp And when the step-up transitions to received, approved, or rejected Then the paused SLA resumes from remaining time and the pause entry is closed with endTimestamp and duration And SLA reports exclude the pause duration from breach calculations And if multiple step-ups overlap, the total paused time equals the union of overlapping intervals (no double-counting) And all SLA state changes are visible on the case timeline and exportable
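The no-double-counting clause above (total paused time equals the union of overlapping intervals) is the classic interval-merge computation, sketched here. Times are assumed to be epoch seconds, with any still-open pause closed at "now" before calling.

```python
def union_seconds(intervals):
    """Total paused time as the union of [start, end) intervals (sketch).

    Overlapping step-up pauses count their overlap exactly once, so SLA
    breach math never double-counts paused time.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_start is None or start > current_end:
            # Disjoint from the merged interval so far: flush and restart.
            if current_start is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_start is not None:
        total += current_end - current_start
    return total
```

For example, two pauses covering seconds 0-100 and 50-150 contribute 150 seconds of paused time, not 200.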
Configurable Reminders and Escalations for Step-Up Requests
Given a step-up request is pending with a dueDate and a reminder policy (e.g., 48h, 24h, 1h before due) When each reminder threshold is reached Then the system sends a reminder via the configured channels and logs a reminder event with timestamp and channel And if no response is received after N reminders (per policy) Then the case is escalated to the configured queue/role within 5 minutes, a tag is applied (e.g., "Escalated - No Evidence"), and an escalation event is logged And reminders stop immediately when the step-up is received, approved, rejected, or the case is closed And agents can snooze reminders per case for a specified duration, which is logged and respected
Auto-Close Non-Responsive Cases with Standard Reason Codes
Given a step-up request has passed its dueDate plus a configured grace period without receipt of evidence When auto-close is enabled for the policy Then the case transitions to status="Closed - No Response" with standardized reasonCode="STEP_UP_NO_RESPONSE" and closureTimestamp is recorded And the customer is sent a final notification with the standardized reason and a link to re-open or appeal if policy allows And the case cannot be auto-closed if any step-up is approved or there is an open agent follow-up task And the auto-close action is added to the case timeline and emitted as an event
Real-Time Status Visibility for Agents and Customers
Given an agent is viewing the case console and the customer is viewing the portal When any step-up status, dueDate, SLA pause/resume, reminder, escalation, or closure changes Then both UIs update within 5 seconds without manual refresh and display the new state, due countdown, and last activity time And if the update cannot be delivered in real time, the UIs fall back to polling every 15 seconds until consistency is restored And the agent UI shows a step-up checklist with statuses {requested, pending, received, approved, rejected} and next action And the customer UI shows the requested evidence types with upload controls and the exact due date/time in the user’s local timezone
Webhook Events and Delivery Guarantees
Given webhooks are configured with an endpoint and secret When step-up, SLA, reminder, escalation, or auto-close events occur Then the system emits events with types: stepUp.requested, stepUp.received, stepUp.approved, stepUp.rejected, stepUp.reminder.sent, sla.paused, sla.resumed, case.autoClosed And each payload includes: eventId (UUID), occurredAt (UTC), caseId, stepUpId (if applicable), actor, status, dueDate (if applicable), reasonCode (if applicable), and signature (HMAC) And deliveries are at-least-once with exponential backoff for up to 24 hours; idempotency is ensured via eventId And the 95th percentile delivery latency from occurredAt to first attempt is <= 10 seconds And delivery attempts and outcomes are logged and queryable by caseId and eventId
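The HMAC signature field above can be sketched as follows. Canonicalizing the JSON (sorted keys, no whitespace) is an assumption so sender and receiver hash identical bytes, and SHA-256 as the HMAC hash is likewise an assumption since the spec only says HMAC.

```python
import hashlib
import hmac
import json

def sign_payload(secret: bytes, payload: dict) -> str:
    """HMAC-SHA256 signature over the canonical JSON payload (sketch)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_signature(secret: bytes, payload: dict, signature: str) -> bool:
    # compare_digest avoids leaking the signature via timing differences.
    return hmac.compare_digest(sign_payload(secret, payload), signature)
```

A receiver would verify the signature before trusting the event, and use the payload's eventId for idempotent processing under at-least-once delivery.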
Audit Trail Integrity and Export
Given any change to step-up state, SLA pause/resume, reminder, escalation, or auto-close When the change is persisted Then an immutable audit record is written with actorId, actorType, previousState, newState, timestamp (UTC), source, and reason And audit records are append-only; edits create new records and never overwrite or delete existing entries And case exports (CSV and JSON) include the full step-up timeline with eventIds, SLA pause segments with durations, and standardized reason codes And all timestamps in exports are UTC ISO 8601 and pass schema validation
Agent Review Console with Assisted Validation
"As an agent, I want a focused view of the requested proof and automated checks so that I can make fast, consistent decisions."
Description

Provide an agent-side panel that displays submitted evidence alongside extracted claim data and risk factors. Highlight OCR-extracted serials, order IDs, names, and dates; flag mismatches and low-confidence fields. Offer one-click approve, re-request, or deny actions with templated reasons and macros; include quick annotation and redact tools. Support keyboard shortcuts and batch handling for similar cases. Feed agent decisions back to scoring/policy analytics for continuous improvement. Expected outcome: faster, more consistent adjudication with reduced cognitive load.

Acceptance Criteria
Evidence View with Extracted Data and Risk Highlights
Given a case with at least one attachment and extracted fields (Serial Number, Order ID, Customer Name, Purchase Date) When an agent opens the Review Console Then the evidence viewer displays previews of all attachments and the extracted fields with their values and confidence scores (0.00–1.00) And any field with confidence < 0.90 is visually flagged as Low Confidence And any extracted field that does not exactly match canonical data (after normalizing case, whitespace, and dashes for IDs) is flagged as Mismatch And a Risk Factors panel lists each risk factor with a severity badge (Low, Medium, High) and short description And the console renders all above elements within 2 seconds for cases with up to 5 attachments totaling ≤ 25 MB And all timestamps are shown in the agent’s timezone and ISO 8601 format
One-Click Decisions with Templated Reasons
Given the Review Console is loaded and the agent has decision permissions When the agent clicks Approve, Re-request, or Deny Then a template selector opens with the last-used template preselected And template macros including {{customer_name}}, {{order_id}}, {{serial}}, {{purchase_date}}, and {{case_link}} render correctly in the preview And confirming the action posts a timeline entry, updates the case status (Approve → Resolved-Approved, Re-request → Awaiting Customer, Deny → Resolved-Denied), and adjusts SLA timers (Approve/Deny stop, Re-request pauses until response or timeout) And an audit record is saved with decision_type, template_id, rendered_message_hash, agent_id, and timestamp And an outbound event case.decision is emitted within 5 seconds And the agent can Undo the decision within 10 minutes unless downstream fulfillment has started
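The {{macro}} substitution described above can be illustrated with a minimal Python sketch; failing loudly on unknown macros is an assumption, not a stated requirement.

```python
import re

def render_template(template: str, case: dict) -> str:
    """Replace {{macro}} placeholders with case data; unknown macros raise."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in case:
            raise KeyError(f"unknown macro: {key}")
        return str(case[key])
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)
```

Raising on an unknown macro surfaces template typos in the preview rather than sending a customer a message with a blank field.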
Step-Up Re-Request Prompting and Tracking
Given a case has moderate risk or incomplete evidence When the agent selects Re-request and chooses Step-Up prompts from the library (Serial-plate photo with timestamp, Packaging label, Redacted bank statement, Receipt PDF) Then the system generates a customer message with a checklist and secure upload links, and sets a due date defaulted to 72 hours And the case enters Awaiting Step-Up with a visible checklist in the console showing each required item as Pending And customer uploads auto-mark checklist items Complete and notify the agent in-console And overdue items trigger an automatic reminder and keep the SLA paused up to the configured maximum pause window And all Step-Up requests and uploads are logged to the timeline with file hashes, uploader identity, and timestamps
Annotation and Redaction Tools
Given an image or PDF attachment is open in the viewer When the agent clicks Annotate Then tools for rectangle highlight, callout note, and redaction are available And saving a redaction creates a derivative file with irreversible pixel redaction while preserving the original under restricted access And the redacted derivative becomes the default for download and sharing And annotations (non-redaction) are saved as overlay metadata and can be edited or removed And an audit entry records agent_id, action, page, coordinates, and timestamp And applying or removing an annotation completes within 300 ms for files up to 20 MB
Keyboard Shortcuts and Navigation
Given the Review Console has focus and the agent is not typing in an input field When the agent presses A, R, or D Then the corresponding Approve, Re-request, or Deny flow opens with the template selector And pressing J or K navigates to the next or previous case in the active queue And pressing 1–4 switches between the first four attachments; E focuses the evidence viewer; ? opens a shortcut reference And shortcut actions trigger within 150 ms and do not conflict with browser/system defaults And behavior is consistent on the latest two versions of Chrome, Edge, and Safari
Batch Handling for Similar Cases
Given the agent selects multiple cases from a similarity cluster or filtered queue When the agent chooses Batch Re-request or Batch Approve and selects a template Then the action applies per case with macros rendered using each case’s data And cases with high risk (≥ 0.80) or missing mandatory data are auto-excluded with an explanation And processing proceeds at a minimum throughput of 20 cases per minute with a live progress indicator And a per-case success/failure summary is displayed upon completion And each case receives an audit record referencing a shared batch_id And batch actions are undoable per case within 10 minutes
Decision Feedback to Scoring and Policy Analytics
Given any decision or field override is saved When the save succeeds Then an analytics event adjudication.decision.v1 is published within 5 seconds containing case_id, decision_type, reasons, risk_score, risk_factors[], flags[], ocr_confidences{}, overrides[], agent_id, and timestamp And 99% of events are available in the analytics store within 15 minutes And the payload passes schema validation with no missing required fields And the console shows a non-intrusive confirmation badge Sent to Analytics
Privacy, Redaction, and Data Retention Controls
"As a compliance officer, I want controls that limit and purge sensitive data so that step-up verification meets privacy obligations."
Description

Enforce least-privilege data collection with built-in guidance that prompts for only the minimal artifact needed. Validate that uploaded bank proofs are properly redacted and auto-mask detected PII in images and PDFs. Provide role-based access controls to view/download evidence, watermark downloads, and full audit trails. Offer configurable retention windows and automated purge of sensitive artifacts, with region-aware storage options. Expected outcome: compliant verification that protects customer privacy while enabling effective fraud screening.

Acceptance Criteria
Minimal Artifact Prompting for Moderate-Risk Claims
Given a claim is scored as moderate risk and requires additional verification When an agent initiates Step-Up Proof on the case Then the system presents a single, just-enough evidence prompt (e.g., serial-plate photo or packaging label) with auto-generated instructions And the prompt displays purpose, required fields, and retention duration to the claimant/agent And the UI does not request or allow upload of extraneous documents or free-text PII beyond the specified artifact And the system dynamically adds additional prompts only if the prior artifact fails validation, with reasons shown And the prompt and rationale are recorded in the case audit log
Bank Proof Redaction Enforcement
Given a claimant uploads a bank proof to verify purchase When the document is analyzed server-side Then unredacted sensitive elements (full account/routing numbers, full card numbers, full street addresses, non-relevant balances/transactions) are detected And the upload is rejected with inline guidance if sensitive fields remain visible And an auto-redact preview is offered that masks sensitive fields while preserving merchant/payee, date, and amount, and last-4 digits only And the upload is accepted only if required fields are visible and sensitive elements are redacted And the validation outcome (pass/fail, reasons) is appended to the case audit
Automatic PII Masking for Images and PDFs
Given an artifact (image or PDF) is uploaded to the case When PII detection executes on the artifact and its embedded text Then detected PII types (email addresses, phone numbers, full postal addresses, bank/credit card numbers, government IDs, bar/QR codes containing PII) are masked in a derived copy And masking completes within 3 seconds per page (max 30 seconds per artifact) for artifacts up to 20 pages And the original is stored encrypted and restricted to elevated roles only, while the masked derivative is shown to standard roles And the UI indicates masking was applied and which categories were masked And the masking action is recorded in the audit log with detected types and confidence ranges
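For text extracted from a PDF, the masking step might look like the sketch below. The regexes cover only a hypothetical subset of the PII categories listed above (a production detector would also handle postal addresses, government IDs, and barcodes) and the "[REDACTED]" marker is an illustrative choice.

```python
import re

# Hypothetical subset of the PII categories named in the criteria.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "phone": re.compile(r"\b\+?\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
}

def mask_pii(text: str) -> tuple[str, set[str]]:
    """Return a masked derivative plus the categories that were masked."""
    masked, categories = text, set()
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(masked):
            categories.add(name)
            masked = pattern.sub("[REDACTED]", masked)
    return masked, categories
```

Returning the masked categories alongside the derivative supports the requirement that the UI indicate which categories were masked and that the audit log record detected types.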
Role-Based Evidence Access Controls
Given a case contains sensitive verification artifacts When a user attempts to view or download an artifact Then only users with roles Fraud Analyst, Compliance Officer, or Supervisor can view full-resolution artifacts and download And users with Agent role can view masked thumbnails/previews only and cannot download And access control is enforced at API and UI layers with consistent decisions And each access attempt is logged with user ID, role, timestamp, case ID, artifact ID, and allow/deny outcome
Watermarked Evidence Downloads
Given a permitted user downloads a verification artifact When the file is generated for download Then a visible, non-removable watermark including user ID, case ID, timestamp (UTC), and "Confidential" is applied diagonally on every page/image at 15–20% opacity And downloads are blocked for users without download permission And the watermark remains visible after print, screenshot, or re-upload attempts And the download event is recorded in the audit with checksum of the watermarked file
Evidence Audit Trail Completeness
Given any evidence-related action occurs (collect, validate, view, mask, redact, download, retention change, purge) When the action completes Then an append-only audit record is written with fields: event type, actor ID, role, IP, timestamp (UTC), case ID, artifact ID, artifact SHA-256, before/after metadata, and result And audit entries are hash-chained to be tamper-evident and exportable as CSV or JSON And audit records are searchable/filterable by case ID, actor, event type, and date range And system clocks ensure log timestamps deviate by no more than 1 second from application events
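The hash-chaining requirement above can be sketched as follows: each entry's SHA-256 hash covers both its own record and the previous entry's hash, so any in-place edit breaks verification. The dict layout is an assumption for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def append_audit(chain: list[dict], record: dict) -> list[dict]:
    """Append a tamper-evident entry: each hash covers the predecessor's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    chain.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
    return chain

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry fails verification."""
    prev_hash = GENESIS
    for entry in chain:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash depends on the one before it, an attacker who modifies one record would have to recompute every subsequent hash, which is detectable if the chain head is anchored externally.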
Configurable Retention, Purge, and Region-Aware Storage
Given retention policies are configured per artifact type and region When a new artifact is stored or updated Then the calculated retention expiration is displayed on the artifact and stored in metadata And a daily purge job permanently deletes artifacts at expiration, including derivatives and CDN caches within 15 minutes, and logs the purge And legal hold flags prevent purge until cleared and are auditable And artifact storage resides in the configured region (e.g., EU, US, APAC) and is not replicated outside that region except encrypted backups within-region And API and UI return 404/"deleted" state for purged artifacts
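The purge rule above (delete at expiration unless a legal hold is set) reduces to a small predicate; the field names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def retention_expiry(stored_at: datetime, retention_days: int) -> datetime:
    """Expiration shown on the artifact and stored in its metadata."""
    return stored_at + timedelta(days=retention_days)

def should_purge(artifact: dict, now: datetime) -> bool:
    """Daily purge job rule: past expiry and not under legal hold."""
    return now >= artifact["expires_at"] and not artifact.get("legal_hold", False)
```

Keeping the legal-hold check inside the predicate means a held artifact simply never qualifies for the daily job, and clearing the hold makes it eligible on the next run.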
Analytics & Policy Tuning Dashboard
"As a product operations lead, I want to measure and tune step-up policies so that we maximize legitimate approvals while reducing fraud and handle time."
Description

Deliver analytics for step-up coverage, completion rate, approval-after-step-up, fraud blocked, time-to-resolution deltas, and agent effort saved. Break down by channel, brand, product, and risk segment, and attribute outcomes to specific prompts and policies. Provide what-if simulations to preview effects of threshold or prompt changes before deploy. Export reports and emit metrics to BI via API/webhooks. Expected outcome: data-driven tuning that maximizes legitimate recoveries, reduces back-and-forth, and minimizes unnecessary friction.

Acceptance Criteria
View Step-Up Funnel Metrics Dashboard
Given I am an Operations Admin and select a date range When I open the Step-Up Analytics dashboard Then I see metrics for step-up coverage (%), completion rate (%), approval-after-step-up (%), fraud blocked (count and %), time-to-resolution delta (median and p90 hours), and estimated agent effort saved (minutes) And each metric displays numerator and denominator for the selected range And agent effort saved is computed as (messages_avoided_per_case*2min) + (auto_extraction_events*1min), summed across cases And metrics compute consistently across refreshes for the same filters And the dashboard loads the above metrics within 3 seconds for datasets up to 15,000 claims in range
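The agent-effort formula stated above is simple enough to pin down in code; the per-case field names are hypothetical stand-ins for the event counts.

```python
def effort_saved_minutes(cases: list[dict]) -> int:
    """Formula from the criteria:
    (messages_avoided_per_case * 2 min) + (auto_extraction_events * 1 min),
    summed across cases."""
    return sum(
        c["messages_avoided"] * 2 + c["auto_extraction_events"] * 1
        for c in cases
    )
```

For example, a case that avoided 3 messages and had 2 auto-extractions contributes 8 minutes.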
Filter and Breakdown by Channel, Brand, Product, Risk Segment
Given I select a date range and any combination of filters for channel, brand, product, and risk segment When I apply the filters Then all metrics, charts, and tables update to reflect the filters within 2 seconds And I can add up to 4 dimensions for breakdown in a table or chart And grouped rows show counts and rates, hide zero rows by default, and can include zero rows when I toggle "Show zeroes" And from any grouped value I can drill down to a paginated case list showing case ID, channel, brand, product, risk segment, prompt template ID(s), policy version, outcome, and timestamps
Attribution of Outcomes to Prompts and Policies
Given cases may include multiple step-up prompts and policy versions over time When I open the Attribution tab Then outcomes are attributed using last-touch within-case attribution by default and I can switch to first-touch And each prompt template ID and policy version displays coverage, completion, approval-after-step-up, fraud blocked, TTR delta, and agent effort saved metrics And for cases with concurrent prompts, last-touch is defined as the prompt tied to the latest customer interaction before decision And the attribution model in use is clearly indicated in the UI And I can export the attribution table with the model type noted
What-If Simulation: Risk Threshold Changes
Given I select a baseline period (up to the last 90 days) and propose new risk threshold values When I run the simulation Then predicted changes are returned for coverage, completion, approval-after-step-up, fraud blocked, TTR delta, and agent effort saved, each with 95% confidence intervals And the simulation displays the sample size and methodology summary used And running the simulation does not change production settings And I can save the simulation as a draft and compare side-by-side with baseline metrics And the simulation completes within 60 seconds for up to 90 days of data
What-If Simulation: Prompt Variants
Given I select one or more prompt templates and edit copy/requirements or choose a predefined variant When I run the prompt simulation Then the system estimates impact using matched historical cohorts or prior A/B data when available And outputs predicted deltas for completion rate, approval-after-step-up, fraud blocked, and agent effort saved with confidence bands And links to representative historical cases used in the model are available for audit And I can export a proposal JSON including prompt IDs, draft copy, expected impact, and confidence
Export, API, and Webhook Delivery of Metrics
Given I select a report type and date range When I click Export Then CSV and JSON reports are generated with a documented schema, reflecting current filters and breakdowns And I can schedule daily or weekly exports to S3 or GCS using securely stored credentials And I can retrieve metrics via a REST API with OAuth2, filter parameters, and cursor-based pagination And I can configure webhooks to emit hourly metric deltas signed with HMAC And PII fields are excluded by default and only included when I have Admin role and explicitly enable "Include PII"; redaction is applied where configured
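HMAC signing of webhook deltas, as required above, can be sketched with the stdlib; the choice of SHA-256 and canonical JSON serialization are assumptions, since the criteria name only "HMAC".

```python
import hashlib
import hmac
import json

def sign_metrics(secret: bytes, payload: dict) -> str:
    """Sign a metrics payload: HMAC-SHA256 over a canonical JSON body (assumed scheme)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_metrics(secret: bytes, payload: dict, signature: str) -> bool:
    """Receiver side: constant-time comparison avoids timing leaks."""
    return hmac.compare_digest(sign_metrics(secret, payload), signature)
```

Canonical serialization (sorted keys, fixed separators) matters: both sides must hash byte-identical bodies or valid signatures will fail.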
Data Freshness, Metric Definitions, and Auditability
Given production event ingestion is operational When I view the dashboard Then a freshness indicator shows the last update time and is no more than 10 minutes behind real time And each metric includes an inline definition with formula and inclusion/exclusion rules And I can download a metric definitions JSON for the current version And all policy and prompt changes are versioned with timestamps and user IDs And all attribution and simulations use the policy and prompt versions effective at event time

Import Scrub

Bulk pre-screening for historical or partner CSVs and inbox backlogs. Validates serials, checks OEM status, de-duplicates across your history, and returns a clean/dirty split with reasons before creating cases. Benefit: accelerates migrations and integrations while keeping bad data out of the live queue.

Requirements

Auto Schema Detection & Field Mapping
"As an integrations manager, I want to auto-map incoming CSV columns to ClaimKit fields with minimal manual work so that migrations and partner onboarding are fast and consistent."
Description

Accepts historical or partner CSV uploads and automatically detects delimiter, encoding, headers, and data types, mapping them to ClaimKit’s canonical fields (e.g., order_id, serial_number, sku, customer_email, purchase_date, warranty_policy_id, channel). Provides an interactive mapping UI with manual overrides, saved templates per source, and validation of required/optional fields. Supports field transformations (trim, case normalization, date parsing/timezone normalization), lookup tables (e.g., SKU→OEM), and sanitation rules before validation. Shows a live preview of the first N rows with validation flags. Integrates with the import job runner, case creation service, and tenant configuration. Captures mapping versions and user identity for audit. Outcome: faster onboarding and fewer mapping errors during migrations and partner integrations.

Acceptance Criteria
Auto-Detect CSV Structure and Data Types
Given a CSV file (up to 20MB, ≤200k rows) with any of comma, semicolon, tab, or pipe delimiters and UTF-8 or ISO-8859-1 encoding When the file is uploaded to Import Scrub Then the system detects delimiter, quote, escape, header presence, and encoding without manual input within 10 seconds And the system infers data types (string, integer, decimal, boolean, date/datetime with timezone, email) for each column using the first 5,000 rows And suggested mappings to canonical fields (order_id, serial_number, sku, customer_email, purchase_date, warranty_policy_id, channel) are produced with a confidence score per field And for provided test fixtures, detected structure and suggested mappings match the ground truth And if detection is ambiguous (e.g., equal confidence for two delimiters), the UI prompts the user to choose before proceeding And malformed rows are counted and flagged; detection proceeds without crashing even if up to 1% of sampled rows are malformed
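Delimiter and header detection of the kind described above is available in Python's stdlib `csv.Sniffer`; this is a minimal sketch of the detection step, not the full type-inference or mapping pipeline.

```python
import csv
import io

def detect_structure(sample: str):
    """Sniff delimiter and header presence from a text sample."""
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t|")  # candidate delimiters
    has_header = sniffer.has_header(sample)
    rows = list(csv.reader(io.StringIO(sample), dialect))
    return dialect.delimiter, has_header, rows
```

In practice the sample would be the first few thousand rows of the upload, decoded with the detected encoding; `Sniffer.sniff` raises when detection is ambiguous, which maps to the "prompt the user to choose" requirement.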
Interactive Mapping UI with Manual Overrides and Saved Templates
Given the auto-detected mapping is presented When a user remaps any source column to a different canonical field or sets a column to Ignore Then the UI updates the mapping immediately and reflects changes in the live preview And the user can mark a template name and save it as a source template scoped to the tenant And the template stores: source identifier, field mappings, detected structure, transformations, lookup selections, and validation rules And template versions increment on each save with timestamp, user, and change summary And on subsequent uploads from the same source identifier, the highest version template auto-applies with the option to select an older version or None And templates are isolated per tenant; users cannot see or apply templates from other tenants
Validation of Required/Optional Fields with Live Preview
Given a mapping (auto or manual) exists When the user opens the preview Then the system displays the first 100 rows with per-cell indicators: Valid, Warning, Error And required canonical fields (order_id, serial_number, customer_email, purchase_date) must be mapped; if any are unmapped the Run Import action is disabled and a banner lists missing fields And per-row validation errors list specific reasons (e.g., invalid email format, missing serial_number, purchase_date parse failure) And the preview header shows aggregate counts: total rows, valid rows, warning rows, error rows And the user can download a CSV of rows with errors including row number and error reasons And proceeding to the import runner is blocked until required fields are mapped and there are zero critical schema errors
Field Transformations and Timezone Normalization
Given a source column mapped to a canonical field When the user configures transformations (trim whitespace, case normalization upper/lower, regex replace, date parsing with source format and timezone, numeric parsing with locale) Then the preview shows before/after values per affected cell And date/time values are converted to UTC and stored as ISO-8601 in the canonical model And if a transformation fails on a value, that cell is flagged with the specific transformation error without failing the entire row And transformation order is deterministic and documented: sanitation → lookup → parse → normalize → validate And the selected transformations are saved with the template and applied identically by the import job runner
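The date-parsing and UTC-normalization step above can be sketched as follows; `source_fmt` and `utc_offset_hours` stand in for values a mapping template would supply.

```python
from datetime import datetime, timedelta, timezone

def parse_purchase_date(value: str, source_fmt: str, utc_offset_hours: int) -> str:
    """Sanitize, parse a source-local date string, and normalize to UTC ISO-8601."""
    naive = datetime.strptime(value.strip(), source_fmt)       # sanitation -> parse
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).isoformat()          # normalize to UTC
```

A value like "2024-03-05 10:30" at UTC-5 normalizes to 15:30 UTC, matching the requirement that canonical storage be UTC ISO-8601 regardless of the source timezone.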
Lookup Tables and Sanitation Rules Application
Given a SKU→OEM lookup table is configured for the tenant and selected in the mapping UI When the preview runs Then OEM values are populated from SKU via lookup before validation And any missing SKU entries are flagged with a Missing Lookup warning including the SKU value And sanitation rules are applied before lookup: serial_number non-alphanumeric stripped except dashes/underscores; customer_email trimmed and lowercased And the user can export a CSV of missing lookup values to facilitate table updates And lookup table version used is recorded in the mapping configuration
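The sanitation rules and lookup-before-validation order stated above translate directly to code; `lookup_oem` and its return shape are hypothetical.

```python
import re

def sanitize_serial(serial: str) -> str:
    """Strip non-alphanumerics except dashes/underscores (rule from the criteria)."""
    return re.sub(r"[^A-Za-z0-9_-]", "", serial)

def sanitize_email(email: str) -> str:
    """Trim and lowercase, per the criteria."""
    return email.strip().lower()

def lookup_oem(sku: str, table: dict) -> tuple:
    """SKU->OEM lookup after sanitation; missing entries get a warning flag."""
    if sku in table:
        return table[sku], []
    return None, [f"Missing Lookup: {sku}"]
```

Applying sanitation before lookup means a SKU with stray whitespace still matches the table, and only genuinely unknown SKUs surface as Missing Lookup warnings.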
Integration with Import Runner, Case Creation, and Audit Logging
Given a mapping has zero critical validation errors When the user starts the import in Dry Run mode Then the import runner processes using the mapping configuration and returns a clean/dirty split with per-row reasons without creating cases And when the user starts the import in Commit mode Then clean rows are sent to the case creation service and cases are created; dirty rows are skipped and reported And all actions record an audit entry capturing tenant, user ID, timestamp, uploaded file hash, mapping template version, transformation settings, lookup versions, and run mode (Dry/Commit) And an administrator can view audit logs and download the mapping configuration JSON used for a specific run And all operations respect tenant isolation; data from one tenant is never accessible in another
Serial & OEM Eligibility Validation Engine
"As a support operations lead, I want serials and warranty status verified during import so that only eligible cases proceed to the queue."
Description

Validates serial numbers and warranty eligibility during the scrub phase by applying format rules and querying OEM/partner APIs or internal policy rules. Implements a connector abstraction with retries, exponential backoff, caching, and rate limiting; supports batch endpoints where available. Annotates each row with status (Eligible/Ineligible/Unknown) and structured reason codes (e.g., serial_not_found, out_of_warranty, oem_timeout), with fallback to cached responses for resilience. Integrates with vendor credential management, secrets storage, and ClaimKit’s warranty policy engine. Outcome: prevents ineligible cases from entering the live queue and reduces downstream handling time.

Acceptance Criteria
Serial Format Pre-Screening Blocks OEM Calls
Given a scrub upload containing serials that both match and fail brand-specific format rules When the validation engine runs Then any serial failing format rules is annotated with status=Ineligible and reason=serial_format_invalid And no OEM/partner API requests are attempted for serials failing format rules And serials passing format rules proceed to OEM/policy eligibility checks And a metric is recorded for the count of OEM calls avoided due to format failures
Resilient OEM Eligibility Check with Retries, Backoff, Rate Limiting, and Cache Fallback
Given a serial passes format validation and no fresh cache entry exists When the OEM API returns a timeout or 5xx error Then the engine retries up to 3 times after the initial attempt with exponential backoff (start=500ms, factor=2, max=4s) and randomized jitter (+/-20%) And the per-connector rate limit configured to 10 RPS is not exceeded during retries And if all retries fail and a cached response newer than 24h exists, the cached result is used and the row is annotated with reason=cached_hit and the cached status And if all retries fail and no fresh cache exists, the row is annotated with status=Unknown and reason=oem_timeout and is excluded from case creation in the live queue
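The retry schedule above (500ms start, factor 2, 4s cap, ±20% jitter, 3 retries) can be computed as a pure function, which also makes it easy to test without sleeping:

```python
import random

def backoff_delays_ms(base=500, factor=2, cap=4000, retries=3, jitter=0.2):
    """Retry delays per the criteria: exponential, capped, with randomized jitter."""
    delays = []
    for attempt in range(retries):
        nominal = min(base * factor ** attempt, cap)
        delays.append(nominal * (1 + random.uniform(-jitter, jitter)))
    return delays
```

The caller sleeps for each delay between attempts; jitter spreads retries from many concurrent rows so they do not hammer the OEM in lockstep, and separate rate limiting still applies on top.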
Batch Endpoint Support and Correct Mapping
Given an OEM connector that supports batch eligibility checks up to 100 serials per request and an input of 150 serials When the validation engine runs Then the engine sends 2 batch requests (100 + 50) and receives a mixed-result response And each input serial is mapped to exactly one output result with no omissions or duplicates And partial failures in the batch are annotated per-row (e.g., serial_not_found, out_of_warranty, oem_timeout) And the final output preserves input order and includes status and reason for every row
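The batching arithmetic above (150 serials → requests of 100 + 50, order preserved) is a simple slice:

```python
def batch_serials(serials: list, batch_size: int = 100) -> list:
    """Split serials into OEM batch requests, preserving input order."""
    return [serials[i:i + batch_size] for i in range(0, len(serials), batch_size)]
```

Because slicing preserves order, zipping each batch's request and response back together keeps the one-to-one input/output mapping the criteria require.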
Cache Utilization and TTL Behavior
Given a prior OEM eligibility result for a serial was cached at time t0 When the same serial is validated again at time t0+6h Then no OEM call is made and the row is annotated with the cached status and reason=cached_hit When the same serial is validated again at time t0+25h Then the OEM is queried again and the cache is refreshed with the new result And cache keys are scoped per connector and serial to avoid cross-vendor contamination
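The TTL behavior above (fresh within 24h, stale after, keys scoped per connector and serial) can be sketched with an explicit clock parameter so the t0+6h and t0+25h cases are directly testable:

```python
class EligibilityCache:
    """Cache keyed per (connector, serial), with the 24h TTL from the criteria."""
    TTL_SECONDS = 24 * 3600

    def __init__(self):
        self._store = {}

    def get(self, connector: str, serial: str, now: float):
        entry = self._store.get((connector, serial))
        if entry is not None and now - entry["cached_at"] < self.TTL_SECONDS:
            return entry["result"]
        return None  # absent or stale: caller must query the OEM and refresh

    def put(self, connector: str, serial: str, result: str, now: float):
        self._store[(connector, serial)] = {"result": result, "cached_at": now}
```

Scoping the key on the connector as well as the serial is what prevents the cross-vendor contamination the criteria call out.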
Secure Credential Use and Failure Handling
Given a connector requiring vendor credentials stored in the secrets manager When the validation engine initializes a request Then credentials are retrieved via the vendor credential management interface and never logged or exposed in error messages And all outbound calls use TLS 1.2+ and redact secrets in traces and metrics When the OEM responds 401/403 due to invalid/expired credentials Then the engine performs a single refresh attempt via the credential manager and retries the request once And if the retry fails, the row is annotated with status=Unknown and reason=invalid_credentials and no further retries are performed
Status, Reason Codes, Clean/Dirty Split, and Policy Overrides
Given OEM eligibility returns Ineligible with reason out_of_warranty and an internal warranty policy grants coverage When the validation engine applies the policy engine result Then the final annotation is status=Eligible and reason=policy_override_allow Given OEM eligibility returns Eligible and the policy engine denies coverage (exclusion) When the validation engine applies the policy engine result Then the final annotation is status=Ineligible and reason=policy_override_deny And every row includes exactly one of status {Eligible, Ineligible, Unknown} and a structured reason code from the allowed set {serial_format_invalid, serial_not_found, out_of_warranty, oem_timeout, cached_hit, invalid_credentials, rate_limited, policy_override_allow, policy_override_deny} And the Import Scrub output classifies rows as clean when status=Eligible and dirty when status ∈ {Ineligible, Unknown}, and dirty rows do not create cases in the live queue
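The override and clean/dirty rules above reduce to a small decision function; the "allow"/"deny" policy-decision values are illustrative names for whatever the policy engine returns.

```python
def apply_policy(oem_status: str, policy_decision):
    """Combine OEM eligibility with the internal policy engine's decision."""
    if policy_decision == "allow" and oem_status == "Ineligible":
        return "Eligible", "policy_override_allow"
    if policy_decision == "deny" and oem_status == "Eligible":
        return "Ineligible", "policy_override_deny"
    return oem_status, None  # no override; the upstream reason code stands

def is_clean(status: str) -> bool:
    """Clean/dirty split: only Eligible rows create cases in the live queue."""
    return status == "Eligible"
```

Note that Unknown rows pass through unchanged and land in the dirty output, consistent with the requirement that they never create live-queue cases.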
Cross-History De-duplication & Merge Rules
"As an operations manager, I want duplicates detected and linked during bulk imports so that we avoid creating redundant cases and keep metrics accurate."
Description

Detects duplicates across the customer’s full case/claim history and within the current batch using exact and fuzzy matching on serial_number, order_id, customer identifiers, and purchase date windows. Provides configurable, per-tenant dedupe rules and confidence thresholds with actions (skip, link to existing case, or merge selected attributes). Emits reason codes (duplicate_serial, duplicate_order, potential_duplicate_low_confidence) and links new rows to canonical cases when suppressed. Integrates with ClaimKit’s case index/search for low-latency lookups and with reporting to measure suppression rates. Outcome: prevents redundant cases, preserves SLA integrity, and keeps analytics accurate.

Acceptance Criteria
Exact Duplicate by Serial/Order Across History
Given tenant T has an existing case C1 with serial_number "ABC123" and order_id "ORD-100" And Import Scrub batch contains a row R1 with serial_number "ABC123" and order_id "ORD-100" When Import Scrub runs cross-history exact match detection Then R1 is classified as an exact duplicate with reason codes including ["duplicate_serial","duplicate_order"] And suppression_action = "skip" per T's configuration And no new case is created And R1 appears in the dirty output with canonical_case_id = C1.id and link_url to C1 And the case index is not mutated
Fuzzy Match Within Batch by Customer + Purchase Window
Given a batch contains two rows R2 and R3 with: - serial_number distance <= 1 (e.g., "SN0012A" vs "SN0012B") - same customer_email - purchase_date within a 30-day window And tenant T thresholds are merge >= 0.90 and link >= 0.70 When the match score between R2 and R3 is 0.78 Then R2 is chosen as the canonical row deterministically (earliest purchase_date, else first occurrence) And R3 is classified with reason_codes = ["potential_duplicate_low_confidence"] And suppression_action = "link" to R2 (within-batch canonical) And R2 appears in the clean output and R3 appears in the dirty output with canonical_row_id = R2.id And no historical cases are created or modified
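The serial-distance signal and threshold-to-action mapping above can be sketched with a standard Levenshtein distance and the link/merge cutoffs from the criteria; the overall score combination across signals is left out as it is tenant-configurable.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic single-row dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[len(b)]

def dedupe_action(score: float, link: float = 0.70, merge: float = 0.90) -> str:
    """Map a match score to the tenant-configured action thresholds."""
    if score >= merge:
        return "merge"
    if score >= link:
        return "link"
    return "flag"  # potential_duplicate_low_confidence, surfaced in dirty output
```

With the example thresholds, a 0.78 score links to the canonical row while 0.93 merges, matching the scenarios in these criteria.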
Per-Tenant Thresholds Drive Actions (Skip/Link/Merge)
Given tenant T configures dedupe rules: - exact_duplicate.action = "skip" - fuzzy.thresholds: link = 0.70, merge = 0.90 And three incoming matches produce scores of 0.65, 0.80, and 0.93 respectively When Import Scrub evaluates actions Then the 0.65 match is not auto-suppressed and is flagged with reason_codes = ["potential_duplicate_low_confidence"] in the dirty output And the 0.80 match is suppressed with suppression_action = "link" to the canonical existing case and reason_codes reflecting matched signals And the 0.93 match is suppressed with suppression_action = "merge" according to merge field rules And changing T's thresholds updates the behavior on the next run without code changes and is scoped only to T
Merge Selected Attributes Without SLA Reset
Given tenant T sets merge_fields = ["proof_of_purchase","shipping_address"] And case C2 exists for serial_number "ZX-9" with SLA timers active And row R4 matches C2 with score >= merge threshold When Import Scrub applies suppression_action = "merge" Then only the configured merge_fields on C2 are updated from R4 And C2.id, created_at, and SLA start/elapsed timers remain unchanged And an audit log entry records field-level before/after values, match_score, matched_signals, and source_row_id = R4.id And an event "case.merged" is emitted with canonical_case_id = C2.id and included in the batch results
Reason Codes and Canonical Linking in Output
Given Import Scrub suppresses rows via skip, link, or merge When generating outputs Then each suppressed row includes: suppression_action, canonical_case_id or canonical_row_id, match_score (0–1), matched_signals (e.g., ["serial_exact","order_exact","email_fuzzy"]), and reason_codes in ["duplicate_serial","duplicate_order","potential_duplicate_low_confidence"] And links are resolvable via link_url = /cases/{canonical_case_id} And rows not suppressed do not include canonical references and appear in the clean output
Low-Latency Case Index Lookup at Scale
Given tenant T has 5,000,000 historical cases indexed And an Import Scrub batch of 50,000 rows is submitted When cross-history lookups execute Then p95 per-row lookup latency <= 200 ms and p99 <= 400 ms And overall dedupe step completes in <= 15 minutes wall-clock for the batch And zero timeouts occur at the configured concurrency And metrics are recorded for p50/p95/p99 latencies per signal type
Suppression Reporting with Reason Code Breakdown
Given a completed Import Scrub run with batch_id = B123 When reporting is refreshed Then a suppression summary is available within 5 minutes containing: total_rows, clean_count, dirty_count, suppressed_count, suppressed_rate, and counts by reason_code And counts in reporting match the batch outputs exactly (zero variance) And historical reporting supports filtering by tenant_id, date range, and suppression_action And the dataset exposes fields required for SLA and analytics integrity (canonical_case_id, suppression_action, match_score)
Clean/Dirty Split Dashboard & Exports
"As a data analyst, I want a clear clean/dirty breakdown with downloadable files so that I can quickly route clean rows to creation and fix the rest."
Description

Generates a post-scrub results view summarizing total processed rows, clean vs. dirty counts, and top failure reasons, with filters and drill-down to row-level details. Produces downloadable CSVs for clean and dirty subsets, preserving original row numbers and including per-row annotations (reason codes, messages, suggested fixes). Supports pagination for large datasets, server-side filtering, and column visibility controls. Sends webhook callbacks or email notifications when a job completes for automated pipelines. Integrates with the import job runner, notifications, and audit logging. Outcome: transparency and rapid triage that accelerates migrations and partner data onboarding.

Acceptance Criteria
Ops Lead Reviews Post-Scrub Summary for 100k-Row Import
Given an import scrub job completes successfully with N processed rows, of which C are clean and D are dirty When the user opens the results view for job {job_id} Then the header displays Total Processed=N, Clean=C, Dirty=D And Clean + Dirty equals N And the Top Failure Reasons section lists the top 5 reason codes with count and percentage of the dirty subset, sorted by count descending And displayed counts and percentages match backend aggregations exactly And the view shows job_id and completed_at timestamp in the organization’s timezone
Triage Specialist Filters Dirty Rows by Reason and Drills Down to Details
Given the results include dirty rows annotated with reason_code(s), reason_message(s), suggested_fix, and original_row_number When the user applies server-side filters (e.g., status=Dirty, reason_code in [R1, R2], date range, serial contains "ABC") Then only rows matching the filters are returned by the API and displayed And filtered totals update for the current view while overall job totals remain visible and unchanged And clicking a row opens a details panel showing original_row_number, parsed fields, all reason_code/message pairs, suggested_fix, and source reference (file/email id) And the filtered results response time is ≤ 2 seconds for datasets up to 200k rows at p95
Analyst Navigates Paginated Results for Large Dataset
Given the results contain more rows than fit on one page When the user navigates pages (Next/Prev or direct page select) Then only the requested page’s rows are retrieved from the server (server-side pagination) And the default page size is 50 and can be changed to 25, 50, or 100 per page And the UI displays the correct item range and total pages based on the total matching rows for the current filter And after changing filters, the listing resets to page 1 with consistent results And p95 latency for page fetch is ≤ 1.5 seconds under expected load
User Downloads Clean and Dirty CSV Exports with Annotations
Given a completed job with clean count C and dirty count D When the user downloads the Clean export Then the CSV contains exactly C data rows plus a header row And columns include original_row_number and the standard mapped fields, plus reason_code, reason_message, suggested_fix columns left blank for clean rows When the user downloads the Dirty export Then the CSV contains exactly D data rows plus a header row And each row includes original_row_number, reason_code(s), reason_message(s), and suggested_fix populated for that row And CSVs are UTF-8 encoded, RFC 4180 compliant (quoted as needed), with a stable column order across exports And filenames follow import_{job_id}_{subset}_{YYYYMMDDHHmm}.csv And export contents reflect the entire subset (Clean or Dirty), independent of UI filters And line counts in the files match the counts shown in the summary
Pipeline Receives Webhook and Email Notification on Job Completion
Given a webhook endpoint and email recipients are configured for the organization When an import scrub job completes with status in {success, failure} Then a webhook is POSTed within 30 seconds to the configured endpoint containing job_id, org_id, status, totals {processed, clean, dirty}, started_at, completed_at, and results_url And the request includes an HMAC-SHA256 signature header computed with the shared secret And on non-2xx responses, the system retries up to 5 times with exponential backoff And an email is sent to recipients with subject including job_id and status and body containing the results_url And exactly one webhook delivery attempt sequence and one email are initiated per job outcome (no duplicates) And webhook and email outcomes are recorded in audit logs
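The HMAC-SHA256 signature requirement can be sketched with Python's standard library; the payload shape is illustrative, and the header name carrying the signature is an assumption:

```python
import hashlib
import hmac
import json

def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the HMAC-SHA256 hex digest sent in the signature header."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Receiver side: recompute and compare in constant time to avoid
    timing attacks on the shared secret."""
    return hmac.compare_digest(sign_payload(secret, body), received)
```

The receiver must verify against the raw request bytes, not a re-serialized JSON object, since key ordering or whitespace differences would change the digest.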
Integration with Job Runner and Comprehensive Audit Logging
Given the import job runner transitions a job to Completed When the transition occurs Then the Clean/Dirty Split results view becomes accessible at results_url And in-progress/partial results are not accessible via the UI And audit logs record events job_completed, results_viewed, csv_exported, webhook_sent, and email_sent with actor (user or system), timestamp, job_id, outcome, and metadata And audit entries are immutable and retrievable via the audit API for compliance
User Adjusts Column Visibility Without Affecting Export Schema
Given the results table supports column visibility controls When the user toggles columns on or off and applies Reset to Defaults Then the table updates without full page reload and reflects the selection within 200 ms And the user’s column visibility selection persists for the current session And Reset to Defaults restores the system default column set And CSV export schemas remain fixed and unaffected by UI column visibility changes
Reason Taxonomy & Auto-Remediation
"As a QA specialist, I want standardized reasons and automatic fixes for common data issues so that I can reduce manual cleanup and speed up imports."
Description

Standardizes validation failures and warnings into a consistent taxonomy with machine-readable codes and readable messages. Applies safe auto-remediations during scrub (e.g., trimming whitespace, normalizing email casing, parsing varied date formats, resolving known SKU aliases) and flags rows as fixed or still dirty. Provides remediation guidance per reason and can generate a correction template for batch fixes. Tracks before/after values for transparency and allows per-tenant toggles for specific auto-fixes. Integrates with mapping, validation, and reporting layers. Outcome: less manual cleanup, higher clean rate, and faster time-to-case creation.

Acceptance Criteria
Standardized Reason Codes and Messages
Given an import scrub processes rows with validation failures and warnings When reasons are generated Then each affected row must include at least one reason_code, reason_type in {error, warning}, and human_readable_message And reason_code values are drawn from the configured taxonomy list and are unique per reason And unknown failures map to reason_code=UNKNOWN with a non-empty message And multiple reasons can be attached to a single row without duplication
Auto-Remediation of Common Data Issues
Given rows with leading/trailing whitespace, mixed-case emails, and date strings in supported formats When scrub runs with auto-remediation enabled Then whitespace is trimmed on string fields configured for trimming And emails are normalized to lowercase And dates are parsed and normalized to ISO 8601 (YYYY-MM-DD) And fixed fields are marked with remediation_applied listing rule names And rows with all issues resolved are classified as clean and flagged fixed=true And unresolved issues leave the row classified as dirty with appropriate reasons
SKU Alias Resolution
Given a tenant with a configured SKU alias map When a row contains a known alias Then the alias is replaced with the canonical SKU and before/after values are captured And if an alias maps to multiple candidates, no replacement occurs and reason_code=SKU_ALIAS_AMBIGUOUS is added And if an alias is unknown, reason_code=SKU_UNKNOWN is added And when auto_fix_sku_aliases=false, no replacement occurs and the appropriate reason remains
Per-Tenant Auto-Fix Toggles
Given tenant-level settings for specific auto-remediations When auto_fix_whitespace=false and auto_fix_dates=true for the tenant Then whitespace is not trimmed and reason_code=WHITESPACE_TRIMMABLE is added where applicable And dates are normalized and remediation_applied includes DATE_NORMALIZED And changes to toggles take effect on the next scrub execution without redeploy or code change
Before/After Audit Trail and Transparency
Given fields are modified by auto-remediation during scrub When viewing scrub results via API or UI Then each modified field shows before_value, after_value, remediation_rule, timestamp, and actor=system And audit entries are immutable and exportable And no before/after pair is recorded for rows with no changes And audit logs are retained for at least 90 days
Correction Template Generation and Re-Import
Given a scrub run produces dirty rows When a correction template is generated Then the template includes only the fields required to fix, plus row_id and reason_codes per row And the template can be downloaded within 5 seconds for up to 50,000 dirty rows And re-importing a completed template updates the original rows and reduces the count of corresponding reasons And rows that remain invalid after re-import retain or update their reason codes accordingly
Clean/Dirty Split, Counts, and Reporting Integration
Given a scrub run over a CSV with mixed data quality When results are produced Then output includes separate clean and dirty sets, with total counts and percentage clean And fixed rows are included in clean with fixed=true And an aggregated breakdown of reasons is emitted per reason_code for reporting APIs And mapping/validation/reporting layers receive reason codes for downstream metrics
Controlled Case Creation (Dry Run to Commit)
"As a support lead, I want to review scrub results and then safely create cases in controlled batches so that I avoid flooding the live queue and maintain auditability."
Description

Implements a two-phase flow where users scrub first (dry run) and then commit case creation for the clean subset with explicit confirmation. Enforces guardrails such as maximum create thresholds, exclusion of selected rows, and chunked batch creation with idempotency keys, retries, and progress tracking (pause/resume). Backfills SLA timers and embeds source metadata on created cases. Emits audit logs and notifications for compliance and traceability. Integrates with the Case Creation service, SLA engine, and activity logs. Outcome: safe, observable bulk creation that avoids flooding the live queue and maintains data integrity.

Acceptance Criteria
Dry Run Produces Clean/Dirty Split with Reasons
Given a user uploads a CSV or selects an inbox backlog for scrub When the user starts a dry run Then the system validates all rows without creating any cases And returns total rows, clean count, and dirty count And attaches per-row reason codes for all dirty rows (e.g., invalid serial, ineligible OEM, duplicate) And assigns a unique dry-run ID And persists results for at least 7 days And exposes downloadable clean.csv and dirty.csv with consistent headers and a reasons column
Commit Confirmation and Safe Case Creation
Given a dry-run ID exists with one or more clean rows When the user clicks Commit and confirms in a modal summarizing cases-to-create and risks Then cases are created only for the current clean, non-excluded rows via the Case Creation service And each created case includes source metadata (dry-run ID, import source, file name or message ID, original timestamps, idempotency key) And SLA timers are backfilled via the SLA engine based on source timestamps And the live queue reflects only the newly created cases And no duplicate cases are created across retries or re-runs using the same dry-run ID
Maximum Create Threshold Guardrail
Given the workspace has a maximum create threshold configured When a commit would create more cases than the threshold Then the system blocks the commit and creates zero cases And displays an error with the attempted count and the configured threshold And offers options to split the commit into smaller batches below the limit
Row Exclusion Respected During Commit
Given the user has excluded selected rows from the clean subset prior to commit When the user commits case creation Then excluded rows are not created as cases And a persistent exclusion report stores excluded row identifiers and user/automation that performed the exclusion And the commit summary shows created = clean − excluded
Chunked Batch Creation with Idempotency and Retries
Given a commit has started When creating cases Then the system processes rows in batches of a configurable size (e.g., 100) And assigns a deterministic idempotency key per source row And retries transient failures up to the configured limit with exponential backoff And guarantees at-least-once submission with exactly-once case creation via idempotency And produces a per-row outcome report including success, retry count, and final error (if any)
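A deterministic per-row idempotency key and the batch chunking above can be sketched as follows (key derivation is one plausible scheme, assuming the dry-run ID and original row number uniquely identify a source row):

```python
import hashlib

def idempotency_key(dry_run_id: str, row_number: int) -> str:
    """Deterministic key per source row: the same (dry_run_id, row)
    always yields the same key, so retries and re-runs dedupe to one case."""
    raw = f"{dry_run_id}:{row_number}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def chunks(rows: list, size: int = 100):
    """Yield rows in configurable batches for chunked case creation."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]
```

Because the key is derived rather than randomly generated, a paused-and-resumed or retried commit submits identical keys and the Case Creation service can enforce exactly-once creation.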
Pause/Resume with Accurate Progress Tracking
Given a commit job is in progress When the user clicks Pause Then the current batch finishes, subsequent batches halt, and the job status becomes Paused And progress metrics (total, processed, succeeded, failed, remaining, percent) persist accurately When the user clicks Resume Then processing continues from the next unprocessed row without duplicating previously created cases And ETA updates based on recent throughput
Audit Logs and Notifications for Compliance
Given a dry run or commit is executed When key events occur (dry-run started/completed; commit started/paused/resumed/completed/failed) Then an immutable audit log entry is recorded with actor, timestamps, job IDs, counts, and parameters And completion notifications with summary counts and links to downloadable reports are sent to configured channels (e.g., email, Slack) And audit entries are queryable via Activity Logs for at least 1 year

OEM Sync

High-availability aggregation and smart caching of OEM warranty/serial databases with model normalization and fallback logic when OEM APIs are slow or down. Benefit: near-instant eligibility checks, fewer false negatives from outage gaps, and a consistent experience for agents and customers.

Requirements

Multi-OEM Connector Framework
"As a platform engineer, I want a standardized connector layer for many OEM systems so that ClaimKit can reliably integrate and scale without custom one-off code per OEM."
Description

Build a resilient integration layer that connects to multiple OEM warranty/serial systems (REST/SOAP/GraphQL, SFTP dumps, webhooks), handles diverse auth schemes (API keys, OAuth2, mTLS), and normalizes inbound schemas into ClaimKit’s canonical contract. Include adaptive rate limiting, exponential backoff with jitter, idempotent requests, and per-OEM versioning to tolerate API changes. Secrets are stored in the platform vault, with rotation support. Supports both real-time lookups and incremental sync jobs, enabling ClaimKit’s magic inbox and live queue to query eligibility uniformly across OEMs.

Acceptance Criteria
Real-Time Eligibility Lookup Returns Canonical Response
Given an OEM connector is configured and reachable And a request includes serial_number and model_identifier When ClaimKit performs a real-time eligibility lookup Then the connector returns a canonical response with fields: serial_number, model_id, model_name, warranty_status ∈ {eligible, ineligible, unknown}, coverage_start_date, coverage_end_date, eligibility_reason, oem_code, oem_api_version, correlation_id And enum values are normalized to ClaimKit’s allowed set And the response P95 latency is ≤ 800 ms when the OEM’s P95 latency is ≤ 500 ms, measured over 1,000 requests And responses include a stable correlation_id and are JSON schema-valid against the canonical contract
Vault-Backed Auth and Seamless Secret Rotation
Given an OEM requiring API Key authentication And the API key is stored in the platform vault When the key is rotated in the vault Then subsequent requests use the new key within 60 seconds without process restart And no more than one 401/403 occurs during rotation per OEM Given an OEM requiring OAuth2 client-credentials When the access token is near expiry Then the connector refreshes proactively and does not send an expired token And secrets are never logged; logs contain only redacted placeholders Given an OEM requiring mTLS When the client certificate is renewed in the vault Then the connector hot-reloads the certificate without downtime and completes successful TLS handshakes on the next request
Adaptive Rate Limiting and Backoff with Jitter
Given the OEM responds with HTTP 429 and a Retry-After header When requests exceed the OEM’s published limits Then the connector reduces request rate below the limit within 10 seconds And honors Retry-After before retrying And uses exponential backoff with full jitter with a maximum backoff of 60 seconds And no request is retried more than 5 times And the connector emits structured metrics (requests_per_second, http_429_count, backoff_seconds) per OEM
Idempotent Requests and Duplicate Suppression
Given a network timeout occurs after the OEM has received a request When ClaimKit retries with the same idempotency_key Then the OEM is not invoked twice for a side-effecting call (via OEM idempotency headers or connector-level deduplication) And the connector returns the same response body and status for repeated idempotency_key values for 24 hours And batch sync processing deduplicates records by (oem_code, external_id) with ≥ 99.99% deduplication accuracy validated on a 100k-record test set
Per-OEM API Versioning and Safe Rollout
Given an OEM offers API versions v1 and v2 and both mappings to the canonical contract exist When an operator switches the OEM’s configured version from v1 to v2 Then the connector begins using v2 within 5 minutes without redeploy And can roll back to v1 within 5 minutes And both versions can run in parallel behind a percentage rollout flag from 0% to 100% And responses include oem_api_version indicating the upstream version used And contract tests pass against recorded fixtures for both versions
Incremental SFTP Delta Ingestion
Given an OEM drops nightly delta files to SFTP with naming pattern delta_YYYYMMDD.csv When the scheduled job runs Then the connector connects with key-based auth, lists only new files, and downloads exactly once using checkpointing And verifies file integrity via checksum before processing And parses, maps to the canonical contract, and upserts records with at-least-once semantics and an idempotent merge keyed by (serial_number, model_id, oem_code, effective_date) And partial failures are retried with exponential backoff and errored records are written to a DLQ with reason codes And job metrics (files_processed, rows_upserted, rows_skipped_duplicate, failures) are emitted per OEM
Webhook Subscription, Verification, and Ordering Guarantees
Given an OEM sends webhook events for warranty updates And the connector is configured with the OEM’s signing secret or public key When events are received Then the connector verifies signatures, rejects unverifiable events with 401, and does not process them And deduplicates events by event_id for 24 hours And enforces per-serial ordering using sequence numbers or timestamps with deterministic tie-breakers And retries transient failures with exponential backoff and moves poison events to a DLQ after 5 attempts And P95 end-to-end processing latency from receipt to canonical upsert is ≤ 2 seconds under 100 RPS sustained
Smart Eligibility Cache
"As a support agent, I want instant eligibility checks so that I can resolve claims quickly without waiting on slow OEM systems."
Description

Implement a low-latency, OEM-aware cache for serial/model eligibility results with configurable TTLs, staleness windows, and per-OEM invalidation rules. Support write-through and cache-aside patterns, proactive warmups for high-volume SKUs, and background refresh to keep hot entries fresh. Enforce deterministic cache keys (OEM+model+serial+purchase signals) and attach provenance and timestamps for audit. Target sub-200ms p95 eligibility checks from ClaimKit’s live queue, drastically reducing perceived latency and shielding agents from OEM slowness.

Acceptance Criteria
Deterministic Cache Key Generation and Canonicalization
Given OEM, model, serial, and purchase signals are provided in varying cases, whitespace, and punctuation, When generating a cache key, Then the same deterministic key is produced across repeated calls and nodes. Given any change in OEM, model, serial, or purchase signals, When generating a cache key, Then a different key is produced. Given 1,000,000 distinct OEM+model+serial+purchase-signal tuples, When generating keys, Then collisions equal 0. Given a request to generate a key, When executed on standard application nodes, Then p95 key-generation time is <= 2ms.
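The deterministic key generation above can be sketched as canonicalization plus a hash; the exact canonicalization rules (uppercase, strip non-alphanumerics, sort signal keys) are assumptions about one workable scheme:

```python
import hashlib
import re

def _canon(value: str) -> str:
    """Uppercase and strip whitespace/punctuation so case and formatting
    variants of the same identifier collapse to one token."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())

def cache_key(oem: str, model: str, serial: str, purchase_signals: dict) -> str:
    """Deterministic cache key over OEM + model + serial + purchase signals."""
    parts = [_canon(oem), _canon(model), _canon(serial)]
    # Sort signal keys so dict iteration order never changes the key.
    parts += [f"{k}={_canon(str(v))}" for k, v in sorted(purchase_signals.items())]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Hashing the canonical string gives fixed-length keys across nodes, and a 256-bit digest makes collisions over a few million tuples practically impossible.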
Sub-200ms p95 Eligibility Check from Cache
Given a cache HIT for an eligibility request, When responding from cache, Then end-to-end latency is <= 200ms p95 and <= 400ms p99 over 10,000 requests from the live queue. Given the OEM API is slow (>= 3s) or returns 5xx, and a cached entry is fresh or within staleness, When processing a request, Then the response is served from cache in <= 200ms p95 and includes cache.hit=true and cache.stale in {true,false}. Given no cached entry exists and the OEM API latency exceeds the configured timeout, When processing a request, Then the system returns a fast degraded response within <= 200ms indicating status=deferred and enqueues a background fetch.
Per-OEM TTL and Staleness Window Enforcement
Given OEM A has TTL=24h and staleness=72h, When a cached entry age > 24h and the OEM API is unavailable, Then the system serves the entry with cache.stale=true and schedules a background refresh. Given a cached entry age > staleness window, When processing a request, Then the entry is not served and a live fetch is attempted; if unsuccessful, a defined error/deferred status is returned. Given a change to OEM A's TTL/staleness configuration, When updated in the config store, Then all nodes apply the change within 5 minutes without restart. Given an invalidation command for OEM A (key pattern or full flush), When executed, Then matching entries are removed within 1 minute and subsequent requests are cache misses.
Cache-Aside Read Miss and Write-Through Update Behavior
Given a cache miss, When an OEM lookup succeeds, Then the result is stored in cache before responding and the subsequent identical request is a cache HIT. Given a cache miss and the OEM lookup returns a definitive negative (ineligible/not found), When handling the response, Then the negative result is cached with the per-OEM negative TTL. Given an authoritative eligibility override is written by an operator or automation, When persisted, Then the cache is updated synchronously (write-through) and subsequent reads reflect the override. Given repeated identical writes for the same key, When processed, Then the resulting cache state is unchanged and no duplicate refresh jobs are enqueued (idempotent).
Proactive SKU Warmup Coverage and Safety
Given a configured top-N SKU/key list and a warmup schedule, When the warmup job completes, Then >= 95% of targeted keys have entries with age < 20% of TTL within 10 minutes. Given OEM-specific rate limits, When warmup runs, Then per-OEM QPS does not exceed configured limits and HTTP 429 rate remains < 1%. Given warmup traffic, When observing live queue performance, Then cache HIT p95 latency remains <= 200ms and application CPU utilization remains < 70%. Given warmup failures, When retries occur, Then exponential backoff with jitter is used and each failure is logged with OEM, cache_key, attempt number, and error code.
Background Refresh Keeps Hot Entries Fresh Without Impact
Given a key with hit_count >= threshold in the last 15 minutes and age >= 80% of TTL, When a foreground request arrives, Then a background refresh is scheduled without blocking the foreground response. Given a successful background refresh, When it completes, Then the entry's last_refreshed_at and expiry are updated; Given a failed refresh, Then the previous value remains available within staleness and a retry is scheduled per backoff policy. Given many hot keys across OEMs, When background refresh runs, Then per-OEM concurrency is capped to configured limits and foreground p99 latency remains <= 400ms.
Provenance and Audit Metadata Attached to Cache Entries
Given a cached eligibility is returned, When inspecting the response, Then it includes provenance fields: cache_key, source (oem|warmup|override), first_seen_at, last_refreshed_at, ttl_seconds, staleness_seconds, freshness_status, and value_hash. Given any cache entry is created, refreshed, overridden, or invalidated, When querying the audit log, Then an immutable record exists with UTC ISO8601 timestamp, actor, action, OEM, cache_key, and before/after hashes. Given multiple nodes write audit events, When events are read, Then events are correctly ordered by timestamp, with cross-node clock skew bounded at <= 100 ms.
Model & Serial Normalization Engine
"As an operations lead, I want normalized model and serial data so that eligibility decisions are consistent across OEMs and channels."
Description

Provide a normalization service that maps OEM-specific model/serial formats to canonical product identities with fuzzy matching, pattern libraries, and rule-based transforms (trimming, OCR correction, checksum validation). Maintain a curated alias table and confidence scoring to reduce false negatives from minor variations. Expose APIs to ClaimKit workflows so that incoming emails/PDFs and queue lookups use consistent normalized identities for decisions and SLA timers.

Acceptance Criteria
OCR-Derived Serial Cleanup and Canonical Mapping
Given an OCR-extracted model/serial string containing whitespace, punctuation, and case variance When it is submitted to POST /normalize with an oemHint Then the service trims whitespace, strips disallowed characters per OEM pattern, normalizes case, and returns normalized.model and normalized.serial Given input " xr-55a80j-1234 " and an alias mapping "XR-55A80J" -> canonicalProductId "P-1001" When the request is processed Then response.canonicalProductId = "P-1001", response.confidence >= 0.95, response.rulesApplied includes ["trim","punctuation-stripping","case-normalization"], and processingTimeMs <= 150 Given OCR-confusable characters (O/0, I/1, S/5) When correction rules generate candidates Then the engine prefers candidates that pass checksum validation and returns the highest-confidence passing candidate; if none pass, no match is returned with reason = "checksum_failed"
Fuzzy Model Alias Resolution with Confidence Scoring
Given the alias table maps variants {"A12B-3","A12B3","A-12BIII"} to canonical "A12B-3" When any variant is submitted Then response.canonicalModel = "A12B-3" and response.confidence >= 0.90 Given a model string within Levenshtein distance <= 2 of a known alias When normalized Then accept the match if distance-weighted confidence >= 0.85; else if confidence >= 0.70 set reviewRequired = true; else return no match Given multiple candidates above acceptance threshold When normalized Then apply deterministic tie-breakers (alias priority > newest alias > lexicographic) and return the selected candidate with tieBreaker recorded
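The distance-weighted scoring above can be sketched with a classic edit-distance implementation; the confidence formula (1 − distance / longer length) and the thresholds are one plausible reading of the criteria, not the actual scoring model:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,           # deletion
                           cur[j - 1] + 1,        # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def match_confidence(candidate: str, alias: str) -> float:
    """Distance-weighted score: identical strings score 1.0."""
    d = levenshtein(candidate, alias)
    return 1.0 - d / max(len(candidate), len(alias), 1)

def classify(conf: float) -> str:
    """Apply the acceptance thresholds from the criteria."""
    if conf >= 0.85:
        return "accept"
    if conf >= 0.70:
        return "review_required"
    return "no_match"
```

In practice the fuzzy pass would only run for candidates within distance 2 of a known alias, with the deterministic tie-breakers applied when several candidates clear the acceptance threshold.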
Serial Checksum Validation and Error Handling
Given an OEM pattern with checksum algorithm Mod11 When a serial that fails checksum is submitted Then response.invalidSerial = true, response.errorCode = "CHECKSUM_FAIL", response.canonicalProductId is absent, and HTTP 200 is returned with outcome = "rejected" Given a serial whose length is outside the OEM-allowed range When normalized Then response.errorCode = "SERIAL_FORMAT_INVALID" and invalidSerial = true Given a serial that passes checksum and length validation When normalized Then response.invalidSerial = false and checksumValidated = true
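Mod-11 schemes vary by OEM; the sketch below uses one common weighted variant (weights 2, 3, 4, … from the rightmost body digit, ISBN-style 'X' for remainder 10) purely as an illustration of the validation flow:

```python
def mod11_check_digit(body: str) -> str:
    """Check digit for a numeric body under one common Mod-11 scheme
    (illustrative -- real OEM variants differ in weights and alphabet)."""
    total = sum(int(d) * w for d, w in zip(reversed(body), range(2, 2 + len(body))))
    r = 11 - (total % 11)
    return {10: "X", 11: "0"}.get(r, str(r))

def mod11_valid(serial: str) -> bool:
    """Serial = numeric body + check digit; validate the final character."""
    body, check = serial[:-1], serial[-1]
    if not body.isdigit():
        return False
    return mod11_check_digit(body) == check
```

Checksum validation is what lets the OCR-correction step rank candidate readings: among O/0 or I/1 substitutions, only candidates whose check digit verifies are eligible matches.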
Normalization API Performance and Availability
Given a steady load of 500 requests/second with median payload size 1 KB When observed over a 24-hour window in production Then p95 latency <= 150 ms and p99 <= 300 ms for POST /normalize Given a rolling 30-day period When monitoring uptime for /normalize, /match, and /aliases Then availability >= 99.9% excluding pre-declared maintenance windows Given a cold cache for a new OEM pattern When the first 100 requests are processed Then cache hit ratio >= 80% within 5 minutes and no single request exceeds 500 ms during warmup
Fallback Behavior During OEM API Outage
Given the OEM API is degraded or unreachable When normalization requires OEM metadata Then the engine serves from cached pattern libraries and alias tables without blocking and sets response.source = "cache" if confidence >= thresholdHigh Given an OEM outage exceeds 30 minutes When normalization occurs Then stale cache entries beyond TTL may be used up to maxStale = 24h with response.stale = true Given OEM API recovery When subsequent normalization requests are processed Then cache entries touched during outage are refreshed lazily without breaching p95 latency targets
Idempotent and Deterministic Normalization Results
Given the same input payload is submitted multiple times within 24 hours When normalized Then outputs (canonicalProductId, confidence, rulesApplied) are identical and response.requestHash remains constant Given the ruleset version increments When a request specifies rulesVersion = X Then identical inputs with the same rulesVersion yield identical outputs; when no version is specified the latest version is applied and echoed in response.rulesVersion Given normalization is performed across multiple replicas and regions When identical inputs are processed Then deterministic tie-breakers ensure no cross-node variance (verified by identical response checksum)
Outage Detection & Fallback Logic
"As a customer support manager, I want automatic fallback during OEM outages so that my team can keep processing claims without disruption."
Description

Introduce health checks, per-OEM timeouts, and circuit breakers to detect slow or failing OEM APIs. When degradation occurs, route lookups to the Smart Eligibility Cache within allowed staleness thresholds, return best-known results, and queue reconciliation jobs for when the OEM recovers. Provide clear flags back to ClaimKit UI and automations indicating degraded mode, ensuring agents have a consistent experience and that SLAs continue without unnecessary false negatives.

Acceptance Criteria
Circuit breaker opens on OEM degradation
Given OEM=A has breaker config {timeoutMs:1500, tripConsecutiveTimeouts:5, tripErrorRate:0.5, errorWindowSize:20, openSeconds:60} And the last 5 requests to OEM=A exceeded 1500ms or 50% of the last 20 requests resulted in 5xx/timeouts When a new eligibility lookup for OEM=A is initiated Then the circuit breaker state for OEM=A transitions to OPEN for 60 seconds And the lookup is short-circuited without issuing a network call to OEM=A And an event oem.breaker.open is emitted with {oemId:"A", reason:"timeouts_or_error_rate", openSeconds:60} And metrics for breaker_open_count and breaker_state{oemId:"A"} are updated
Per-OEM request timeout enforcement
Given OEM=B has request timeout configured to 3000ms When an eligibility lookup is sent to OEM=B and the OEM does not respond within 3000ms Then the request is aborted client-side at 3000ms And the attempt is recorded as timeout in request telemetry And the response pipeline treats the attempt as a failure eligible for circuit-breaker evaluation
Fallback to Smart Eligibility Cache within staleness threshold
Given OEM=C is in OPEN or HALF_OPEN breaker state or a timeout/error is returned And Smart Eligibility Cache contains a record for serial=SN123 with cacheAgeMinutes=45 And OEM=C has staleness_max_minutes configured to 120 When an eligibility lookup for SN123 is requested Then the system returns the cached eligibility result with source:"cache" And degradedMode:true and staleness:"within_threshold" are set on the response And cacheAgeSeconds reflects the age of the cached record And no denial workflows are triggered solely due to degraded mode
Handling cache beyond staleness threshold during OEM outage
Given OEM=D is unreachable (timeouts) and breaker is OPEN And Smart Eligibility Cache has a record for serial=SN999 with cacheAgeMinutes=240 And OEM=D has staleness_max_minutes configured to 120 When an eligibility lookup for SN999 is requested Then the system returns a provisional response with verdict:"unknown" and source:"cache" And degradedMode:true and staleness:"beyond_threshold" are set on the response And a reconciliation job is enqueued with {oemId:"D", serial:"SN999"} And denial automations are suppressed; SLA timers continue
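The staleness bucketing used by the two scenarios above can be sketched as a small classifier; the TTL parameter is an assumption carried over from the Smart Eligibility Cache section (entries within TTL are simply fresh):

```python
def classify_staleness(cache_age_minutes: float, ttl_minutes: float,
                       staleness_max_minutes: float) -> str:
    """Bucket a cached entry's age: fresh (within TTL), within_threshold
    (past TTL but servable during an OEM outage), or beyond_threshold
    (only a provisional 'unknown' verdict may be returned)."""
    if cache_age_minutes <= ttl_minutes:
        return "fresh"
    if cache_age_minutes <= staleness_max_minutes:
        return "within_threshold"
    return "beyond_threshold"
```

With staleness_max_minutes = 120, a 45-minute-old entry is servable during an outage while a 240-minute-old entry triggers the provisional-response path and a reconciliation job.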
Reconciliation job enqueue and processing after OEM recovery
Given a reconciliation job exists for {oemId:"E", serial:"SN456"} created during degraded mode And the circuit breaker for OEM=E transitions from OPEN/HALF_OPEN to CLOSED When the reconciliation worker runs Then the job is dequeued and a live lookup to OEM=E is attempted And on success, the case is updated if the OEM result differs from the cached/provisional result And an audit trail entry is written with {previousResult, newResult, source:"reconciliation"} And a webhook/event eligibility.reconciled is emitted with jobId and correlation identifiers And jobs are idempotent: reprocessing the same job does not create duplicate updates or events
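The idempotency requirement above (reprocessing the same job creates no duplicate updates or events) is typically handled with a processed-job guard. A small sketch under that assumption; the class and its semantics are illustrative, not ClaimKit's queue implementation:

```python
class ReconciliationQueue:
    """Idempotent worker sketch: a job id is processed at most once, so
    retries and duplicate deliveries cannot double-emit events."""
    def __init__(self):
        self.processed = set()
        self.events = []

    def process(self, job_id, oem_id, serial, live_lookup):
        if job_id in self.processed:          # idempotency guard
            return False
        result = live_lookup(oem_id, serial)  # live OEM call after recovery
        self.events.append({"type": "eligibility.reconciled",
                            "jobId": job_id, "oemId": oem_id,
                            "serial": serial, "result": result})
        self.processed.add(job_id)
        return True
```

A production version would persist the processed set transactionally with the case update so the guard survives worker restarts.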
Degraded-mode flags exposed to UI and automations
Given an eligibility response was produced using cache due to OEM outage or timeout When the response is returned to ClaimKit services Then the payload includes fields {degradedMode:true, source:"cache"|"oem", staleness:"within_threshold"|"beyond_threshold", cacheAgeSeconds:int} And if a reconciliation job was enqueued, reconciliationJobId is populated with a UUID And these fields are persisted on the case and available via UI and API within 1 second of response
SLA continuity and false-negative prevention during degraded mode
Given an eligibility lookup occurs while OEM=F is degraded and fallback is used When the returned verdict is sourced from cache or marked unknown due to staleness Then SLA timers for the case start/continue as per normal policy And any automation that would deny a claim requires source:"oem" with degradedMode:false And no claim is auto-denied solely on a cache-sourced negative verdict during degraded mode
Consistency & Conflict Resolution
"As a compliance analyst, I want transparent conflict resolution with audit trails so that I can explain eligibility outcomes to stakeholders and auditors."
Description

Create a decision layer that merges data from multiple OEM sources and historical cache entries using freshness, source trust weighting, and confidence scores. Persist provenance and an audit trail for every decision, with deterministic tie-breakers and manual override hooks. Ensure ClaimKit surfaces a single, authoritative eligibility result to agents while retaining traceability for compliance and post-mortem analysis.

Acceptance Criteria
Merge multi-source eligibility into single decision
Given a claim lookup with serial and model and multiple OEM sources plus cached entries And per-source trust weights and freshness thresholds are configured When the decision engine evaluates all inputs Then it returns exactly one eligibility.status in {ELIGIBLE, INELIGIBLE, INCONCLUSIVE} And includes chosen_source, confidence_score (0.00–1.00), decision_id, and reason_code And p95 decision computation time (excluding external calls) is ≤ 50ms
Deterministic conflict resolution and tie-breakers
Given two or more sources produce conflicting eligibility results for the same serial When confidence_score is computed as trust_weight × freshness_factor per source Then the source with the highest confidence_score is selected And if confidence_scores tie, the source with higher trust_weight is selected And if still tied, the source with the most recent updated_at is selected And if still tied, the source with lexicographically smallest source_key is selected And the selected tie_break_path is recorded in the decision audit
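The tie-break chain above is naturally expressed as an ordered sort key. A sketch of the selection step (function name and input shape are assumptions; recording the tie_break_path for the audit is omitted):

```python
def pick_source(candidates):
    """Selects the winning source per the deterministic tie-break order:
    1) highest confidence (trust_weight * freshness_factor),
    2) highest trust_weight,
    3) most recent updated_at (epoch seconds),
    4) lexicographically smallest source_key."""
    def sort_key(c):
        return (
            -(c["trust_weight"] * c["freshness_factor"]),
            -c["trust_weight"],
            -c["updated_at"],
            c["source_key"],
        )
    return min(candidates, key=sort_key)
```

Because the key is a fixed tuple, the same inputs always select the same source, which is what makes the resolution auditable.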
Outage fallback and stale-cache handling
Given OEM API calls time out (>2000ms) or return 5xx And a cache entry exists with age ≤ 24h When a decision is requested Then the cached entry is used and eligibility.status is returned with reason_code=OUTAGE_FALLBACK and provenance.cache_age And p95 response time is ≤ 800ms And if only a cache entry with 24h < age ≤ 72h exists, return status INCONCLUSIVE with reason_code=STALE_CACHE and enqueue background verification retry within 15 minutes And if no cache entry exists, return INCONCLUSIVE with reason_code=NO_DATA and create a verification task
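The age cutoffs above map cache state to a status and reason_code. A sketch of that mapping; note the criteria do not specify the >72h case, which is treated as NO_DATA here as an assumption:

```python
def outage_decision(cached_status, cache_age_hours):
    """Maps cache availability/age during an OEM outage to (status, reason_code).
    The >72h branch is an assumption; the criteria leave it unspecified."""
    if cached_status is None:
        return ("INCONCLUSIVE", "NO_DATA")        # also create a verification task
    if cache_age_hours <= 24:
        return (cached_status, "OUTAGE_FALLBACK")  # serve the cached verdict
    if cache_age_hours <= 72:
        return ("INCONCLUSIVE", "STALE_CACHE")     # enqueue retry within 15 min
    return ("INCONCLUSIVE", "NO_DATA")
```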
Provenance and audit trail persistence
Given any decision (engine or override) is produced When the decision is persisted Then an immutable audit record is stored with: decision_id, request_id, timestamp, actor (system|user), per-source input snapshot/hash, computed confidences, chosen_source, tie_break_path, final status, and reason_code And the audit record is retrievable via API GET /decisions/{decision_id} and linked to the claim And audit records are write-once; updates create a new version with parent_decision_id And audit records are retained for ≥ 24 months and exportable as JSON within 5 seconds per decision
Manual override precedence and traceability
Given an authorized user submits an eligibility override with status {ELIGIBLE|INELIGIBLE}, reason, and optional expiration When the override is saved Then the override takes precedence over engine decisions until expiration or explicit revoke And subsequent API/UI reads return the overridden status with reason_code=MANUAL_OVERRIDE and override metadata (actor, timestamp, reason, expires_at) And an audit entry records before_status, after_status, actor, timestamp, and justification And revoking the override restores engine decisions within 60 seconds
Agent-facing single-result presentation with explainability
Given an agent opens a claim in the UI with an existing decision When the eligibility chip renders Then exactly one value is shown: {Eligible|Ineligible|Inconclusive} And a "View decision details" action displays chosen_source, confidence_score, cache_age, and a summary of tie_break_path without calling OEM APIs And decision details load in ≤ 300ms p95 from the decision store And status changes (override, re-evaluation) propagate to the UI within 60 seconds
Admin Controls & Observability
"As a site reliability engineer, I want visibility and controls for OEM Sync so that I can detect issues early and remediate them quickly."
Description

Deliver dashboards and APIs to monitor per-OEM latency, uptime, error rates, cache hit ratios, circuit breaker states, and refresh queues. Include alerting on SLO/SLA breaches, manual re-sync triggers, maintenance windows, blocklists/allowlists, and per-OEM configuration (TTLs, timeouts, weights). Integrate with ClaimKit’s admin panel so operators can safely tune behavior and recover from incidents without code changes.

Acceptance Criteria
Per-OEM Metrics Dashboard & API
- Given an authenticated admin selects an OEM and a time window (last 1h, 24h, 7d), when the metrics dashboard loads, then the page renders within 5 seconds and shows latency p50/p95/p99, uptime %, error rate %, request volume, and cache hit ratio with timestamps
- Metrics data freshness is <= 60 seconds for real-time windows and <= 5 minutes for the 7d window
- The metrics API GET /admin/oems/{oemId}/metrics returns JSON with fields: latency_ms {p50,p95,p99}, uptime_percent, error_rate_percent, request_count, cache_hit_ratio, time_window; response time <= 800 ms for cached queries
- Access control: non-admin or missing-scope requests receive HTTP 403 with no body content; all access attempts are recorded in the audit log with user, time, endpoint, and result
- Timezone and aggregation boundaries are consistent between UI and API (no >1% discrepancy in counts over the same window)
Cache Hit Ratios, Circuit Breakers, and Refresh Queues Visibility
- Dashboard displays per-OEM: cache_hit_ratio, miss_ratio, stale_hit_ratio, circuit_breaker_state (open/half-open/closed), state_since timestamp, and refresh_queue_depth with 50th/95th item age percentiles
- Data latency for these widgets is <= 30 seconds
- Circuit breaker reasons and last 5 transitions are viewable with timestamps
- API GET /admin/oems/{oemId}/resilience returns JSON including breaker_state, reason, transitions[], refresh_queue_depth, age_p50_ms, age_p95_ms; response time <= 800 ms
- When breaker is open, UI shows a red state and an info tooltip linking to the runbook; when closed, green; half-open is amber
SLO/SLA Breach Alerting
- Admin can configure per-OEM SLOs: uptime target (%) over 30d and p95 latency threshold (ms) over 1h; validation prevents invalid ranges and saves require confirmation
- When an SLO is breached, an alert is emitted within 2 minutes containing OEM, metric, threshold, observed value, window, severity, and runbook link
- Alert delivery supports Email, Slack, and PagerDuty; each channel can be enabled/disabled per OEM; delivery success/failure is logged
- Alerts are deduplicated (no more than one per metric per OEM per 15 minutes) and auto-resolve when metrics are back in compliance for 10 continuous minutes
- During an active maintenance window, SLO alerts for the affected OEM are suppressed
Manual Re-Sync Trigger with Safe Execution and Audit
- Admin can enqueue a re-sync job per OEM with scope options: all, by model number(s), by date range; form validates scope and estimates job size before submission
- Upon confirmation, the job appears in a queue within 60 seconds with status queued/running/succeeded/failed and progress (%) and counts (processed, succeeded, failed)
- Job respects per-OEM rate limits and backoff; if the circuit breaker is open, the job pauses and resumes automatically when half-open/closed
- Failures generate retriable tasks up to the configured retry policy; final failures produce an error report downloadable as CSV
- An immutable audit record captures user, timestamp, scope, parameters, pre-run cache stats, post-run stats, and job outcome
Maintenance Windows Management and Enforcement
- Admin can schedule per-OEM maintenance windows with start/end, timezone, and optional recurrence (cron-like); validation prevents overlaps and past-only windows
- During an active window, outbound OEM calls are paused, cache serves stale content up to stale_ttl, SLO alerts are suppressed, and SLA timers for eligibility checks are paused
- UI shows an active maintenance banner per affected OEM; API GET /admin/oems/{oemId}/maintenance returns current and next windows
- Traffic resumes automatically at window end with a configurable warm-up rate (requests/sec) until normal throughput is reached
- All maintenance window changes are audited and require confirmation prior to activation
Per-OEM Configuration: TTLs, Timeouts, Weights, and Retries
- Admin can edit per-OEM: cache TTLs (fresh_ttl, stale_ttl), request timeout (ms), retry policy (max_retries, backoff strategy), and routing weight; inputs are validated against safe ranges
- Changes require a review step showing before/after values, a blast radius summary, and an acknowledgement checkbox; save is blocked until acknowledgement
- Configuration changes take effect without deployment within 2 minutes and do not interrupt in-flight requests
- A one-click rollback restores the previous version; version history lists user, timestamp, diff, and rollout status
- RBAC enforces that only users with Config:Write scope can modify; others see read-only fields
Blocklists and Allowlists for Models and Serials
- Admin can create per-OEM allowlists and blocklists for model numbers, SKUs, and serial patterns (exact or regex); patterns are validated and tested with a built-in tester before save
- Enforcement: blocked queries never call OEM APIs and return a standardized error code (CK-ELIG-Blocked) with a user-safe message; allowed patterns bypass blocks as configured
- UI displays match counts over the last 24h for each rule; rules can be enabled/disabled without deletion
- API GET /admin/oems/{oemId}/filters returns active rules with id, type, pattern, enabled, created_by, created_at; changes are audited
- Performance: rule evaluation adds <2 ms p95 overhead per request

Risk ETA

Per‑case breach prediction that shows an expected time‑to‑breach, confidence band, and top drivers (e.g., queue load, parts wait, customer silence). Updates in real time inside the case header and queue views. Helps Ops Orchestrators and Agents triage accurately, set honest expectations, and prevent surprises.

Requirements

Per‑Case Breach Time Prediction
"As an Ops Orchestrator, I want to see an expected time to SLA breach for each case so that I can prioritize intervention on the cases most at risk."
Description

Build and deploy a predictive service that computes expected time‑to‑breach (ETB) for each active case based on live operational and case signals. The service must output ETB in minutes, current breach probability within configurable horizons (e.g., 2h, 8h, 24h), and a risk score normalized 0–100. Predictions should refresh in near real time (sub‑minute where signals change; max 5‑minute refresh otherwise) and support cold‑start cases via rules‑based fallbacks. The service must integrate with ClaimKit’s existing SLA timers, handle multi‑tenant isolation, and respect per‑tenant data boundaries. Non‑functional targets: P95 prediction latency under 300 ms per batch of 100 cases, 99.9% availability, and idempotent re‑computations. Backfill predictions for all open cases on feature enablement.

Acceptance Criteria
Live ETB Refresh on Signal Change
Given an active case with SLA timer running and the prediction service enabled for the tenant When any tracked signal for the case changes (e.g., queue position, parts ETA, customer response) Then the service recomputes and publishes ETB (minutes), risk score (0–100), and configured horizon breach probabilities within 60 seconds of the signal change And when no tracked signals change, the service refreshes predictions at least once every 5 minutes And the case header and queue views receive and display the new values within 10 seconds of publish And each prediction includes an updated_at timestamp (ISO 8601) and monotonically increasing version number
Required Output Fields and Ranges
Given a prediction output delivered via API or stream for a case Then the payload includes: etb_minutes (integer, >= 0), risk_score (integer, 0–100 inclusive), breach_probabilities (map of configured horizons to [0.0–1.0] floats), updated_at (ISO 8601), mode ("ml"|"rules"), and version (integer) And breach probabilities are non-decreasing with longer horizons (e.g., P[8h] >= P[2h] >= P[1h] when configured) And horizons returned match the tenant’s configuration, defaulting to 2h, 8h, 24h if not overridden And values are deterministic for identical inputs and timestamp And the API responds with HTTP 200 and a JSON schema that validates against the published contract
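The payload invariants above (ranges, allowed modes, and monotone breach probabilities) lend themselves to a schema check. A sketch, assuming a hypothetical `validate_prediction` validator over a payload dict:

```python
def validate_prediction(p):
    """Checks the contract invariants from the criteria: value ranges, an
    allowed mode, and breach probabilities non-decreasing with horizon."""
    assert isinstance(p["etb_minutes"], int) and p["etb_minutes"] >= 0
    assert 0 <= p["risk_score"] <= 100
    assert p["mode"] in ("ml", "rules")
    horizons = sorted(p["breach_probabilities"])  # horizon lengths in hours
    probs = [p["breach_probabilities"][h] for h in horizons]
    assert all(0.0 <= x <= 1.0 for x in probs)
    # A breach within 2h is also a breach within 8h, so P must not decrease.
    assert all(a <= b for a, b in zip(probs, probs[1:]))
    return True
```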
Cold-Start Rules Fallback and Seamless Handoff
Given a newly created case with insufficient historical features When the prediction service processes the case Then a rules-based fallback computes and publishes ETB, breach probabilities, and risk score within 30 seconds of case creation with mode = "rules" When sufficient features become available or the ML model warms for the case Then the service switches to mode = "ml" and republishes within 5 minutes, preserving version continuity And no prediction gap exceeds 5 minutes between transitions And for identical inputs and timestamps, re-computation is idempotent (identical outputs)
Batch Prediction Latency (P95 <= 300 ms for 100 cases)
Given a batch request containing 100 active cases for a single tenant under normal operating load When the prediction endpoint is invoked Then the server-side compute latency P95 is <= 300 ms across at least 1,000 measured batches And the endpoint returns per-case results in a single response with a correlation id and generation timestamp And any partial failure returns per-case error objects without delaying successful case results And latency measurement excludes client network time and serialization on the client
High Availability and Idempotent Re-computation
Given production operation over a rolling 30-day window Then the prediction service achieves >= 99.9% availability as measured by successful requests over total valid requests And duplicate delivery of the same input (same case_id, features, and timestamp) yields byte-identical outputs (idempotent) and does not create duplicate records in storage or streams And retries are safe and side-effect free, with consistent versioning And service health and SLO metrics are exported and alert at 99.9% availability threshold breaches
Multi-Tenant Isolation and Config Scoping
Given tenants A and B with distinct data and configuration When predictions are computed for tenant A Then only tenant A’s data, models, and configuration are used; no features or labels from tenant B are accessed And cross-tenant requests are rejected with HTTP 403 or an empty result without leaking identifiers And audit logs record tenant_id on every request and data access And per-tenant horizon and model configuration are honored exactly as defined for that tenant
Open Cases Backfill on Feature Enablement
Given the feature is enabled for a tenant with N open cases When the backfill job starts Then predictions (ETB, risk score, configured horizon probabilities) are generated for 100% of open cases without manual intervention And progress is externally visible (percent complete and counts) and retried automatically for transient failures up to a configurable limit And the job is resumable; re-running produces no duplicate records and maintains idempotency for unchanged inputs And upon completion, all open cases show current predictions in queue and case header views
Calibrated Confidence Bands
"As an agent, I want a confidence range around the breach ETA so that I can set realistic expectations with customers."
Description

Provide uncertainty bounds for each ETB prediction, rendering 50/80/95% confidence intervals that are empirically calibrated. Implement post‑hoc calibration (e.g., isotonic/Platt or quantile regression) and validate coverage error within ±5% across key segments (brand, product, channel, region). Expose a model‑confidence indicator (High/Medium/Low) derived from historical error and current feature completeness; when confidence is Low, show an explicit label and widen bands. Ensure intervals and confidence update in sync with ETB refresh, and persist interval values for auditability and analytics.

Acceptance Criteria
Segmented Coverage Calibration (50/80/95)
Given a 90-day rolling holdout set with true breach times and segment attributes (brand, product, channel, region) When calibration evaluation is executed Then empirical coverage for each confidence level (50%, 80%, 95%) is within ±5 percentage points of nominal for every segment with N ≥ 200 And overall coverage across all segments is within ±3 percentage points of nominal And segments with N < 200 are marked "insufficient data" and excluded from pass/fail And a CSV/JSON report with per-segment coverage, sample size, and confidence intervals is persisted and accessible
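Empirical coverage is the fraction of true breach times that land inside their predicted interval. A minimal sketch of the per-segment pass/fail check described above (function names are illustrative):

```python
def empirical_coverage(intervals, actuals):
    """Fraction of true values falling inside their predicted [lo, hi]."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, actuals))
    return hits / len(actuals)

def coverage_ok(intervals, actuals, nominal, tol=0.05):
    """Pass/fail per the +/-5 percentage-point rule for a single segment.
    Segment-size gating (N >= 200) would be applied by the caller."""
    return abs(empirical_coverage(intervals, actuals) - nominal) <= tol
```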
Real-time Interval and Confidence Sync with ETB
Given a case that receives an updated ETB When the ETB refresh is emitted by the backend Then the 50/80/95 intervals and the confidence indicator update within 2 seconds p95 (≤ 5 seconds p99) in both Case Header and Queue views And the numeric values for all intervals and the confidence level are identical across Case Header and Queue views And a single "last updated" timestamp reflects the ETB refresh time And updates are atomic (no state where new ETB is shown with old intervals or vice versa)
Low Confidence Labeling and Band Widening
Given a model-confidence score computed from last-30-day coverage error and current case feature completeness When the score meets Low criteria (coverage error ≥ 7 percentage points OR feature completeness < 70% OR OOD flag = true) Then a visible "Low Confidence" label is displayed with a tooltip listing top reasons (e.g., high recent error, missing features, OOD) And the displayed interval widths are each ≥ 1.2× the base calibrated widths for that case (pre-widening) for 50%, 80%, and 95% And intervals satisfy containment: L95 ≤ L80 ≤ L50 ≤ U50 ≤ U80 ≤ U95
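The widening and containment rules above can be checked in a few lines. A sketch, assuming intervals are widened symmetrically about their midpoints (the criteria require only a >= 1.2x width factor, so the centering choice is an assumption):

```python
def widen_if_low_confidence(bands, low_confidence, factor=1.2):
    """bands: {50: (lo, hi), 80: (lo, hi), 95: (lo, hi)} around the ETB point.
    Widens each interval about its midpoint when confidence is Low, then
    asserts the containment invariant L95 <= L80 <= L50 <= U50 <= U80 <= U95."""
    out = {}
    for level, (lo, hi) in bands.items():
        if low_confidence:
            mid, half = (lo + hi) / 2, (hi - lo) / 2 * factor
            lo, hi = mid - half, mid + half
        out[level] = (lo, hi)
    l95, u95 = out[95]
    l80, u80 = out[80]
    l50, u50 = out[50]
    assert l95 <= l80 <= l50 <= u50 <= u80 <= u95, "interval containment violated"
    return out
```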
Persistence and Auditability of Intervals
Given any ETB prediction event for a case When the system persists prediction artifacts Then an immutable record is stored with: case_id, prediction_id, model_version, calibration_version, etb_point, ci50_low, ci50_high, ci80_low, ci80_high, ci95_low, ci95_high, confidence_level, confidence_reasons[], feature_completeness_pct, created_at And each subsequent update creates a new record without overwriting prior values And records are retrievable via an audit API and included in analytics exports And data is retained for at least 24 months
Post-hoc Calibration Pipeline and Application
Given raw model outputs used to produce ETB and intervals When the calibration training job runs Then a post-hoc calibration method is fitted and versioned (isotonic/Platt for probabilities; quantile or conformal quantile for interval endpoints) And validation enforces non-crossing/containment across intervals: L95 ≤ L80 ≤ L50 ≤ U50 ≤ U80 ≤ U95 And cross-validation metrics meet target coverage on validation within ±5 percentage points at 50/80/95 And at inference the correct calibration_version is applied and added latency from calibration is ≤ 20 ms p95 per prediction
Out-of-Distribution and New Segment Handling
Given a case in a segment unseen during calibration or flagged as out-of-distribution by drift monitoring When intervals and confidence are produced Then confidence is set to Low and reason includes "OOD" with the relevant segment keys And the interval widths are each ≥ 1.2× the base calibrated widths for that case (pre-widening) And the audit record captures segment identity and OOD flag
Top Drivers Explainability
"As an Ops Orchestrator, I want to understand the key factors driving a case’s breach risk so that I can take the right corrective actions quickly."
Description

Attach per‑case driver explanations that identify and quantify the leading contributors to breach risk and ETB (e.g., “Queue load high (+2.1h)”, “Awaiting part ETA unknown (+3.4h)”, “Customer silent 48h (+1.2h)”). Implement model‑agnostic feature attribution (e.g., SHAP) and map technical features to human‑readable labels and units. Display the top 3 drivers with directional impact and magnitude in both case header and a hover/expand detail. Refresh explanations alongside predictions, cache for performance, and log displayed drivers for model governance. Provide safeguards to avoid exposing sensitive attributes and redact tenant‑restricted fields.

Acceptance Criteria
Top 3 Drivers Visible in Header and Detail
Given a case with computed ETB attribution and at least 3 allowed drivers When the case header renders or the user opens the driver detail Then exactly 3 drivers are shown, ordered by absolute ETB impact descending And each driver shows a human-readable label, a sign (+/-), and a magnitude rounded to 0.1h with unit "h" And the values in header and detail are identical and match the backend attribution within ±0.1h
Human-readable Labels and Units
Given a case whose top drivers originate from technical features When drivers are displayed Then no raw feature keys (e.g., snake_case, IDs) appear And each driver label matches the approved mapping dictionary for the tenant And each magnitude includes an approved unit string for that label and follows rounding rules (hours to 0.1h)
Real-time Refresh and Caching SLA
Given a case ETB prediction update occurs (e.g., SLA timer tick, part ETA change) When the UI receives the new prediction Then the top drivers refresh in the header and queue within 2 seconds And P95 render time to display drivers after initial page load is ≤500 ms using the cache And attribution is recomputed at most once per case per 30 seconds unless model inputs change And no more than one attribution request is sent per case per concurrent view (debounced within 500 ms)
Governance Logging of Displayed Drivers
Given drivers are displayed to a user When the render completes Then a governance log record is written containing tenantId, caseId, modelVersion, attributionMethodId, timestamp (UTC), topDrivers [label, featureKey, contribution, unit], and maskedUserId And the record is immutable, queryable within 5 minutes of write, and retained for at least 12 months And the logged values exactly match what was displayed (within ±0.1h for magnitudes)
Sensitive Attribute Safeguards and Tenant Redaction
Given a tenant policy denylist of sensitive and restricted features is configured When computing and displaying top drivers Then no driver derived from a denied feature is displayed And denied drivers are replaced by the next highest-impact allowed drivers until up to 3 are shown And if fewer than 3 allowed drivers exist, show only the allowed count and display a "Some drivers restricted by policy" notice in the detail view And no sensitive labels, values, or feature keys appear in UI or logs
Attribution Correctness and Traceability
Given an attribution method approved list [SHAP, IntegratedGradients, Permutation] is configured When attributions are computed Then the method used is one of the approved list and its ID is included in the governance log And the signed contributions across all features sum to the model-predicted ETB delta from baseline within ±0.1h And each displayed driver magnitude equals the absolute contribution of the mapped feature within ±0.1h and the sign reflects direction of increasing ETB (+) or decreasing ETB (-)
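The completeness property above (signed contributions sum to the ETB delta within +/-0.1h) plus the top-3 ranking can be sketched together. Function name, input shapes, and the label dictionary are illustrative assumptions:

```python
def top_drivers(contributions, baseline_etb, predicted_etb, labels, k=3, tol=0.1):
    """contributions: {feature_key: signed hours of ETB impact}.
    Validates additivity (sum equals the delta from baseline within tol),
    then returns the top-k drivers as (label, magnitude rounded to 0.1h)."""
    delta = predicted_etb - baseline_etb
    assert abs(sum(contributions.values()) - delta) <= tol, "attribution incomplete"
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(labels[key], round(val, 1)) for key, val in ranked[:k]]
```

Policy-denylist filtering (replacing denied drivers with the next allowed ones) would run between ranking and display.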
Queue View Drivers Summary and Performance
Given a queue view of up to 1,000 cases When the queue loads with drivers enabled Then each row shows up to the top 2 driver chips (label + signed magnitude) consistent with the case header drivers And the queue view P95 time-to-interactive with driver chips is ≤2.0 s on a cold cache and ≤1.0 s on a warm cache And total attribution API calls for the queue are ≤1 per case due to batching/caching
Real‑time Case Header and Queue Widgets
"As an agent, I want breach risk to be visible and sortable directly in my queue and case header so that I can triage without switching screens."
Description

Embed ETB, confidence bands, and top drivers into the case header and queue list items with high‑signal visual design. Enable queue sorting by ETB and filtering by risk states (e.g., Safe >24h, At‑risk 4–24h, Imminent <4h). Use color states with accessibility contrast compliance (WCAG AA) and tooltips for details. Update values in real time via WebSockets/SSE with a fallback to 30‑second polling. Ensure component performance at 5,000 visible cases with virtualized lists and server‑side sorting. Provide tenant‑level configuration for default sort, thresholds, and visibility toggles.

Acceptance Criteria
Queue Sorting by ETB (Asc/Desc, Server-Side)
Given the queue contains 5,000 cases with ETB values When the user selects "Sort by ETB Ascending" Then the queue is sorted server-side by ETB ascending and the first page renders within ≤1,000 ms and the sort indicator shows "ETB ↑"
Given the queue is sorted by ETB ascending When the user toggles to "Sort by ETB Descending" Then a server request with sort=etb&dir=desc is issued and results render within ≤1,000 ms and the order is strictly non-increasing by ETB
Given two cases have identical ETB to the nearest minute When sorted by ETB Then ordering is stable by Case ID as a secondary key
Given the tenant default sort is configured When a user loads the queue with no explicit sort Then the configured default sort is applied
Risk State Filtering (Safe, At‑risk, Imminent)
Given tenant thresholds are Safe >24h, At‑risk 4–24h, Imminent <4h When the user applies the "Imminent" filter Then only cases with ETB <4h are displayed and the filter count matches the server-reported total
Given the user enables both "At‑risk" and "Imminent" filters When results load Then only cases with ETB <24h are displayed and sorting by ETB remains enabled
Given a case’s ETB crosses a threshold due to a real-time update When filters are active Then the case appears or disappears from the filtered view within ≤2,000 ms without a full page reload
Given filters return no cases When results load Then an empty state is displayed with a "Clear filters" action
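The filter logic rests on a simple threshold mapping. A sketch of the classification, with boundary values falling into the At-risk band per the 4–24h range above (function and parameter names are illustrative; thresholds are tenant-configurable):

```python
def risk_state(etb_hours, imminent_below=4, safe_above=24):
    """Maps ETB (hours until predicted breach) to a filterable risk state.
    Defaults match Safe >24h, At-risk 4-24h, Imminent <4h."""
    if etb_hours < imminent_below:
        return "Imminent"
    if etb_hours <= safe_above:
        return "At-risk"
    return "Safe"
```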
Real‑time Updates with WebSockets/SSE and 30‑sec Polling Fallback
Given the queue view is open and network permits WebSockets/SSE When the app initializes Then a push channel connects within ≤2,000 ms and subscribes to ETB/confidence/driver updates for visible cases
Given the server publishes an update for a visible case When received over the push channel Then the case row and header values update within ≤1,000 ms without full re-render of the list
Given the push channel is unavailable for ≥10,000 ms When health checks fail Then the client switches to polling every 30 seconds (±5 seconds) until push connectivity is restored
Given push connectivity is restored When detected Then polling stops and the push channel resumes within ≤5,000 ms without duplicate updates
Given multiple updates arrive within 500 ms for the same case When rendering Then updates are coalesced so the UI applies the latest state and renders at most 5 updates/second per list
Case Header ETB, Confidence Band, and Top Drivers Display
Given a case is opened When the header renders Then it displays ETB as a relative duration (e.g., "breaches in 3h 20m") and exposes the absolute breach timestamp in a tooltip
Given the model returns a confidence band and score When the header renders Then a band label (Low/Medium/High) is shown with a tooltip describing the confidence range
Given top driver contributions are available When the header renders Then the top 3 drivers are shown in descending impact with tooltips including direction and magnitude of impact
Given a case appears in the queue list When the row renders Then the ETB value, risk color state badge, and an info icon revealing confidence and drivers on hover/focus are visible
Given data for any driver is unavailable When rendering Then the UI shows "Data unavailable" for that driver without errors
Accessible Color States and Tooltips (WCAG AA)
Given risk state badges (Safe/At‑risk/Imminent) are rendered When contrast is measured Then text and iconography meet WCAG 2.1 AA contrast ratios (≥4.5:1 for normal text, ≥3:1 for large text/icons)
Given information is communicated via color When rendered Then a non-color cue (text label and/or distinct icon) is present for each state
Given a keyboard-only user navigates the UI When tabbing to ETB badges and info icons Then tooltips open on focus, are dismissible with Esc, and focus order is logical without traps
Given a screen reader user focuses the ETB badge When announced Then ARIA labels include ETB value, risk state, and confidence band
Given automated accessibility scans (e.g., axe) run on queue and case views When executed Then there are no "serious" or higher violations attributable to the new widgets
Queue Performance at 5,000 Visible Cases with Virtualization
Given a dataset of 5,000 cases When the queue view loads on a baseline machine (4-core CPU, 8 GB RAM, latest Chrome) Then first contentful paint ≤2,000 ms and time to interactive ≤3,000 ms for the list container
Given the user scrolls from top to bottom When measuring runtime performance Then average frame rate ≥50 fps and the number of mounted DOM nodes at any time ≤120 due to virtualization
Given ETB updates occur for up to 1,000 cases per minute When applied Then main thread utilization averages ≤60% and no frame stalls exceed 200 ms
Given a sort action is triggered under Fast 3G network conditions When server-side sorting is used Then the sorted results render within ≤1,500 ms end-to-end
Given memory is profiled after 10 minutes of continuous use When measured Then JS heap usage ≤300 MB and no event listener leaks are detected
Tenant‑Level Configuration for Defaults, Thresholds, Visibility
Given an admin opens Risk ETA settings When setting a default sort and saving Then subsequent queue loads for all tenant users default to the chosen sort within ≤30 seconds
Given an admin updates risk thresholds (e.g., Safe >36h, At‑risk 6–36h, Imminent <6h) When saved Then risk badges and filter logic reflect the new thresholds within ≤30 seconds without deployment
Given an admin toggles visibility for ETB, confidence, or drivers When disabled and saved Then the corresponding elements are hidden in both header and queue list for all non-admin users
Given configuration changes are made When saved Then an audit log entry captures actor, fields changed, old/new values, and timestamp
Given a non-admin attempts to modify Risk ETA settings When action is performed Then access is denied and a clear error message is shown
Data Signals Pipeline & Quality Guardrails
"As a platform engineer, I want reliable and fresh operational signals feeding the model so that predictions remain accurate and trustworthy."
Description

Ingest and maintain the feature store powering Risk ETA, including queue load metrics, agent capacity, SLA policies, product/issue metadata, part availability and vendor ETAs, shipment tracking, communication silence windows, and historical resolution durations. Implement streaming updates where available and incremental batch elsewhere. Define schemas with versioning, freshness SLOs (e.g., <60s for queue metrics, <15m for logistics), and lineage. Add quality checks for nulls, outliers, drift, and unit consistency; auto‑fallback to defaults when signals degrade. Ensure multi‑tenant isolation, least‑privilege access, and PII compliance with redaction where not required for modeling.
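The null-rate guardrail with auto-fallback described above can be sketched as follows. This is an illustrative sketch only — `SignalGuardrail` and its fields are hypothetical names, not ClaimKit's actual API — showing a window-level null-rate check that marks a signal degraded, serves fallback defaults, and records an audit entry:

```python
from dataclasses import dataclass, field

@dataclass
class SignalGuardrail:
    # Hypothetical guardrail: thresholds and field names are illustrative.
    null_rate_threshold: float = 0.01  # 1% nulls over the evaluation window
    fallback_default: float = 0.0
    degraded: bool = False
    audit: list = field(default_factory=list)

    def evaluate(self, values: list) -> list:
        """Return the values to serve for one window, applying fallback if degraded."""
        nulls = sum(1 for v in values if v is None)
        null_rate = nulls / len(values) if values else 1.0
        if null_rate > self.null_rate_threshold:
            # Mark the signal degraded, audit it, and serve the default everywhere.
            self.degraded = True
            self.audit.append({"reason": "null_rate", "rate": null_rate})
            return [self.fallback_default] * len(values)
        # Healthy window: pass values through, patching any isolated null.
        return [v if v is not None else self.fallback_default for v in values]
```

A degraded window would then be visible both in the served values and in the audit trail, matching the "fallback is applied and logged" behavior in the criteria below.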

Acceptance Criteria
Queue Load Freshness SLO (<60s)
Given a new claim or status-change event occurs for a tenant When the event is produced to the streaming bus Then the feature store updates the tenant’s queue_load metrics within 60 seconds for ≥99% of events over a rolling 24-hour window, and within 120 seconds for 100% of events excluding declared upstream outages. Given out-of-order or duplicated events arrive When the pipeline processes them Then the write is idempotent and state reflects the latest event-time, with no duplicate feature updates. Given freshness lag exceeds 60 seconds for 5 consecutive minutes When the monitor detects the breach Then an alert is sent to on-call with tenant, signal, current p99 lag, and incident link.
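The idempotent, event-time-ordered write required above can be sketched with a last-write-wins rule keyed on event time, plus an event-ID set for duplicates. Field names (`case_id`, `event_id`, `event_time`) are assumptions for illustration:

```python
def apply_event(store: dict, event: dict) -> bool:
    """Apply an event to per-case state; return True only if state changed.

    Duplicates (same event_id) and stale out-of-order events (older
    event_time) are acknowledged but never regress stored state.
    """
    key = event["case_id"]
    current = store.get(key)
    if current is None:
        store[key] = {"event_time": event["event_time"],
                      "status": event["status"],
                      "seen": {event["event_id"]}}
        return True
    if event["event_id"] in current["seen"]:
        return False  # exact duplicate delivery
    current["seen"].add(event["event_id"])
    if event["event_time"] <= current["event_time"]:
        return False  # stale: state already reflects a later event-time
    current.update(event_time=event["event_time"], status=event["status"])
    return True
```

Re-delivering any event is then a no-op, which is what makes retries safe on the pipeline side.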
Logistics Signal Freshness SLO (<15m)
Given a change in parts availability, vendor ETA, or shipment tracking for a case When the upstream system emits an event or the batch poll executes Then corresponding features in the store update within 15 minutes for ≥99% of changes over a rolling 24-hour window, and within 30 minutes for 100%. Given an upstream outage window is declared When SLO calculations run Then the outage interval is excluded from SLO denominator and a degradation banner is recorded. Given a logistics record is late-arriving When it is ingested Then late data is correctly merged based on event-time with watermarking, without duplications.
Streaming + Incremental Batch Ingestion Guarantees
Given a source flagged as streaming-capable When events are produced under normal load Then end-to-end median latency to features is <10s and p95 <60s over a 24-hour window. Given a source flagged as batch-only When an incremental job runs Then only new/changed records since the last high-water mark are processed, the job completes within its SLA, and the high-water mark is atomically advanced. Given duplicate or retried deliveries occur When the pipeline processes them Then deduplication keys prevent double writes and outputs are exactly-once at the feature store. Given a pipeline task fails mid-run When it is retried Then processing resumes without data loss or duplication, and the run is observable in job logs with exit status.
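The high-water-mark contract for batch-only sources can be sketched as below — process only records past the mark, and advance the mark only after the whole batch succeeds (a stand-in for the atomic advance; record shape is illustrative):

```python
def run_incremental_batch(records, high_water_mark, process):
    """Process records newer than the mark; return the new high-water mark."""
    new_records = [r for r in records if r["updated_at"] > high_water_mark]
    for r in sorted(new_records, key=lambda r: r["updated_at"]):
        process(r)  # downstream write is assumed idempotent
    # Advance the mark only once every new record has been processed,
    # so a mid-run failure simply reprocesses from the old mark.
    if new_records:
        high_water_mark = max(r["updated_at"] for r in new_records)
    return high_water_mark
```

Combined with idempotent downstream writes, a retried run resumes without loss or duplication, as the criterion requires.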
Schema Versioning and Backward Compatibility
Given a backward-compatible schema change (additive field) When the producer publishes version N+1 Then consumers pinned to version N continue without errors and the registry records compatibility as BACKWARD. Given a breaking change (rename, type change, removal) When a PR is opened Then CI blocks deployment unless a dual-write plan, migration script, and consumer upgrade checklist are present; deployment proceeds only after backfill success and cutover approval. Given multiple schema versions are active When the store receives writes Then both versions are accepted during the dual-write window and lineage reflects versioned feature columns with start/end timestamps.
Quality Guardrails and Auto‑Fallback Behavior
Given critical fields are ingested When null rate over a 5-minute window exceeds 1% or a required field is entirely null Then the signal is marked degraded, a fallback default is applied to dependent features, and an audit event is emitted with tenant, feature, reason, and window. Given numeric features with defined unit ranges When an outlier beyond configured bounds or unit mismatch is detected Then the record is quarantined from serving, a correction rule is attempted if configured, otherwise fallback is applied and logged. Given distribution drift monitoring runs hourly When PSI ≥0.2 and <0.3 Then a warning is raised; when PSI ≥0.3 Then the signal is marked degraded, Risk ETA confidence is reduced, and model inputs switch to priors until drift clears or is rebaselined with approval.
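The drift thresholds above use the Population Stability Index, PSI = Σ (aᵢ − eᵢ)·ln(aᵢ/eᵢ) over histogram bins of the expected (baseline) and actual (live) distributions. A minimal sketch, with an epsilon guarding empty bins and the warn/degrade bands from the criterion:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over paired histogram bins."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

def drift_state(score):
    # Bands from the guardrail spec: warn at >=0.2, degrade at >=0.3.
    if score >= 0.3:
        return "degraded"
    if score >= 0.2:
        return "warning"
    return "ok"
```

Identical distributions score 0; the larger the shift between baseline and live bins, the larger the PSI.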
Lineage, Observability, and Alerting
Given data is flowing from sources to the feature store When a user opens the data catalog Then end-to-end lineage is visible from source dataset to feature column with owners, transformations, and version timestamps. Given freshness monitors execute When viewing dashboards Then per-tenant, per-signal last-updated times, 30-day SLO compliance, and current lag percentiles are displayed. Given an SLO breach or guardrail degradation occurs When alerting triggers Then a single deduplicated alert reaches the correct on-call rotation within 2 minutes, with runbook links and auto-ticket creation; recovery clears the alert automatically.
Security: Multi‑Tenant Isolation and PII Compliance
Given a user or service account scoped to Tenant A with least-privilege When attempting to read or write Tenant B data Then access is denied and the attempt is logged with actor, resource, and reason. Given PII fields not required for modeling When records are ingested Then PII is redacted or tokenized before the feature store write; raw values are stored only in an approved vault, never in the feature store. Given data at rest and in transit requirements When inspecting configurations Then storage is encrypted with AES‑256, transport uses TLS 1.2+, keys are rotated per policy, and access logs retain ≥90 days with export to the SIEM. Given a data subject deletion request for a tenant When the erase job runs Then all corresponding feature records are purged within 24 hours and the lineage graph reflects the deletion completion.
SLA Policy Engine Integration
"As a compliance manager, I want breach risk to adhere to our SLA rules and business calendars so that alerts reflect true contractual exposure."
Description

Integrate Risk ETA with the SLA policy engine so breach definitions reflect tenant rules, product tiers, channels, business hours, holidays, and pause conditions (e.g., waiting on customer). Support multiple concurrent timers per case (response vs. resolution) and select the relevant timer context for Risk ETA. Apply timezone‑aware computations and handle mid‑case policy changes with re‑evaluation and audit logging. Expose APIs to retrieve the active breach threshold per case and ensure predictions reference the correct timer start/stop states.

Acceptance Criteria
Business hours, holidays, and tenant timezone threshold computation
Given tenant timezone "America/New_York", business hours Mon–Fri 09:00–17:00, holidays [2025-09-01], and a response SLA of 4 business hours And a case is created at 2025-08-29T16:30:00-04:00 with response timer starting at creation When GET /cases/{id}/sla/active?timer=response is requested at 2025-08-29T16:30:05-04:00 Then it returns thresholdAt "2025-09-02T12:30:00-04:00", timezone "America/New_York", state "running", and businessMinutesRemaining 240
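The business-minutes walk in this scenario — consume SLA minutes only inside business hours, skipping weekends and holidays — can be sketched as follows. This is a simplified illustration (it ignores pauses), and the function and parameter names are hypothetical, not ClaimKit's API:

```python
from datetime import datetime, timedelta, time
from zoneinfo import ZoneInfo

def threshold_at(start, sla_minutes, tz="America/New_York",
                 open_t=time(9), close_t=time(17), holidays=()):
    """Walk forward day by day until sla_minutes of business time elapse."""
    zone = ZoneInfo(tz)
    cur = start.astimezone(zone)
    remaining = sla_minutes
    while True:
        is_workday = cur.weekday() < 5 and cur.date().isoformat() not in holidays
        day_open = cur.replace(hour=open_t.hour, minute=open_t.minute,
                               second=0, microsecond=0)
        day_close = cur.replace(hour=close_t.hour, minute=close_t.minute,
                                second=0, microsecond=0)
        if is_workday and cur < day_close:
            window_start = max(cur, day_open)
            avail = (day_close - window_start).total_seconds() / 60
            if avail >= remaining:
                return window_start + timedelta(minutes=remaining)
            remaining -= max(avail, 0)
        # Advance to the next calendar day's opening time (wall clock).
        cur = day_open + timedelta(days=1)
```

For a case created Friday 2025-08-29 16:30 ET with a 240-minute SLA and the 2025-09-01 holiday, only 30 business minutes remain on Friday, the weekend and holiday are skipped, and the remaining 210 minutes land on Tuesday at 12:30 ET.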
Pause conditions stop timers and adjust remaining time
Given pause conditions include "Waiting on Customer" and a resolution SLA of 8 business hours with a running timer started at 2025-08-30T10:00:00-04:00 When case status changes to "Waiting on Customer" at 2025-08-30T11:15:00-04:00 Then the resolution timer state becomes "paused" within 2 seconds and lastPausedAt is 2025-08-30T11:15:00-04:00 And businessMinutesConsumed is 75 and does not increase while paused When a customer reply is received at 2025-08-30T13:00:00-04:00 Then the timer resumes within 2 seconds and businessMinutesRemaining equals (480 - 75) minutes And the Risk ETA countdown remains static while paused and resumes decreasing on resume
Multiple concurrent timers and Risk ETA context selection
Given a tenant SLA with response = 2 business hours and resolution = 24 business hours And configuration sets riskTimerContext.queue = "resolution" and riskTimerContext.caseHeader = "response" When a case is displayed in the queue view Then Risk ETA uses the resolution timer to compute ETA and labels the context as "Resolution" When the same case is opened in the case header Then Risk ETA uses the response timer and labels the context as "Response" And GET /cases/{id}/sla/active?timer=resolution and ?timer=response each return their respective thresholdAt and state
Mid-case policy change re-evaluation and audit logging
Given a case with resolution SLA = 8 business hours under policy A, business hours Mon–Fri 09:00–17:00 in "America/New_York" And the resolution timer started at 2025-08-30T10:00:00-04:00 And at 2025-08-30T12:00:00-04:00 the product tier changes to policy B with resolution SLA = 4 business hours When the policy change is saved Then the system re-evaluates within 5 seconds using 120 minutes already consumed and sets new thresholdAt to 2025-08-30T14:00:00-04:00 And an audit log entry is written with oldPolicyId, newPolicyId, oldThresholdAt, newThresholdAt, timerType "resolution", actor, occurredAt, reason "policy_change", and policyVersion And Risk ETA updates within 5 seconds to reflect the new time to breach without double-counting elapsed time
Active breach threshold API contract
Given a case with an active resolution timer When GET /cases/{id}/sla/active?timer=resolution is called Then respond 200 within 300 ms p50 and 1,000 ms p95 with JSON containing timerType, policyId, policyVersion, timezone, state, startAt, thresholdAt (ISO 8601 with offset), businessMinutesConsumed, businessMinutesRemaining, totalPausedMinutes When an invalid timer type is requested Then respond 400 with errorCode "INVALID_TIMER" When the case has no active timer for the requested type Then respond 404 with errorCode "TIMER_NOT_FOUND"
Timer start/stop state correctness for predictions
Given response timer starts on first customer message receipt and resolution timer starts on case creation per policy When a case is auto-created from an email at 2025-08-30T09:05:00-07:00 and the first customer message is at 2025-08-30T09:06:00-07:00 Then resolution.startAt = 2025-08-30T09:05:00-07:00 and response.startAt = 2025-08-30T09:06:00-07:00 And Risk ETA predictions for each context reference their respective startAt values And GET /cases/{id}/sla/active?timer=resolution and ?timer=response return matching startAt values
Daylight Saving Time and cross-timezone accuracy
Given tenant timezone "America/Los_Angeles" and business hours Mon–Fri 09:00–17:00 And a response SLA of 4 business hours And a case created at 2025-10-31T16:00:00-07:00 with no pauses across the DST fall-back on 2025-11-02 When the active threshold is computed Then thresholdAt is 2025-11-03T12:00:00-08:00 and businessMinutesConsumed equals exactly 240 minutes by the threshold When the tenant timezone is changed to "America/Chicago" at 2025-11-04T10:00:00-08:00 Then all subsequent SLA computations use "America/Chicago" and an audit entry records previousTimezone, newTimezone, actor, and occurredAt
Model Performance Monitoring & Feedback Loop
"As a product owner, I want transparent performance and a feedback loop for Risk ETA so that we can improve accuracy and roll out changes safely."
Description

Stand up continuous monitoring for ETB accuracy and calibration, including dashboards for MAE/MAPE to actual breach time, calibration curves, and segment‑level error. Implement data and concept drift detection on key features and outcomes with alerting and automated rollback to the last known good model. Capture user feedback from agents (e.g., “ETA off”, “driver incorrect”) and closed‑case outcomes to retrain models on a scheduled cadence. Support A/B canary rollouts, versioned models, and feature flags per tenant. Maintain an audit trail of predictions, intervals, and drivers for compliance and postmortems.

Acceptance Criteria
ETB Accuracy Dashboard (MAE/MAPE) by Segment and Version
Given ETB predictions and actual breach times for closed cases in the last 24 hours are available When the accuracy job runs hourly and completes successfully Then the dashboard displays MAE (hours) and MAPE (%) for ETB vs actual breach time overall and segmented by tenant, queue, product category, and model version, with a last-updated timestamp And filter selections (tenant, queue, product category, model version, date range) update metrics within 2 seconds And metrics values match an offline recomputation within ±0.5% for a sampled validation set of at least 1,000 cases And cases without actual breach times are excluded from metrics and counted in an “incomplete” tally shown on the dashboard
Calibration Curves for ETB Prediction Intervals
Given ETB predictions with 50%, 80%, and 95% prediction intervals for the past 30 days When the calibration module runs daily at 03:00 UTC Then a reliability plot and a table of empirical coverage vs nominal coverage are generated per tenant and per model version And for each interval level, the empirical coverage is within ±5 percentage points of nominal on a 10% holdout set or the dashboard flags the interval as “Miscalibrated” with a red badge And the calibration artifacts are viewable in the dashboard and exportable as CSV within 2 seconds of request And all computations and plots are stored with a run ID and model version for auditability
Automated Drift Detection, Alerting, and Rollback
Given a baseline feature and outcome distribution captured for the current “last known good” (LKG) model version per tenant When live feature PSI > 0.2 for any key feature or outcome proxy, sustained for 3 consecutive hourly windows, or mean ETB residual shifts by > 20% vs baseline Then a P1 alert is sent to on-call Slack channel and email within 2 minutes including impacted tenants, features, metrics, and links to dashboards And the serving layer automatically rolls back impacted tenants to the LKG model within 5 minutes and records the action in the audit log with timestamps and versions And subsequent predictions for those tenants reflect the LKG model version And a manual override endpoint exists to cancel or re-apply rollback with role-based access control and is logged
Agent Feedback Capture and Association to Predictions
Given an authenticated agent is viewing a case with a current ETB prediction and drivers When the agent submits feedback selecting one of: “ETA off”, “driver incorrect”, or “other”, with an optional note Then a feedback record is created with case ID, prediction ID, model version, agent ID, tenant ID, timestamp, feedback type, and note, and the API returns 201 within 500 ms And the feedback appears in the Feedback dashboard and export API within 1 minute of submission And duplicate submissions of the same type for the same prediction by the same agent within 15 minutes are deduplicated with a 409 response And feedback is joinable to training data via prediction ID for future retraining jobs
Scheduled Retraining with A/B Canary and Promotion Guardrails
Given a weekly retraining schedule set for Sunday 02:00 UTC and access to last 90 days of closed-case outcomes and agent feedback When the pipeline executes successfully Then a new candidate model is registered with a semantic version, training data snapshot ID, feature schema hash, offline MAE/MAPE, and calibration metrics And the candidate is canary-deployed via feature flags to 10% of traffic per tenant (or selected tenants) within 30 minutes of registration And guardrails enforce that online MAE does not degrade by > 5% and 80% PI coverage remains within ±5 percentage points vs control over a minimum 24-hour window before promotion And if guardrails fail, the system auto-rolls back to the previous version, raises an alert, and blocks promotion; if they pass, the model is auto-promoted and the rollout plan is logged
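The promotion guardrails above reduce to two checks: online MAE must not degrade by more than 5% versus control, and 80% prediction-interval coverage must stay within ±5 percentage points of nominal. A hedged sketch (function and parameter names are illustrative):

```python
def promotion_decision(control_mae, candidate_mae, candidate_pi80_coverage,
                       max_mae_degradation=0.05, coverage_tolerance=0.05):
    """Return 'promote' only if both guardrails pass, else 'rollback'."""
    mae_ok = candidate_mae <= control_mae * (1 + max_mae_degradation)
    coverage_ok = abs(candidate_pi80_coverage - 0.80) <= coverage_tolerance
    return "promote" if (mae_ok and coverage_ok) else "rollback"
```

In the flow described, this decision would be evaluated over the minimum 24-hour canary window before the rollout is either auto-promoted or rolled back and alerted.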
Compliant Audit Trail of Predictions, Intervals, and Drivers
Given any ETB prediction is served to a UI or API consumer When the prediction response is generated Then a tamper-evident audit record is stored containing case ID, tenant ID, prediction ID, model version, timestamp, ETB point estimate, 50/80/95% intervals, top 5 drivers with contribution values, and a feature snapshot hash And records are retained for 24 months, searchable by case ID, tenant, and time range, and exportable as CSV And retrieval latency for records in the last 90 days is under 2 seconds (p95), and under 10 seconds (p95) for older records And any edits or re-computations create a new revision linked to the original, preserving immutability

Smart Rebalance

Automatic load‑balancing that reassigns and re‑prioritizes at‑risk cases based on agent capacity, skills, region, and SLA severity. Supports guardrails (permissions, unions, vendor tiers) and a dry‑run approval mode. Cuts late tickets by moving work to the right owner at the right moment, with a clear audit of why changes happened.

Requirements

At-Risk Detection & SLA Scoring
"As an operations manager, I want the system to automatically flag cases at risk of missing SLAs so that we can proactively intervene before deadlines are breached."
Description

Continuously evaluates all open claims and repair tickets against brand- and product-specific SLA rules to compute a live risk score and breach ETA per case. Listens to queue events (new case, status change, pause/resume, customer reply) and recalculates in real time. Surfaces risk indicators and deadlines directly in ClaimKit’s live queue to identify cases likely to miss SLA, emitting structured events that trigger Smart Rebalance. Supports configurable pause reasons, multi-timezone handling, customer tier weighting, and exclusion windows for waiting-on-customer states. Exposes a lightweight API for downstream components to query current risk and rationale.

Acceptance Criteria
Initial Risk Score and Breach ETA on Case Creation
Given a new case is created with brand- and product-specific SLA rules available When the case is ingested via email/PDF auto-create or API Then a risk_score between 0 and 100 and a breach_eta are computed and stored within 2 seconds of creation And the applied_sla_rule_id, target_duration, and severity are persisted with the case And the risk rationale includes at minimum rule_id, time_elapsed_sec=0, time_remaining_sec, and contributing_factors[] And a single risk_calculated event is emitted with an idempotency_key derived from case_id and the create event
Real-time Recalculation on Queue Events
Given an open case with an existing risk_score and breach_eta When any of the following queue events occur: status_change, customer_reply, pause, resume, ownership_change Then the risk_score and breach_eta are recalculated using current state within 2 seconds of the event time And an audit record stores event_type, previous_score, new_score, previous_eta, new_eta, timestamp, and reason_codes[] And the live queue updates the displayed risk badge and countdown within 2 seconds to reflect the recalculated values
Pause Reasons and Waiting-on-Customer Exclusions
Given pause reasons are configurable with an exclude_from_sla flag And a case enters a paused state with a reason where exclude_from_sla=true When the case remains paused for any duration Then the SLA timer does not accrue during the paused period and the risk_score does not increase due solely to elapsed paused time And breach_eta shifts forward by the paused duration upon resume And if the pause reason has exclude_from_sla=false, elapsed time continues to count toward SLA and risk updates accordingly
Multi-Timezone-Consistent Breach ETA
Given an SLA rule specifies a timezone (e.g., America/Los_Angeles) and an agent views the case from a different timezone (e.g., Europe/Berlin) When breach_eta is calculated Then the stored breach_eta is in the SLA rule timezone with explicit offset and ISO-8601 format And the live queue displays a localized deadline in the agent’s timezone while preserving the original timezone label in the tooltip/details And no off-by-one-hour error occurs when dates cross midnight or daylight saving transitions in either timezone
Customer Tier Weighting Impacts Risk
Given customer tiers are configured with weights (e.g., Standard=1.0, Gold=1.2, VIP=1.5) And two otherwise identical cases differ only by customer tier When risk is computed at the same elapsed time and status Then risk_score for higher tiers equals min(100, base_risk_score * tier_weight) And breach_eta remains identical across tiers And the risk rationale lists the applied tier and weight
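The tier-weighting rule above is a one-liner: risk scales by the configured tier weight and is capped at 100, while breach_eta is untouched. As a sketch:

```python
def weighted_risk(base_risk_score: float, tier_weight: float) -> float:
    """risk_score = min(100, base_risk_score * tier_weight), per the spec."""
    return min(100.0, base_risk_score * tier_weight)
```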
Structured Risk Events for Smart Rebalance
Given an event bus is available to publish risk updates to Smart Rebalance When a case is created or its risk_score changes by ≥5 points, crosses a severity band (Low/Medium/High/Critical), or time_to_breach becomes ≤15 minutes Then a risk_update event is published within 1 second containing case_id, risk_score, risk_band, breach_eta (ISO-8601 + timezone), rule_id, reason_codes[], previous_score, occurred_at, and idempotency_key And duplicate publications for the same logical change are prevented via idempotency_key And failed publications are retried with exponential backoff up to 3 attempts and moved to a dead-letter queue on final failure
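The retry policy above — exponential backoff, up to 3 attempts, dead-letter on final failure — can be sketched as below. The `publish` callable and `dead_letter` list stand in for the real event bus and DLQ; names are illustrative:

```python
import time

def publish_with_retry(event, publish, dead_letter, attempts=3, base_delay=0.1):
    """Try to publish up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        try:
            publish(event)
            return True
        except Exception:
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, ...
    # Final failure: route to the dead-letter queue for inspection/replay.
    dead_letter.append(event)
    return False
```

Pairing this with the idempotency_key in the event payload keeps retried deliveries from producing duplicate downstream actions.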
Risk API Returns Current Risk and Rationale
Given an authenticated client requests GET /v1/cases/{case_id}/risk When the case exists Then the API responds 200 within 200 ms with risk_score, risk_band, breach_eta (ISO-8601 + timezone), last_calculated_at, rule_id, and rationale.contributing_factors[] And when the case does not exist, the API responds 404 And when the client is unauthorized, the API responds 401 without leaking existence And the response fields match the latest recalculated values and event payloads for the case
Assignment & Priority Optimizer
"As a queue supervisor, I want cases to be automatically moved to the best available owner and escalated in priority when needed so that late tickets are reduced without constant manual triage."
Description

A deterministic optimization engine that evaluates candidate owners and target priorities for at-risk cases using agent capacity, skills, certifications, region/time zone, language, and vendor tiers. Executes reassignment and priority adjustments to minimize SLA breaches and balance workload, with throttling and hysteresis to prevent flip-flopping. Supports streaming (near-real-time) and batch modes, tie-breaker rules, and schedule windows. Integrates with ClaimKit’s queue to perform idempotent reassign/priority actions and to update case metadata. Provides configurable objectives (e.g., minimize late cases, maximize first-response SLAs) and respects business calendars.

Acceptance Criteria
Deterministic Owner Selection by Skills/Capacity/Region/Language/Vendor Tier
Given a case requires skill "Compressor", certification "EPA-608", language "ES", region "US-MST", and vendor tier "Gold" and agents A, B, C with declared skills/certs/languages/regions/vendor tiers and capacity headroom When the optimizer runs in assignment mode Then it selects a single owner who satisfies all hard constraints and has the highest capacity headroom among eligible agents And if multiple agents tie, then the tie is broken deterministically by lowest current workload, then earliest next-available shift start, then ascending agent ID And the decision writes reason_code, factor_weights, and target_owner to the case audit metadata
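The deterministic selection above can be sketched as a hard-constraint filter followed by a single stable sort key: highest capacity headroom, then lowest workload, then earliest next shift start, then ascending agent ID. Data shapes (sets for skills/languages) are assumptions for illustration:

```python
def select_owner(case_reqs: dict, agents: list):
    """Pick one owner deterministically, or None if no agent is eligible."""
    eligible = [a for a in agents
                if case_reqs["skills"] <= a["skills"]          # skill subset
                and case_reqs["language"] in a["languages"]
                and a["region"] == case_reqs["region"]
                and a["vendor_tier"] == case_reqs["vendor_tier"]
                and a["capacity_headroom"] > 0]
    if not eligible:
        return None
    # min() over a tuple key is stable and deterministic; negate headroom
    # so the largest headroom wins, then apply the spec's tie-breakers.
    return min(eligible, key=lambda a: (-a["capacity_headroom"],
                                        a["workload"],
                                        a["next_shift_start"],
                                        a["agent_id"]))
```

Because every comparison is total and explicit, re-running the optimizer on the same inputs always yields the same owner, which is what makes the decision auditable.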
Priority Optimization to Minimize SLA Breaches
Given objective="Minimize late cases" and first-response and resolution SLAs configured and a set of at-risk cases with predicted breach times When the optimizer runs in priority mode Then for any two cases A and B with equal severity, if A’s predicted breach time is earlier than B’s by ≥1 minute, A’s resulting queue priority is greater than or equal to B’s And priorities remain within configured bounds And the optimizer does not reduce any case’s priority if doing so increases the expected count of late cases compared to the current state And each priority change writes an audit entry with old_value, new_value, and rationale
Throttling and Hysteresis Prevent Flip-Flopping
Given throttle_window_minutes=60 and hysteresis_risk_delta=15% When a case receives a reassignment or priority change Then no further reassignment or priority change is executed for that case within 60 minutes unless the predicted SLA breach probability increases by ≥15 percentage points or the current owner becomes ineligible And any action taken within the throttle window due to an exception includes the exception reason in the audit And no case experiences more than 3 actions in any rolling 24-hour period
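The throttle/hysteresis rule above can be sketched as a single predicate: inside the throttle window, a new action is allowed only if the predicted breach probability rose by at least the hysteresis delta or the current owner became ineligible. Names and units are illustrative:

```python
def action_allowed(minutes_since_last_action, prev_breach_prob,
                   new_breach_prob, owner_eligible,
                   throttle_window_minutes=60, hysteresis_delta=0.15):
    """Decide whether another reassignment/priority change may execute now."""
    if minutes_since_last_action >= throttle_window_minutes:
        return True  # window elapsed: normal operation resumes
    if not owner_eligible:
        return True  # exception: current owner can no longer take the case
    # Hysteresis: require a material (>=15pp) rise in breach probability.
    return (new_breach_prob - prev_breach_prob) >= hysteresis_delta
```

The rolling 24-hour cap of 3 actions per case would sit on top of this check as a separate counter.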
Streaming and Batch Modes with Schedule Windows
Given streaming mode is enabled and a capacity change event is received When the event is processed Then the optimizer publishes a decision within 5 seconds (95th percentile) and executes any allowed action within 10 seconds (95th percentile) Given batch mode is scheduled for 02:00–03:00 local time on business days When the batch runs Then only cases matching the batch filter are evaluated and changed within the window And outside configured schedule windows, the optimizer performs no write actions
Guardrails and Dry-Run Approval Mode
Given guardrails for permissions, union rules, and vendor tier limits are enabled When a proposed reassignment violates any guardrail Then the optimizer does not execute the change and records a blocked action with violated_guardrail details Given dry-run mode is active When decisions are computed Then the optimizer produces proposed actions with reason codes and idempotency keys, requests approval, and executes only those actions that receive explicit approval And actions executed from dry-run respect the same guardrails
Idempotent Queue Integration and Metadata/Audit Updates
Given idempotency keys are computed from case_id + decision_vector + target_owner/priority When the same decision is submitted multiple times Then only one reassignment/priority change is applied in ClaimKit and duplicates are acknowledged without side effects And case metadata (owner_id, priority, decision_reason, decision_timestamp, decision_hash) is updated atomically And the audit log records who, when, and why for each action And API retries do not result in duplicate actions
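One plausible construction of the idempotency key above — a stable hash over the canonicalized case_id, decision vector, and target — is sketched here; the exact key scheme is an assumption:

```python
import hashlib
import json

def idempotency_key(case_id, decision_vector, target):
    """Stable key: identical decisions hash identically, so retries collapse."""
    payload = json.dumps(
        {"case_id": case_id, "decision": decision_vector, "target": target},
        sort_keys=True, separators=(",", ":"))  # canonical serialization
    return hashlib.sha256(payload.encode()).hexdigest()
```

The queue-side dedupe then only needs to remember recently applied keys: a retried submission with the same key is acknowledged without re-applying the change.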
Workload Balancing Respects Agent Capacity
Given each agent has a declared capacity (cases or effort units) and current workload When the optimizer rebalances Then no eligible agent's workload exceeds capacity if a feasible allocation exists under all hard constraints If no feasible allocation exists Then the optimizer emits an infeasibility report and makes no changes unless override_allow_overcapacity=true And with override, overflow is assigned to the least-loaded eligible agents first and does not exceed 10% overcapacity per agent
Agent Capacity & Skills Profiles
"As a workforce planner, I want accurate, up-to-date capacity and skills data for every agent so that assignments reflect true availability and expertise."
Description

Maintains a real-time profile for each agent/vendor including skills/tags, certifications, product lines, union status, vendor tier, languages, region/time zone, shift schedule, PTO, do-not-disturb windows, max concurrency, and daily throughput targets. Ingests availability from HRIS/WFM calendars and allows manual overrides with effective dates. Exposes a performant read API to the optimizer and a secure admin UI for editing with field-level audit. Supports team hierarchies, queue membership, and temporary caps for surge events.

Acceptance Criteria
Create and Edit Agent Profile with Field-Level Audit
Given an authenticated Admin in the Admin UI, When they create an agent profile with required fields (name, agent_id, skills, languages, region/timezone, max_concurrency, daily_throughput_target), Then the profile is persisted and visible via the Read API within 5 seconds. Given an existing agent profile, When any field is edited in the Admin UI, Then an immutable field-level audit record is written including field_name, old_value, new_value, actor_id, actor_role, source="manual", timestamp (UTC), and change_reason (required) and is retrievable via audit API/UI. Given a profile change, When queried via the Read API, Then the response reflects the updated values and includes version increment and last_updated_at; historical versions remain viewable but read-only. Given audit logs, When exported for a date range up to 50,000 changes, Then a CSV export is generated within 10 seconds and includes all audit attributes.
Ingest Availability from HRIS/WFM and Reflect PTO
Given a connected HRIS/WFM calendar, When a PTO event is created for an agent with start/end in local time, Then the agent is marked unavailable for that window (stored in UTC) and the Read API reflects it within 2 minutes of the HRIS change. Given overlapping PTO and shift entries, When ingestion runs, Then PTO takes precedence (agent unavailable) and an audit entry is recorded with source="hris". Given an HRIS feed outage, When ingestion fails, Then the system retries with exponential backoff up to 6 attempts, surfaces an admin alert, and preserves last known availability without partial writes. Given a PTO cancellation in HRIS, When re-ingested, Then the unavailability window is removed and the audit trail records the reversal with source="hris".
Manual Overrides with Effective Dates and Precedence
Given an agent with HRIS-driven availability, When a Supervisor applies a manual override setting max_concurrency=0 effective [T1,T2], Then during [T1,T2] the Read API returns max_concurrency=0 and outside the window the prior value resumes automatically. Given multiple overrides on the same field with overlapping windows, When saved, Then the system enforces last-write-wins by effective_start and blocks exact-overlap duplicates unless merged explicitly. Given an override expiration, When T2 passes, Then the override is archived, no residual effects persist, and an audit record with source="manual_override" is stored. Given override creation, When reason_code is missing, Then the save is blocked and the UI displays a validation error.
Read API Performance and Filtering for Optimizer
Given a request GET /agent-profiles?team_id=...&skills=...&region=...&available_at=..., When the dataset contains 10,000 agents, Then the API responds with P95 latency ≤ 300 ms and only returns agents matching filters effective at available_at. Given pagination with page_size ≤ 500, When requesting pages, Then cursor-based pagination is supported and stable; include_total=true returns total_count with P95 ≤ 500 ms up to 100,000 agents. Given an If-None-Match header with a valid ETag, When data is unchanged, Then the API returns 304 with P95 ≤ 150 ms. Given restricted fields, When the caller lacks scope read:restricted, Then union_status and similar sensitive fields are omitted from responses; with proper scope they are included.
RBAC-Secured Admin UI Editing with Permissions
Given roles {Admin, Supervisor, Viewer}, When a Supervisor edits skills, languages, shift schedule, or queue membership, Then the changes save successfully; attempts to edit union_status or vendor_tier are blocked with 403 and explanatory messaging. Given a Viewer, When accessing the Admin UI, Then all fields are read-only and edit controls are disabled. Given server/client validation, When a user enters an invalid timezone, overlapping DND windows, or non-enumerated language codes, Then save is prevented with highlighted fields and clear error messages; the API enforces the same rules. Given an SSO session, When idle for 15 minutes, Then the session expires and unsaved edits prompt the user to confirm before logout.
Team Hierarchies, Queue Membership, and Temporary Surge Caps
Given a hierarchy (Org > Region > Team), When an agent is assigned to a child Team, Then implicit membership in ancestor Teams is reflected and returned by the Read API. Given a temporary surge cap cap_daily_throughput=50 effective [T1,T2] for Team A, When active, Then all Team A agents inherit the cap unless an agent-level cap is stricter; the cap expires automatically at T2. Given queue membership removal, When an agent is removed from Queue Q, Then the Read API omits Q within 5 seconds and the change is audit logged with actor and source. Given vendor_tier guardrails, When assigning an agent to a queue requiring tier ≥ 2, Then assignments for agents with tier < 2 are blocked with a validation error.
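Two of the behaviors above reduce to small pure functions: implicit ancestor membership is a walk up parent links, and surge-cap inheritance is "strictest cap wins." A sketch under those assumptions (parent map and cap representation are illustrative):

```python
def ancestor_teams(team_id, parent_of):
    """Implicit membership: an agent assigned to a child team is also a member
    of every ancestor (Org > Region > Team), walking parent links upward."""
    chain = []
    while team_id is not None:
        chain.append(team_id)
        team_id = parent_of.get(team_id)
    return chain

def effective_cap(team_cap, agent_cap):
    """Agents inherit an active team surge cap unless their own cap is stricter;
    None means no cap at that level."""
    caps = [c for c in (team_cap, agent_cap) if c is not None]
    return min(caps, default=None)
```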
Timezone-Aware Shifts, PTO, and Do-Not-Disturb Windows
Given an agent in America/Chicago with shift 09:00-17:00 and DND 12:00-13:00 local, When queried with available_at=2025-09-01T17:30Z (12:30 local), Then the Read API marks the agent unavailable due to DND. Given DST transitions, When shifts span spring-forward or fall-back changes, Then effective UTC windows adjust without gaps or overlaps and availability calculations remain correct. Given filters language=es and certification=ModelX, When querying, Then only agents with both tags are returned; filtering is case-insensitive and diacritic-insensitive.
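The availability check in the first criterion can be reproduced with the standard-library `zoneinfo` module, which also gives DST-correct conversions for free: 2025-09-01T17:30Z is 12:30 CDT, inside the DND window. A simplified sketch (single shift and single DND window per day, names illustrative):

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

def is_available(available_at, tz, shift, dnd):
    """shift and dnd are (start, end) pairs of local times; the UTC query
    instant is converted to the agent's zone before the window checks, so
    DST transitions are handled by the tz database rather than by hand."""
    local = available_at.astimezone(ZoneInfo(tz)).time()
    in_shift = shift[0] <= local < shift[1]
    in_dnd = dnd[0] <= local < dnd[1]
    return in_shift and not in_dnd
```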
Guardrails & Compliance Enforcement
"As a compliance lead, I want rebalancing to respect contractual and regulatory guardrails so that no assignment violates policies or agreements."
Description

Enforces hard and soft constraints during rebalancing, including permissions, union contracts, territory restrictions, customer privacy limits, vendor eligibility tiers, and customer tier routing. Provides a policy language to express allow/deny rules with precedence and versioning. Generates clear reason codes when actions are blocked and offers compliant fallback paths. Supports exception requests with approver workflows and time-bounded overrides. All evaluations are deterministic and logged for audit.

Acceptance Criteria
Permissions & Territory Guardrail Blocks Unauthorized Reassignment
Given a case tagged territory=US-West and an agent lacking permission "assign.territory.us_west" And Smart Rebalance proposes reassigning the case to that agent When guardrail evaluation runs Then the reassignment is denied And the case assignee remains unchanged And a reason_code "DENY.PERMISSION.TERRITORY" is attached to the decision And an audit record is written with policy_id, policy_version, rule_id, inputs_hash, decision="deny" And an event "rebalance.blocked" is emitted with correlation_id And P95 policy evaluation latency is <= 100 ms for this decision path
Union Contract Hours & Work-Type Restrictions
Given a unionized technician with contract rules forbidding work_type="compressor" and overtime > 8h/day And a case of work_type="compressor" would push that technician beyond 8h today When Smart Rebalance evaluates potential assignments Then the technician is excluded from candidate assignment And the decision includes reason_code "DENY.UNION.CONTRACT_LIMIT" And the system selects the next compliant candidate if available and records reason_code "FALLBACK.NEXT_COMPLIANT" And the audit log captures contract_id, violated_clause, and candidate_list_before_after And no overtime assignment occurs without an approved exception
Customer Privacy: EU Data Residency Enforcement
Given a case with customer_region=EU and privacy_policy="no_cross_border_PII" And the highest-skill available agents are non-EU When Smart Rebalance evaluates routing Then cross-border assignment is denied with reason_code "DENY.PRIVACY.CROSS_BORDER" And PII fields are masked in any non-EU candidate evaluation snapshot And the system routes to the best EU-based compliant agent if one exists And if none exist, the case is routed to queue "EU_Privacy_Pending" with reason_code "FALLBACK.PRIVACY_QUEUE" And an audit record contains data_classification, candidate_regions, and masking_applied=true
Vendor Eligibility Tiers with Compliant Fallback Paths
Given a case of SLA_severity="Critical" requiring vendor_tier>=3 per policy And the current vendor is tier=2 and therefore ineligible for Critical When Smart Rebalance computes assignment Then vendors below the required tier are excluded with reason_code "DENY.VENDOR.TIER_INSUFFICIENT" And the system selects an eligible vendor within region if available And if no eligible vendor exists, the action is blocked and a fallback path "Escalate_For_Approval" is proposed with reason_code "FALLBACK.VENDOR.APPROVAL_REQUIRED" And the audit record includes vendor_candidates_ranked and selected_or_blocked outcome
Policy Precedence, Versioning, and Deterministic Decisions
Given conflicting allow and deny rules match the same case And two policy versions exist: v1.4 (Active) and v1.3 (Deprecated) When the engine evaluates at timestamp T Then the Active version v1.4 is used for evaluation And deny rules take precedence over allow rules, resulting in decision="deny" when both match And repeated evaluations with identical inputs at time T produce identical outputs and reason_code And the audit record contains effective_policy_version=v1.4 and matched_rule_ids in evaluation order
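The precedence and determinism requirements amount to a pure function over an ordered rule list from the Active policy version. A sketch; the triple shape is illustrative, and the default outcome when no rule matches is an assumption (shown here as default-deny):

```python
def evaluate(rules, case):
    """rules: ordered (rule_id, effect, predicate) triples from the Active
    policy version. Deny beats allow; matched rule ids keep evaluation order,
    so identical inputs always produce identical outputs."""
    matched = [(rid, effect) for rid, effect, pred in rules if pred(case)]
    if any(effect == "deny" for _, effect in matched):
        decision = "deny"
    elif matched:
        decision = "allow"
    else:
        decision = "deny"  # default-deny when nothing matches (assumption)
    return decision, [rid for rid, _ in matched]
```

Determinism follows from the absence of clocks, randomness, or external lookups inside evaluation; anything time-dependent must be resolved into the inputs before calling it.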
Reason Codes and Audit Logging Fidelity
Given any rebalance decision (allow or deny) When the decision is finalized Then a non-empty reason_code from the controlled vocabulary is attached And a human-readable message explaining the decision is included And an immutable audit record is persisted with fields: correlation_id, case_id, decision, reason_code, policy_id, policy_version, rule_id, evaluator_timestamp, inputs_snapshot_hash And the record is queryable via API GET /audit/decisions?correlation_id=<id> within 2 seconds of decision And audit retention is verified to be >= 7 years per configuration
Exception Requests, Approvals, and Time-Bounded Overrides
Given a decision was denied due to guardrail reason_code "DENY.UNION.CONTRACT_LIMIT" And an agent submits an exception request with requested_window_start and requested_window_end When an approver in the designated approver_group approves the request Then an override record is activated only for the approved time window And the previously denied reassignment is permitted with reason_code "OVERRIDE.APPROVED" And upon window expiry or manual revocation, the override ceases and future evaluations revert to policy-compliant decisions And all exception lifecycle events (requested, approved, denied, revoked, expired) are audited with actor, timestamp, and rationale
Dry-Run Simulation & Approval
"As a team lead, I want to preview and approve proposed rebalances so that I can control changes during rollout and high-risk periods."
Description

Offers a non-executing mode that simulates proposed reassignments and priority changes, producing a reviewable change set with expected SLA impact, capacity deltas, and affected stakeholders. Provides an approval workflow for team leads to approve, reject, or bulk-approve proposals, with optional auto-apply after timeout. Includes rollback previews, diff views per case, and scheduling to run simulations during specific windows. Exposes exportable reports for stakeholder review.

Acceptance Criteria
Dry-Run Produces Reviewable Change Set
Given Smart Rebalance is set to Dry-Run and a simulation is triggered for a defined queue scope and time window When the simulation completes Then the system generates a change set listing each proposed reassignment and priority change without applying them And for each affected case, the change set includes current owner and proposed owner, current priority and proposed priority, expected SLA impact in minutes, matched rule(s)/reason code(s), and affected stakeholders (agents, teams, vendors) And capacity deltas per agent and team are calculated and displayed with utilization percentages before and after And no live case ownership, priority, SLA timers, tags, or permissions are modified
Guardrails Honored in Simulation
Given guardrails (permissions, unions, vendor tiers, regional and contractual boundaries) are configured When a dry-run generates proposed changes Then no proposal violates any configured guardrail And proposals that would violate a guardrail are excluded from auto-apply eligibility and flagged with the violated rule and explanation And out-of-scope cases are listed with reason codes in the simulation results
Approval Workflow with Bulk Actions and Auto-Apply Timeout
Given a change set is available in the Approvals view When a user with Approver role reviews proposals Then they can approve, reject, or defer individual proposals and perform bulk actions using filters (e.g., by SLA severity, team, vendor) And the UI displays the aggregated expected SLA improvement and capacity impact for the selected proposals before confirmation When auto-apply timeout is enabled and the timeout elapses without action Then only proposals marked auto-eligible are applied to production and all others remain pending And rejected proposals are never applied and are archived with the recorded rejection rationale
Rollback Preview and Per-Case Diff
Given an approver opens the details of a proposed change When viewing the per-case diff Then the system shows before/after owner, priority, SLA timers, tags, and matched rules When the user selects Rollback Preview for approved-and-applied changes Then the system displays the exact reversal steps and expected SLA impact without performing the rollback And after application, a one-click rollback is available within the configured rollback window and reverts only the changes introduced by Smart Rebalance
Scheduled Simulations Window and Timezone Handling
Given a simulation schedule is configured with recurrence, time window, business calendar, and timezone When the scheduled time window occurs Then the system runs simulations only within the window and respects business days/holidays from the configured calendar And overlapping schedules do not double-run; at most one simulation per scope executes concurrently And pausing or disabling the schedule prevents further runs And each run records start/end timestamps, scope, rules version, outcome status, and change set size
Exportable Reports of Proposed Changes
Given a change set exists When an export is requested by an authorized user Then the system generates CSV and PDF reports containing: run ID, case ID, current owner, proposed owner, current priority, proposed priority, SLA impact (minutes), capacity deltas (owner/team), matched rules/reason codes, guardrail check result, approval state, proposer timestamp And the export respects current filters and sort with an option for full export And exports are available for download via UI and via a secured API endpoint And generated files are timestamped and stored with retention policy applied
Audit Trail for Simulations and Approvals
Given simulations and approvals occur When any simulation is generated or any proposal is approved, rejected, auto-applied, or rolled back Then an immutable audit record is created including actor, timestamp, scope, inputs (rules version, filters), change set hash, decisions taken, rationale (if provided), and resulting application status And each affected case links to its corresponding audit entry and diff And audit logs are searchable by run ID, case ID, actor, and date range and are exportable subject to access controls
Change Audit & Explainability
"As an operations analyst, I want a clear, searchable record explaining every reassignment and reprioritization so that I can trace outcomes and improve our rules."
Description

Creates an immutable ledger of recommendations and applied changes, capturing before/after owner and priority, timestamps, triggering signals, risk scores, evaluated constraints, rule/policy versions, approver identity when applicable, and correlation IDs. Provides a searchable UI with filters (time, team, reason code, rule version) and export to CSV/JSON. Generates human-readable explanations for each decision to support dispute resolution, postmortems, and continuous improvement. Retention and redaction policies are configurable for compliance.

Acceptance Criteria
Ledger entry completeness for recommendations and applied changes
Given Smart Rebalance produces a recommendation or applies a change to a case When the system writes the audit entry Then the record includes caseId, correlationId, previousOwner, newOwner, previousPriority, newPriority, changeType (recommended|applied), timestamp, triggeringSignals, riskScore, evaluatedConstraints, ruleSetVersion, reasonCode, region, teamId, outcome (approved|auto-applied|rejected|expired), approverId (if applicable) And the record is written within 200 ms of the change event And the record is retrievable by caseId and correlationId via UI and API
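A stable inputs-snapshot hash is what lets two audit entries be compared for "identical inputs" regardless of key ordering at write time. A minimal sketch using canonical JSON; the field name mirrors the criterion, but the serialization rules are an assumption:

```python
import hashlib
import json

def inputs_snapshot_hash(snapshot: dict) -> str:
    """Canonical JSON (sorted keys, fixed separators) so logically identical
    inputs always produce the same audit fingerprint, supporting deterministic
    replay and dispute resolution."""
    blob = json.dumps(snapshot, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()
```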
Immutable audit log enforcement
Given an existing audit record When any client (UI or API) attempts to update or delete the record Then the operation is rejected with 405 Method Not Allowed (update/delete) And no changes are persisted to the record And the attempt is logged with actorId, timestamp, origin (UI/API), and reason
Explainability panel content and traceability
Given a user opens the audit detail for a change When the explanation is rendered Then it includes a human-readable narrative referencing triggeringSignals, evaluatedConstraints, rule decisions, reasonCode, ruleSetVersion, and links to relevant policy documentation And it states why the change was made or not made and the expected SLA impact And it renders in <= 300 ms and respects the user’s locale and timezone And it contains no fields marked as redacted by policy
Search and filter audit UI by time, team, reason, and rule version
Given at least 100,000 audit records exist over the last 30 days When a user applies filters for time range, team, reasonCode, ruleSetVersion, owner, changeType, and approverId Then the results include only matching records and display total count and pages And the query returns in <= 2 seconds at p95 And clearing all filters restores the default view in <= 1 second
Export audit records to CSV and JSON
Given a filtered result set of N audit records When the user requests an export to CSV Then the CSV downloads with UTF-8 encoding, header row, ISO 8601 UTC timestamps, and exactly N rows And when the user requests an export to JSON Then the JSON downloads as an array with exactly N objects and the same fields as CSV And exports up to 100,000 records complete in <= 15 seconds and are access-controlled via signed URLs valid for 24 hours
Retention policy enforcement and audit
Given a retention policy of 90 days with legal hold exceptions is configured When the scheduled retention job runs Then records older than 90 days without legal hold are purged or archived per policy And purge/archival actions are logged with counts, duration, and policyVersion And records under legal hold remain accessible until the hold is lifted And purged records are no longer retrievable via UI or API
Redaction policy application across UI, API, and exports
Given a redaction policy masks emails and serial numbers after 30 days When a user views, searches, or exports records older than 30 days Then the configured fields are irreversibly masked in UI, API responses, and exports And redaction metadata on each record shows fieldsRedacted, redactionTimestamp, and policyVersion And full values are visible only to authorized roles within the allowed window; access outside policy is denied and logged
Smart Notifications & Acknowledgements
"As an agent, I want concise, timely alerts when my assignments change so that I can adjust my work without being overwhelmed by notifications."
Description

Delivers configurable notifications to impacted owners and teams when assignments or priorities change, via in-app alerts, email, and Slack. Supports batching, rate limits, quiet hours, and localization to minimize noise. Includes templated reason text sourced from explainability and deep links to affected cases. Allows agents to acknowledge or request deferral with reason, feeding updates back into capacity profiles and influencing subsequent optimization cycles.

Acceptance Criteria
Immediate Slack and Email Notification on Reassignment
Given a case is reassigned by Smart Rebalance to a new owner And the recipient has Slack and Email channels enabled When the reassignment event is committed Then send a Slack DM and an Email to the recipient within 10 seconds And the messages include templated reason text sourced from explainability with fields: previous_owner, new_owner, rule_trigger, SLA_severity, capacity_delta And the messages include a deep link to the affected case And create an in-app alert in the recipient's notification center And write an audit record with channels attempted, delivery outcomes, and timestamps
Change Digest Batching Within Time Window
Given assignment and/or priority changes occur for the same recipient within a 5-minute window And batching is enabled for the recipient or team When the number of changes N in the window is greater than or equal to 3 Then send one digest per recipient per channel summarizing all changes within 30 seconds after the window ends And include per-item case ID, reason snippet, and deep link for each case in the digest And ensure no more than 1 digest per recipient per channel per window And record the mapping of individual events to the digest ID in the audit log
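The window-flush logic above (one digest per recipient per channel, minimum bundle of 3, acknowledged items dropped, undersized batches carried forward) can be sketched as a pure function over the pending events; the event shape is illustrative:

```python
from collections import defaultdict

def flush_window(events, min_bundle=3):
    """At window close: one digest per (recipient, channel) when at least
    min_bundle unacknowledged items accumulated; smaller batches stay queued
    for the next window, and acknowledged items never enter the digest."""
    buckets = defaultdict(list)
    for e in events:
        if not e.get("acked"):
            buckets[(e["recipient"], e["channel"])].append(e)
    digests, queued = [], []
    for (recipient, channel), items in buckets.items():
        if len(items) >= min_bundle:
            digests.append({"recipient": recipient, "channel": channel,
                            "items": items})
        else:
            queued.extend(items)
    return digests, queued
```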
Quiet Hours and Channel Rate Limits Enforcement
Given a recipient has quiet hours configured from 20:00 to 08:00 in their local time zone And a per-channel rate limit L notifications/hour is configured When a notification is generated during quiet hours Then suppress Slack and Email delivery and queue them for 08:00 And create an in-app alert immediately only if SLA_severity = Critical; otherwise queue it for 08:00 When the rate limit L would be exceeded within the current hour Then switch to digest mode for excess notifications and defer them to the next hour And log suppression, deferral, or digest decisions with reasons in the audit log
Localized Templates Per Recipient Locale
Given the recipient’s locale is fr-FR and time zone is Europe/Paris When a notification is generated Then render subject and body using the fr-FR template with localized date/time, numbers, and pluralization And include translated reason text for reason codes present in the event And if a translation is missing, fall back to en-US and emit a translation-missing telemetry event And ensure deep links remain unchanged and functional
Agent Acknowledges Reassignment From Notification
Given a reassignment notification contains Acknowledge and Open Case actions When the agent clicks Acknowledge from Slack, Email, or in-app within 24 hours Then mark the case as acknowledged with user, channel, and timestamp And stop reminder notifications related to this event And update the agent’s capacity profile by adding the case’s effort estimate to active_load And record the acknowledgment in the audit trail
Agent Deferral Request With Policy Validation
Given a reassignment notification contains a Request Deferral action and deferrals are enabled by policy When the agent submits a deferral with a reason selected from the allowed list and a defer_until time within the policy limit Then mark the case as deferred until the specified time And adjust the agent’s capacity profile and availability to reflect the deferral And if policy requires, notify the supervisor and await approval before finalizing the deferral And record the deferral reason, requested duration, policy validation outcome, approvals, and timestamps in the audit log
Dry-Run Approval Mode Notification Behavior
Given Smart Rebalance is operating in dry-run approval mode When a reassignment or priority change is proposed Then do not send notifications to individual agents And send an approval request to the approver group via in-app and email including proposed changes, reasons, and impacted owners count And upon approval, send the corresponding notifications annotated with the approval ID and approval timestamp And if rejected, send no notifications and record the decision And capture approver, decision, timestamp, and affected case count in the audit log

Escalation Ladder

Configurable, multi‑tier escalation paths that trigger before breach—notify managers, ping suppliers, open tasks, or page on‑call—without flooding inboxes. Includes throttle logic, playbook checklists, and policy‑aware timer pauses. Delivers consistent, rapid saves and airtight accountability across teams and partners.

Requirements

Multi-Tier Escalation Rules Engine
"As an ops manager, I want to configure multi-tier escalation paths with pre-breach triggers so that high-risk claims get proactive attention and SLA breaches are prevented."
Description

Provide a configurable engine to define multi-level escalation paths that trigger before SLA breach based on time thresholds, claim attributes (brand, SKU, severity, channel), and event signals (no response, part backorder, reopened case). Actions include notifying roles, reassigning queues, creating tasks, pinging suppliers via email/SMS/webhook, and posting to chat. Support reusable templates, versioning, test/simulation mode, preview of impacted claims, and safe rollout with staged environments. Ensure idempotency and per-claim state tracking to prevent duplicate actions.

Acceptance Criteria
Time-based pre-breach escalation by severity and channel
Given a high-severity claim from the Email channel with SLA due at 14:00 UTC and a rule with stages T-60 and T-15 And the SLA timer is running (not paused) When the clock reaches 13:00 UTC (T-60) Then the engine triggers Stage 1 exactly once for that claim And posts a templated message to the configured chat channel with claim ID, severity, time-to-breach, and deep link And emails the Tier 1 Lead role and records escalation_state.stage1_sent with timestamp And if the claim is resolved before 13:45 UTC, Stage 2 does not trigger And when the clock reaches 13:45 UTC (T-15) and the claim is still open, the engine triggers Stage 2 exactly once and creates a follow-up task assigned to the Ops Manager queue
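The "exactly once per stage" behavior can be expressed as a small evaluator over per-claim state; re-runs are harmless because fired stages are recorded, and a resolved claim simply stops being evaluated. A sketch (timer pauses and persistence omitted; names illustrative):

```python
from datetime import datetime, timedelta

def due_stages(sla_due, now, fired, offsets_minutes=(60, 15)):
    """Stages due at `now` for one claim. `fired` is the claim's mutable
    per-stage state set, which makes each stage trigger exactly once even
    when the evaluator loop re-runs every few minutes."""
    due = []
    for stage, minutes in enumerate(offsets_minutes, start=1):
        if now >= sla_due - timedelta(minutes=minutes) and stage not in fired:
            fired.add(stage)
            due.append(stage)
    return due
```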
Event-triggered escalation on no agent response
Given a rule that escalates if no agent note or customer reply is recorded for 24 hours after assignment And a claim is assigned at 09:00 local time When 24 hours elapse with no qualifying activity Then the claim is reassigned to the "Escalations" queue And the current assignee and escalation manager are notified via email and chat with the claim link and inactivity duration And escalation_state.no_response_fired is set with a correlation_id And subsequent evaluations do not re-fire unless state is reset or the claim receives a response and then becomes inactive again for another 24 hours
Supplier notification actions via email/SMS/webhook
Given Stage 2 defines supplier ping actions for supplier ABC with email, SMS, and webhook templates When Stage 2 fires for a claim linked to supplier ABC Then the system sends exactly one email, one SMS, and one HTTP POST to the configured endpoint, each populated with claim ID, SKU, serial, SLA_due_at, and parts_status And delivery outcomes are logged; failed deliveries are retried with exponential backoff up to 3 attempts per channel And duplicate deliveries are prevented by idempotency keys scoped to claim_id + stage + rule_version
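The idempotency-key and retry behavior can be sketched as below; the key format and `sleep` injection are illustrative choices, and a production version would persist `sent` keys rather than hold them in memory:

```python
import time

def deliver_once(send, key, sent, max_attempts=3, base_delay=1.0,
                 sleep=time.sleep):
    """Idempotent channel delivery: the key (claim_id:stage:rule_version:channel)
    suppresses duplicates; failed sends retry with exponential backoff up to
    max_attempts before reporting failure."""
    if key in sent:
        return "duplicate-suppressed"
    for attempt in range(max_attempts):
        try:
            send()
            sent.add(key)
            return "delivered"
        except Exception:
            if attempt == max_attempts - 1:
                return "failed"
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Injecting `sleep` keeps the backoff testable without real delays.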
Reusable rule templates with versioning and audit history
Given a Rule Template "High Severity Ladder" v1 exists When a user creates v2 with modified thresholds and publishes it Then v1 remains immutable and selectable; v2 receives a new version ID, changelog notes, author, and created_at And existing rule instances continue using their pinned version until explicitly upgraded And upgrading an instance updates effective_version and writes an audit log entry with before/after values
Simulation mode with preview of impacted claims
Given a draft rule is set to Simulation mode When a user selects a date range and runs a simulation Then the system produces a report listing the count and IDs of claims that would match per stage and the actions that would have fired And no notifications, task creations, queue reassignments, or webhooks are executed And the report is downloadable (CSV/JSON) and stored with a simulation_id and timestamp for 30 days
Safe rollout via staged environments and cohort gating
Given environments Staging and Production are configured When a rule is published to Staging, validated, and then promoted to Production with a 25% cohort gate (brand = Acme) Then only Production claims matching brand = Acme are evaluated by the new rule version; others continue on the prior version And operators can rollback to the previous version with one action; rollback takes effect within 2 minutes and is audit logged with actor and reason
Idempotent execution and per-claim escalation state tracking
Given the engine evaluates rules every 5 minutes across multiple workers When the same claim qualifies for the same stage of the same rule version across successive evaluations Then the engine does not re-execute actions; per-claim state records stage, version, action_ids, and timestamp And concurrent workers do not cause duplicates due to a distributed lock or idempotency keys And if the claim status transitions to Closed then Reopened, the escalation state resets per policy, allowing stages to fire again for the new lifecycle
Throttled Notifications & Digesting
"As a team lead, I want escalations to throttle and bundle notifications so that my team stays informed without inbox overload."
Description

Implement throttle logic that limits escalation notifications per claim, per user, and per channel, with configurable cooldowns and quiet hours. Provide bundling into periodic digests, deduplication across channels, acknowledgement-to-suppress behavior, and escalation handoff rules to avoid alert storms. Respect user channel preferences (email, chat, SMS) and working hours, with fallback routing when delivery fails. Log all notification events for auditability.

Acceptance Criteria
Throttle Per Claim/User/Channel with Cooldowns and Quiet Hours
Given configured cooldowns claim=30m, user=15m, channel=10m and quiet hours 22:00-07:00 When 3 escalation events for the same claim to the same user on the same channel occur within 10 minutes during working hours Then only the first notification is sent on that channel and the next 2 are suppressed until the 30-minute claim cooldown elapses with suppression reason "claim-cooldown-active" When 2 different claims attempt to notify the same user on the same channel within 5 minutes Then the second notification is suppressed until the 15-minute user cooldown elapses with suppression reason "user-cooldown-active" When an escalation event occurs at 22:30 Then no immediate notification is sent and the notification is deferred to 07:00 with audit reason "quiet-hours-defer"
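The layered cooldown check can be sketched as a single function that tests each scope in precedence order and records the send time only when all scopes pass. The exact scoping of each cooldown is an assumption here (claim cooldown per claim+user+channel, user cooldown per user, channel cooldown per user+channel), chosen to reproduce the suppression reasons above:

```python
from datetime import datetime, timedelta

def throttle(last_sent, now, claim_id, user_id, channel, cooldowns):
    """Return (send?, suppression_reason). `last_sent` maps scope -> last
    send time; checks run claim -> user -> channel, so the first active
    cooldown names the suppression reason. Scoping is an assumption."""
    scopes = [
        (("claim", claim_id, user_id, channel), cooldowns["claim"],
         "claim-cooldown-active"),
        (("user", user_id), cooldowns["user"], "user-cooldown-active"),
        (("channel", user_id, channel), cooldowns["channel"],
         "channel-cooldown-active"),
    ]
    for scope, cooldown, reason in scopes:
        t = last_sent.get(scope)
        if t is not None and now - t < cooldown:
            return False, reason
    for scope, _, _ in scopes:
        last_sent[scope] = now
    return True, None
```

Quiet-hours deferral would wrap this: a suppressed-by-quiet-hours event is queued with reason "quiet-hours-defer" and replayed through the same throttle at the window's end.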
Periodic Digest Bundling
Given a user has an email digest window of 30 minutes and a minimum bundle size of 3 When 7 suppressed or deferred escalation items accumulate for that user during the window Then exactly 1 digest email is sent at window end summarizing those 7 items with claim IDs, severities, and counts and no individual alerts for those items are sent When the window ends with fewer than 3 items Then no digest is sent and items remain queued for the next window or until threshold is met When any item in the pending digest is acknowledged before the window closes Then the acknowledged item is removed from the digest and is not included in the sent summary
Cross-Channel Deduplication
Given a user's channel preference order is Chat > Email > SMS and cross-channel dedupe window is 60 seconds When the same escalation event triggers notifications on Chat and Email within the dedupe window Then only the Chat notification is sent and the Email notification is suppressed with reason "cross-channel-dedupe" and a reference to the Chat delivery ID When two identical escalation events with identical hashes arrive within 60 seconds from different sources Then only one notification is sent on the preferred channel and the duplicate is suppressed with reason "duplicate-event"
Acknowledgement-to-Suppress Behavior
Given a user receives an escalation notification for claim C123 and clicks Acknowledge within the message When the acknowledgement is recorded Then further notifications for claim C123 to that user are suppressed for 2 hours with reason "ack-suppress" unless the claim transitions to a higher severity or a new escalation stage When the claim escalates to a higher severity during the suppression window Then a single notification is sent immediately and the suppression window is reset from the time of the new notification
Escalation Handoff Storm Prevention
Given a 3-tier escalation policy with T1 timeout 15 minutes and T2 timeout 30 minutes When T1 has not acknowledged within 15 minutes Then exactly 1 notification is sent to T2 and no further T1 notifications are sent for that stage When T2 is notified and later T1 acknowledges within 5 minutes of T2 notification Then no additional notifications are sent to T2 or T3 for that stage and queued alerts for T2/T3 are canceled with reason "handoff-canceled" When ownership of the claim is reassigned to a supplier Then internal tiers stop receiving notifications and only the supplier tier receives subsequent escalations
Channel Preferences, Working Hours, and Fallback Routing
Given a user preference: Email allowed 09:00-17:00, Chat disabled, SMS allowed as fallback and a policy "Override quiet hours for Priority=Critical" is false When a Priority=High escalation occurs at 18:30 Then no Email or Chat is sent and the notification is deferred to 09:00 next business day When a Priority=Critical escalation occurs at 18:30 Then an SMS is sent immediately as allowed fallback and Email remains deferred When a preferred channel delivery returns a permanent failure (e.g., SMTP 550) or 3 transient failures within 2 minutes Then the system routes to the next allowed channel within 2 minutes and logs the fallback with correlation to the failed attempt and no duplicate is sent if the original later succeeds
Comprehensive Notification Audit Logging
Given notification processing occurs for any event When a notification is sent, suppressed, deferred, digested, acknowledged, or failed Then an immutable audit record is created containing timestamp (UTC), claim ID, rule ID, user ID, channel, action (sent/suppressed/deferred/digested/ack/fail), reason code, correlation/event hash, delivery ID, and actor/system IDs When an auditor filters logs by date range, claim ID, user ID, channel, or reason code Then matching records are returned within 2 seconds for up to 100k records and include links from digest entries to their constituent items When exports are requested for a 30-day period Then a CSV and JSON export is generated within 60 seconds and retained for 24 hours for download
Policy-Aware SLA Timer Pauses
"As a compliance-focused support lead, I want SLA timers to pause and resume based on approved policy states so that reporting is accurate and we aren’t penalized for waiting on customers or parts."
Description

Integrate escalation logic with the SLA engine to automatically pause and resume timers when cases enter policy-defined states such as Awaiting Customer, Awaiting Parts, or Supplier Review. Require approvals for certain pauses, capture reasons and evidence, and write full audit logs. Support time zones, regional holiday calendars, and per-queue working hours to ensure accurate remaining-time calculations and breach prediction. Expose pause/resume events to reporting and webhooks.

Acceptance Criteria
Auto-Pause on Awaiting Customer
Given a case in a policy-enabled queue with "Pause on Awaiting Customer" enabled and no approval required When the case state changes to "Awaiting Customer" via agent action or automated rule Then the SLA timer pauses immediately at the state-change timestamp And the pause reason is recorded as "Awaiting Customer" And at least one outbound customer message artifact (email/message ID) is attached; otherwise the pause is blocked with a validation error And the case header shows "Paused" with remaining time in hh:mm computed against the queue's working hours
Auto-Resume on Customer Reply and Recalculation
Given a case paused with reason "Awaiting Customer" When a reply from the case contact is ingested via magic inbox or portal Then the SLA timer resumes at the ingest timestamp normalized to the queue time zone And the remaining time equals the value at the moment of pause (no working time consumed during pause) And breach prediction updates within 5 seconds And duplicate replies do not create duplicate resume events And an audit entry of type "resume" is created with detector source and message ID
Supplier Review Pause With Approval and Evidence
Given a policy that requires "Escalations Manager" approval for pauses with reason "Supplier Review" When a user attempts to move the case to "Supplier Review" Then an approval request is created and the SLA timer continues running until approval And if approved, the timer pauses at the approval timestamp, and a supplier ticket/reference ID must be provided as evidence And if rejected, the state reverts to the previous value and the timer remains running And if not acted on within 8 business hours, the approval request times out, the state auto-reverts, and the requester is notified
Working Hours and Regional Holidays Applied to SLA
Given a queue configured with working hours 09:00–17:00 and holiday calendar "US-CA" When a case runs across non-working hours or holidays while not paused Then SLA remaining time decreases only during configured working hours and excludes "US-CA" holidays And when a pause spans multiple days, the remaining time on resume equals the pre-pause value, and future breach prediction excludes non-working periods And the SLA banner indicates the next working start if a predicted breach falls in non-working time
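One way to implement working-hours-only SLA accrual is to count only the seconds falling inside the configured window on non-holiday weekdays. The sketch below hard-codes a 09:00-17:00 window, a stand-in holiday set, and naive local datetimes; a real implementation would inject per-queue configuration and a full regional calendar.

```python
from datetime import datetime, timedelta, time

WORK_START, WORK_END = time(9, 0), time(17, 0)
HOLIDAYS = {datetime(2024, 7, 4).date()}  # stand-in for a "US-CA" calendar entry

def working_seconds(start: datetime, end: datetime) -> int:
    """Count only seconds inside working hours on non-holiday weekdays."""
    total, day = 0, start.date()
    while day <= end.date():
        if day not in HOLIDAYS and day.weekday() < 5:  # Mon-Fri only
            lo = max(start, datetime.combine(day, WORK_START))
            hi = min(end, datetime.combine(day, WORK_END))
            if hi > lo:
                total += int((hi - lo).total_seconds())
        day += timedelta(days=1)
    return total
```

Subtracting `working_seconds(clock_start, now)` from the SLA budget yields remaining time that, as the criterion requires, does not shrink overnight, on weekends, or on holidays.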
Time Zone Normalization and DST Safety
Given a queue time zone of America/Los_Angeles and an agent viewing from Europe/Berlin When a pause occurs at 16:55 PT and resumes at 09:05 PT the next business day, spanning a DST transition if applicable Then all audit timestamps are stored in UTC, displayed in the agent's local time, and SLA math uses the queue time zone And no negative durations or double-counted minutes occur across the DST boundary And breach prediction error is within ±1 minute of the theoretical schedule
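The DST-safety rule above (store UTC, render locally, do duration math on UTC instants) can be illustrated with Python's `zoneinfo`. The record shape is hypothetical; the point is that subtracting two UTC instants can never go negative or double-count the repeated hour at a fall-back transition.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

QUEUE_TZ = ZoneInfo("America/Los_Angeles")

def audit_timestamp(event_utc: datetime) -> dict:
    """Persist UTC only; derive queue-local and viewer-local views on demand."""
    assert event_utc.tzinfo is timezone.utc
    return {
        "utc": event_utc.isoformat(),
        "queue_local": event_utc.astimezone(QUEUE_TZ).isoformat(),
        "viewer_local": event_utc.astimezone(ZoneInfo("Europe/Berlin")).isoformat(),
    }

def elapsed_seconds(pause_utc: datetime, resume_utc: datetime) -> int:
    """Duration math on UTC instants is DST-safe by construction."""
    return int((resume_utc - pause_utc).total_seconds())
```

For a pause at 23:55 PT the night before the 2024 fall-back and a resume at 09:05 PST, the wall-clock gap reads 9h10m but the true elapsed time is 10h10m; the UTC subtraction returns the true value.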
Audit Logs and Webhooks for Pause/Resume
Given any pause or resume event on an SLA-tracked case When the event is committed Then an immutable audit record is written containing: case ID, actor (user/service), reason, evidence refs, old/new state, event type (pause/resume), queue ID, timestamps (UTC and local), remaining SLA seconds before/after, approval ID (if any) And the event is available in reporting datasets within 2 minutes And a webhook with topic slas.timer.paused or slas.timer.resumed is delivered within 30 seconds with an idempotency key and signature And webhook retries use exponential backoff for up to 24 hours on non-2xx responses
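The webhook contract above (idempotency key, signature, capped exponential retries) could be sketched as follows. The secret value, field names, and 30-second base delay are illustrative assumptions; only the HMAC-SHA256 signing and the 24-hour retry budget come from the criteria.

```python
import hashlib, hmac, json, uuid

SECRET = b"whsec_demo"  # hypothetical shared webhook secret

def build_webhook(topic: str, payload: dict) -> dict:
    """Attach an idempotency key and an HMAC-SHA256 signature over the body."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {
        "topic": topic,
        "idempotency_key": str(uuid.uuid4()),
        "body": body,
        "signature": hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest(),
    }

def retry_schedule(base: int = 30, cap_hours: int = 24) -> list[int]:
    """Exponential backoff delays (seconds) until the 24h budget is spent."""
    delays, total, delay = [], 0, base
    while total + delay <= cap_hours * 3600:
        delays.append(delay)
        total += delay
        delay *= 2
    return delays
```

The idempotency key lets a consumer that receives a retried delivery discard the duplicate; the signature lets it reject forged events.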
Playbook Checklists & Auto-Tasks
"As a support agent, I want actionable checklists to auto-open at each escalation tier so that I can follow the correct steps quickly and consistently."
Description

Attach per-tier, role-based playbook checklists that open automatically on escalation, creating assignable tasks with owners, due times, and dependencies. Include step templates by claim type/brand, inline guidance, and links to knowledge articles. Track completion, require sign-off for gated steps, and block promotion to the next tier until required tasks are done or explicitly waived with justification. Synchronize tasks with the main ClaimKit queue and expose progress to stakeholders.
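The task-materialization step described above might look like the sketch below: tasks get due times from per-step SLA offsets, dependent steps start locked, and re-running the same escalation event creates nothing new. The template shape and step names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical Tier-2 template: each step has an SLA offset (minutes) and
# optional prerequisites that must complete before the step unlocks.
TEMPLATE = [
    {"id": "triage", "offset_min": 30, "deps": []},
    {"id": "diagnose", "offset_min": 120, "deps": ["triage"]},
    {"id": "approve_fix", "offset_min": 240, "deps": ["diagnose"]},
]

def create_tasks(escalated_at: datetime, existing: set[str]) -> list[dict]:
    """Idempotent: steps already materialized for this escalation are skipped."""
    tasks = []
    for step in TEMPLATE:
        if step["id"] in existing:
            continue  # reprocessing the same event must not duplicate tasks
        tasks.append({
            "id": step["id"],
            "due": escalated_at + timedelta(minutes=step["offset_min"]),
            "locked": bool(step["deps"]),  # locked until prerequisites complete
            "deps": step["deps"],
        })
    return tasks
```

Passing the set of already-created task IDs makes the event handler safe to replay, which is how the "no duplicate tasks" acceptance criterion below is satisfied.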

Acceptance Criteria
Auto-Open Role-Based Playbook on Tier Escalation
Given a claim with type "Appliance-Install" and brand "Acme" escalates to Tier 2 When the escalation event is processed Then the Tier 2 playbook checklist is attached within 5 seconds And tasks are auto-created per the template with owner roles resolved to active users And each task has a due time computed from the escalation timestamp plus the task’s SLA offset And declared task dependencies are enforced so dependent tasks are locked until prerequisites are complete And reprocessing the same escalation does not create duplicate tasks (idempotent) And an audit entry records checklist ID, task IDs, owners, due times, and dependency graph
Template Resolution by Claim Type and Brand with Fallback
Given templates exist for Tier 2 with specificity (claim type + brand) and a Tier 2 default template When an "Appliance-Install/Acme" claim escalates to Tier 2 Then the matching specific template is applied When an "Appliance-Install/UnknownBrand" claim escalates to Tier 2 Then the Tier 2 default template is applied When no Tier 2 default template exists Then the system surfaces an error to ops admins and does not create tasks And the applied template version ID is stamped on the checklist and remains immutable after creation
Inline Guidance and Knowledge Article Links
Given a user opens a task generated from a playbook When the task detail panel is rendered Then inline guidance text specific to the step template is displayed And at least one knowledge article link is shown when configured And clicking a link opens the article in a new tab, with the article request returning HTTP 200 within 3 seconds And if the link is unreachable, an inline warning is shown without blocking task execution
Gated Steps, Sign-Off, and Promotion Blocking
Given one or more tasks in the checklist are marked as "Gated" with required role(s) When a user attempts to complete a gated task Then a sign-off control is required and only users with one of the required roles can sign And the sign-off captures user, role, timestamp, and optional notes When a user attempts to promote the claim to the next tier while any required tasks are neither completed nor waived Then promotion is blocked and a clear error message lists the blocking tasks When all required tasks are completed or validly waived Then promotion to the next tier is enabled immediately
Waiver Authorization and Justification Audit
Given a required task permits waiver and has an allowed roles list When a user without an allowed role attempts to waive Then the waive action is not available When an authorized user chooses to waive a required task Then a justification modal is presented requiring a reason code and at least 15 characters of free-text justification And upon confirmation the task status becomes "Waived" with user, timestamp, reason code, and justification recorded in the audit log And the waived task is counted as satisfied for promotion checks while remaining visible in the checklist When a waiver is rescinded by an authorized user Then the task returns to its prior actionable state and the audit log records the reversal
Two-Way Sync with ClaimKit Main Queue
Given tasks are created from an escalation checklist When viewing the main ClaimKit queue Then each task is visible as a child item of its parent claim with status, owner, and due time fields When status, owner, or due time is updated in either the task panel or the queue Then the change is reflected in the other view within 5 seconds And the parent claim shows a badge with completed/total task counts updated within 5 seconds And clicking a task from the queue deep-links to the task detail panel
Progress Visibility to Stakeholders
Given a claim has an active escalation checklist When a stakeholder (internal or partner with read-only access) views the claim Then a progress module shows percent complete, counts of completed/remaining/gated/waived tasks, and next due task with its due time And the module updates within 5 seconds of any task state change And external stakeholder views exclude restricted fields and PII per role settings And the view displays a "Last updated" timestamp and reflects the current checklist version applied to the claim
On-Call & Supplier Paging Integration
"As an escalation manager, I want on-call engineers and suppliers paged through their preferred channels with failover so that critical issues are addressed immediately."
Description

Integrate with on-call scheduling and incident platforms (e.g., PagerDuty, Opsgenie) and supplier contact endpoints (email, API, SMS) to route escalations to the correct party at the correct time. Support contact windows, retries with backoff, failover targets, and confirmation/ack workflows. Securely store supplier contact methods, use webhook signing and OAuth where applicable, and record delivery outcomes. Allow per-supplier SLAs and response expectations to drive subsequent ladder steps.
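The retries-with-backoff behavior described above (e.g., the 1m/2m/4m schedule with ±20% jitter used in the PagerDuty criteria below) reduces to a small helper. This is a sketch; the seedable `rng` parameter exists only to make the jitter testable.

```python
import random

def backoff_delays(base_s: int = 60, retries: int = 3, jitter: float = 0.2,
                   rng=None) -> list[float]:
    """Delays of base, 2*base, 4*base, ... seconds, each randomized by
    a ±jitter factor so simultaneous failures do not retry in lockstep."""
    rng = rng or random.Random()
    return [base_s * (2 ** i) * rng.uniform(1 - jitter, 1 + jitter)
            for i in range(retries)]
```

Jitter matters here because a provider outage fails many pages at once; without it, every retry lands in the same instant and re-creates the thundering herd.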

Acceptance Criteria
Page On-Call via PagerDuty Before SLA Breach
Given a case with SLA due at time T and an escalation rule configured to page PagerDuty at T-15 minutes within the supplier’s contact window in the supplier’s timezone When the system evaluates the ladder at T-15 minutes Then it resolves the current on-call user for the mapped PagerDuty service via API and creates a high-urgency incident including caseId, customerName, priority, and slaDueAt When PagerDuty responds with HTTP 2xx and an incident ID Then the system records a delivery attempt with provider=PagerDuty, status=Delivered, incidentId, httpStatus, latencyMs, and timestamp When PagerDuty responds with HTTP 4xx/5xx or times out Then the system retries up to 3 times with exponential backoff of 1m, 2m, 4m with ±20% jitter and logs each attempt When max retries are exhausted without success Then the system marks status=Failed for this channel and immediately triggers the configured failover target
Opsgenie Routing with Contact Windows and Local Time Enforcement
Given a supplier contact window of 09:00–17:00 America/New_York and a case entering escalation at 16:55 ET When the system evaluates the step Then it resolves the Opsgenie on-call recipient for the mapped team and sends a page before 17:00 ET Given the same case evaluated at 17:05 ET and the step is configured to respect contact windows When the ladder evaluates Then no page is sent, the step is deferred to 09:00 ET next business day, and the SLA timer is paused if policy=PauseOutsideWindow When policy=DoNotPauseOutsideWindow is configured Then the SLA timer continues and the next evaluation is scheduled per configuration Then duplicate pages to the same recipient for the same case are suppressed for a minimum interval of 10 minutes
Supplier Multi-Channel Failover with Retries and Backoff
Given a supplier with primary API endpoint, secondary Email, and tertiary SMS contact methods and an escalation step configured with 2 retries per channel When the primary API returns non-2xx or exceeds a 10s timeout Then the system retries the API twice with backoff delays of 30s and 60s and records each attempt When the API still fails after retries Then the system fails over to Email and sends a message including case summary, unique confirmation link/token, and SLA deadline; it records SMTP/messageId and HTTP status When Email fails (bounce/4xx/5xx) or no delivery outcome is received within 2 minutes Then the system sends an SMS to the stored number via the configured provider, records messageId and delivery status, and stops further attempts as soon as any channel succeeds
Acknowledgment Workflow Stops Escalation and Captures Response Time
Given an escalation notification includes a confirmation action (PagerDuty acknowledge, Opsgenie ack, Email link, or SMS reply ACK) When the recipient acknowledges within the configured response expectation (e.g., 15 minutes) Then retries for this step stop, the escalation state becomes Acknowledged, and responseTimeMinutes is recorded on the case and supplier metrics When an acknowledgment arrives after the expectation window but before the max escalation time Then the system records lateAck=true and follows the configured policy (continue next step or halt) When no acknowledgment is received within the expectation window Then the system triggers the next ladder step and appends a NoAck reason to the escalation audit trail Then all acknowledgments must be cryptographically verifiable (valid provider webhook signature or valid unexpired token); invalid acks are rejected and logged without changing state
Secure Credential Storage, OAuth, and Webhook Signing
Given supplier credentials (API keys, OAuth tokens, email passwords, SMS tokens) Then they are stored encrypted at rest using KMS-managed encryption, access is restricted to the escalation service, and all access is audited with user/service identity and timestamp When integrating with PagerDuty/Opsgenie via OAuth 2.0 Then the system completes authorization code flow, securely stores refresh tokens, refreshes access tokens before expiry, and revokes all tokens upon supplier disconnect When receiving inbound webhooks (acknowledgments or delivery updates) Then the system verifies provider signatures (e.g., HMAC) against stored secrets, rejects requests with invalid/missing signatures with HTTP 401, and performs no state mutation When sending outbound API/webhook requests Then requests include required auth/signing headers, and secrets can be rotated without downtime with successful requests signed using the new secret
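The inbound-webhook verification above, including zero-downtime secret rotation, can be sketched as checking the signature against every currently active secret with a constant-time comparison. Secret values are placeholders.

```python
import hashlib, hmac

def verify_webhook(secrets: list[bytes], body: bytes, signature: str) -> bool:
    """Accept if the signature matches any active secret. During rotation both
    the old and new secrets stay in the list, so senders signed with either
    succeed. hmac.compare_digest avoids timing side channels."""
    for secret in secrets:
        expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, signature):
            return True
    return False  # caller responds 401 and performs no state mutation
```

On `False`, the handler must return HTTP 401 and touch no state, matching the criterion above.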
Delivery Outcome Recording and Visibility
Given any escalation delivery attempt Then the system writes an immutable record containing caseId, stepId, channel, target, provider, requestId/messageId, httpStatus/deliveryStatus, latencyMs, attemptNumber, outcome, and timestamp When a user views the case activity in the UI or queries the API Then all attempts and outcomes are shown in chronological order, filterable by channel/provider, with export available to CSV and JSON When a delivery provider posts a status update webhook Then the corresponding delivery record is updated to final status within 10 seconds and the case timeline reflects the change
Per-Supplier SLAs and Response Expectations Drive Ladder Steps
Given a supplier with response SLA of 30 minutes and resolve SLA of 2 business days When a case escalates to that supplier Then a response timer starts, pauses during configured policy states (Awaiting Customer, Outside Contact Window if policy=Pause), and the next step is evaluated when the timer reaches 30 minutes without acknowledgment When the supplier acknowledges before 30 minutes Then the next ladder step is canceled or rescheduled per policy=OnAck:Cancel and the resolve SLA timer continues independently When the resolve SLA enters a pre-breach threshold (e.g., 4 hours remaining) without status update Then pre-breach escalations are triggered per rule unless policy=SuppressAfterAck is enabled
Escalation Analytics & Accountability
"As an operations director, I want dashboards and audit trails of escalations so that I can measure impact, enforce accountability, and optimize policies."
Description

Deliver dashboards and exports that show pre-breach saves, time-to-acknowledge, time-in-tier, MTTR deltas, top triggers, and supplier response performance. Provide per-queue and per-agent views, cohort analysis by claim type, and ladder effectiveness comparisons across versions. Include a full audit trail of notifications, acknowledgements, pauses, task completions, and configuration changes to support compliance and postmortems.

Acceptance Criteria
Pre‑Breach Saves & Time Metrics Dashboard
Given a user selects a date range and one or more queues When the dashboard loads Then it displays: count and rate (%) of pre‑breach saves, median and p90 time‑to‑acknowledge, median and p90 time‑in‑tier by tier, MTTR, and MTTR delta vs previous comparable period And all metrics exclude durations during policy‑aware timer pauses And data freshness is under 5 minutes (difference between now and latest event ingested <= 5 minutes) And each metric supports drill‑down to the underlying claim list And empty states render with “No data” and zeroed metrics when no claims match filters
Per‑Queue and Per‑Agent Performance Views
Given a user toggles between Queue and Agent views with filters for date range, claim type, and supplier When the view is changed Then leaderboards render with sortable columns for Save Rate, Time‑to‑Ack (median), Time‑in‑Tier 1 (median), MTTR, and Escalations per Claim And selecting an agent or queue opens a detail page with trend charts and case drill‑downs And with up to 100k claims in range, initial load p95 < 3s and sort p95 < 2s And results can be exported respecting the active filters
Cohort Analysis by Claim Type
Given a user selects up to 10 claim type cohorts and a comparison period When the analysis runs Then a table shows each cohort’s Save Rate, Time‑to‑Ack median, Time‑in‑Tier 1 median, MTTR, and volume And a delta column shows absolute and % change vs baseline period And totals across cohorts reconcile to overall totals within 1% rounding tolerance And the cohort view supports CSV export with all displayed fields
Ladder Effectiveness Comparison by Version
Given multiple versions of the escalation ladder were active during the selected period When the user compares Version A vs Version B (optionally controlling for claim type and supplier) Then the report attributes each case to the ladder version in effect at the time of its first escalation event using configuration history And it displays Save Rate, MTTR, MTTR delta vs prior version, Escalations per Case, and Time‑to‑Ack by tier for each version And significance badges appear when differences meet p<0.05 (two‑proportion z for rates, Mann‑Whitney U for medians) And drill‑down lists are pre‑filtered by version
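The two-proportion z-test named above (for comparing save rates between ladder versions) fits in a few stdlib lines. This is a standard pooled-proportion formulation, not ClaimKit-specific code.

```python
import math

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided two-proportion z-test; returns (z, p_value).
    Uses the pooled proportion for the standard error and the normal
    tail via erfc for the two-sided p-value."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p_value
```

A significance badge would render when `p_value < 0.05`; the Mann-Whitney U test for median comparisons would sit alongside this but needs the raw per-case durations rather than counts.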
Supplier Response Performance
Given a user filters by supplier and tier When the supplier performance report loads Then it shows for each supplier: median and p90 acknowledgment time to escalation notifications, acknowledgment rate within SLA (%), average number of touches to resolution, and Save Rate And time stamps are normalized to UTC and display in the user’s local timezone And underperformers (ack rate within SLA < 90% or median ack time > SLA) are highlighted And data can be exported with one row per supplier per period
Analytics Export & API Access
Given a user with Export permission applies any combination of filters When they request a CSV or JSON export Then the file streams with a documented schema including claim_id, queue_id, agent_id, supplier_id, claim_type, timestamps (ISO‑8601 UTC), metrics fields, and version identifiers And exports up to 1,000,000 rows complete with p95 duration < 60s And an authenticated REST endpoint provides the same dataset with cursor pagination and rate limiting And exported data respects role‑based access and excludes redacted PII fields
End‑to‑End Audit Trail Completeness & Immutability
Given an auditor opens a claim’s audit timeline When reviewing events Then the trail contains notifications sent (channel, recipient), acknowledgments (actor, method), throttle suppressions, timer start/pause/resume/stop (reason), playbook task check/complete, supplier responses, and configuration changes (before/after, actor) And each event has an immutable ID, actor (user/system), correlation IDs, and millisecond timestamps And the trail is tamper‑evident via a forward hash chain; any modification invalidates the chain and is flagged And the trail is searchable/filterable and exportable, with retention >= 7 years
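The forward hash chain that makes the trail tamper-evident can be sketched directly: each record hashes its own payload plus the previous record's hash, so editing any historical entry invalidates every hash after it. Field names are illustrative.

```python
import hashlib, json

GENESIS = "0" * 64

def append_event(chain: list[dict], event: dict) -> dict:
    """Append an audit record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(event, sort_keys=True)
    record = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": hashlib.sha256((prev_hash + payload).encode()).hexdigest(),
    }
    chain.append(record)
    return record

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited record breaks the chain from there on."""
    prev = GENESIS
    for rec in chain:
        payload = json.dumps(rec["event"], sort_keys=True)
        if rec["prev_hash"] != prev or \
           rec["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True
```

A periodic job re-running `verify_chain` (or anchoring the latest hash externally) is what turns "immutable" from a policy into a detectable property.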
Configuration UI & RBAC Controls
"As a platform admin, I want a safe, permissioned UI to design, test, and version escalation ladders so that changes are controlled and can be deployed confidently."
Description

Offer a visual builder to design, validate, and version escalation ladders with drag-and-drop steps, conditional branches, and action blocks. Provide staging vs production environments, change reviews, and approval workflows. Enforce role-based permissions for viewing, editing, publishing, and emergency overrides. Include dependency checks, linting for unsafe patterns (e.g., alert loops), sample-claim test harness, and one-click rollback to prior versions.

Acceptance Criteria
Visual Builder: Drag-and-Drop Ladder Authoring
Given I am an Editor with access to staging When I create a new ladder and drag 3 steps, 1 conditional branch, and 2 action blocks onto the canvas Then the canvas shows all nodes with unique IDs and valid connections And the Save action is enabled And the serialized config validates against schema version 1.x with zero errors Given a node is dropped onto an invalid connector When I release the drag Then the drop is rejected with inline error "Invalid connection" And the Save action remains disabled until all errors are resolved Given I modify a node label and parameters When I click Save Then the change is persisted, time-stamped, and appears in the change diff view
Staging-to-Production Publishing and Versioning
Given a ladder exists in staging with no lint errors and all reviews approved When I publish to production Then a new production version is created with semantic version increment and immutable checksum And production traffic routes new cases to this version within 60 seconds And the previous production version remains available for rollback Given a ladder in staging with pending required reviews When I attempt to publish Then the publish action is blocked with reason "Pending approvals" Given a ladder in staging with lint severity error When I attempt to publish Then the publish action is blocked and shows the failing checks list
RBAC Permissions: View/Edit/Publish/Override Enforcement
Given a Viewer role user When accessing the builder Then they can view configurations but cannot edit, submit for review, publish, or override Given an Editor role user When editing in staging Then they can create and modify ladders and submit for review but cannot publish to production Given a Publisher role user with approvals met When attempting to publish Then the publish action succeeds and is audited with user, timestamp, and version Given an Emergency Override role user When initiating an override Then they can pause a ladder or bypass a step for a defined duration with mandatory reason and it is logged and notified to reviewers Given an unauthorized user When attempting restricted actions Then the action is denied with 403 and an audit entry is created
Linting and Dependency Checks on Save/Publish
Given a ladder contains a potential alert loop or circular dependency When I run Validate or Save Then the linter flags the issue with severity "error", node references, and remediation tips, and Save is blocked Given a ladder has unreachable branches or missing recipients When I run Validate Then the linter flags issues with severity "warning" and Save remains allowed but Publish is blocked if warnings exceed policy threshold Given all checks pass When I run Validate Then the validation report shows zero errors and zero policy-blocking warnings within 3 seconds
Sample-Claim Test Harness and Playback
Given a sample claim with defined attributes and time-of-day When I execute the ladder in dry-run mode Then the harness displays the ordered action trace, timer starts/pauses, throttling decisions, and recipient list deterministically Given the harness is run with the same inputs When executed repeatedly Then outputs are identical and time-anchored steps simulate using a controllable clock Given a failing step in the dry run When assertions are set (e.g., expected notification count) Then the test fails with clear diff and blocks publish until resolved or the test is deselected by an authorized user
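The controllable clock the harness criteria call for can be as simple as an injectable time source, which is what makes repeated dry-runs byte-identical. The `FakeClock` and ladder shape below are illustrative.

```python
from datetime import datetime, timedelta

class FakeClock:
    """Controllable clock so time-anchored steps replay deterministically."""
    def __init__(self, start: datetime):
        self.now = start
    def advance(self, minutes: int):
        self.now += timedelta(minutes=minutes)

def dry_run(ladder: list[dict], clock: FakeClock) -> list[tuple]:
    """Replay ladder steps against the fake clock; return the action trace."""
    trace = []
    for step in ladder:
        clock.advance(step["after_min"])
        trace.append((clock.now, step["action"], step["recipient"]))
    return trace
```

Because no step reads the wall clock, two runs with the same inputs produce the same trace, satisfying the determinism criterion, and assertions can diff traces directly.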
Approval Workflow and Review Gates
Given an Editor submits a ladder for review When reviewers are assigned per policy (min 2, not including author) Then reviewers can comment, request changes, or approve; all actions are timestamped and audited Given the required number of approvals is met and no blocking checks remain When the Publisher attempts to publish Then publish is enabled; otherwise it remains disabled with a checklist of unmet gates Given a reviewer requests changes When the Editor updates the ladder Then previous approvals are invalidated and the review cycle restarts
One-Click Rollback from Production
Given a production ladder version N and a previous version N-1 When a Publisher or Emergency Override user clicks Rollback to N-1 and confirms Then production traffic is routed to N-1 within 60 seconds, and N is marked as withdrawn, with an audit log entry and notifications sent Given rollback is executed When new cases arrive Then they use version N-1, while in-flight cases continue using their pinned version, as indicated in case metadata Given rollback When I open the version history Then I can see who performed the rollback, the reason, timestamps, and a link to compare diffs between N and N-1

Heatmap Drilldowns

A live SLA‑risk heatmap by queue, channel, product, region, and partner with a 24–72h forecast. Click to drill from hotspots to individual cases and annotate incidents for context. Gives leaders instant situational awareness to redeploy staff, reprioritize, and brief execs confidently.

Requirements

Real-Time SLA Risk Heatmap
"As an operations leader, I want a live heatmap of SLA risk by queue, channel, product, region, and partner so that I can spot hotspots and act before breaches."
Description

Compute and render a live, color‑coded heatmap of SLA risk across key dimensions (queue, channel, product, region, partner) using current case states and ClaimKit’s SLA timers. Aggregate risk scores per segment based on time-to-breach, backlog size, and breach probability; update continuously as new claims arrive via Magic Inbox and as case statuses change. Provide interactive filters (time window, brand, partner, SLA tier), legend, and accessibility-friendly color palette. Target performance: <2s render for up to 100k active cases, <60s data freshness. Expose an internal API for the UI and for scheduled exports. Integrates with existing eligibility checks and SLA definitions to ensure consistency across dashboards and alerts. Outcome: leaders gain instantaneous situational awareness of where SLAs are at risk.
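The "weighted function of time-to-breach, backlog size, and breach probability" could take a shape like the sketch below. The weights, the 24-hour urgency horizon, and the backlog saturation point are illustrative assumptions; the actual specification would define them.

```python
def segment_risk(cases: list[dict], w_ttb=0.5, w_backlog=0.2, w_prob=0.3) -> float:
    """Weighted aggregate of time-to-breach urgency, backlog size, and
    breach probability for one heatmap segment, normalized to 0-100."""
    if not cases:
        return 0.0
    # Urgency: 1.0 when a case is at/past breach, 0.0 when 24h+ away.
    urgency = sum(max(0.0, 1 - c["ttb_hours"] / 24) for c in cases) / len(cases)
    backlog = min(1.0, len(cases) / 100)  # saturates at 100 open cases
    prob = sum(c["breach_prob"] for c in cases) / len(cases)
    return round(100 * (w_ttb * urgency + w_backlog * backlog + w_prob * prob), 1)
```

Keeping the score a pure function of case state makes the UI value and the reference aggregation query trivially comparable, which the ±1% consistency criterion below depends on.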

Acceptance Criteria
Leaders Monitor Live SLA Risk by Segment
Given active cases exist across queues, channels, products, regions, and partners with SLA timers running And SLA definitions are loaded from the shared configuration used elsewhere in ClaimKit When the heatmap loads with default filters Then each segment tile displays a risk score computed as a weighted function of time-to-breach, backlog size, and breach probability per specification And tiles are color-coded according to the legend thresholds for low/medium/high/critical risk And segment totals and risk scores equal the aggregated values from the reference aggregation query within ±1% And the heatmap displays a data freshness timestamp derived from the latest aggregation run
Heatmap Performance at 100k Active Cases
Given a dataset of 100,000 active cases distributed across all dimensions When a user opens the heatmap Then initial render completes in under 2 seconds (P95) and under 3 seconds (P99) And subsequent refreshes complete in under 1 second (P95) And interaction (scroll, hover, filter open) remains responsive with input latency under 100 ms (P95) during and after render And the displayed data freshness (now − data_timestamp) is ≤ 60 seconds (P95) under sustained ingest of 1,000 new/updated cases per minute
Real-Time Updates from Magic Inbox and Case Changes
Given Magic Inbox auto-creates a new eligible claim in Channel=Email, Region=US-East, SLA tier=Gold When the claim is created and assigned Then the affected segment counts and risk score update on the heatmap within 60 seconds of creation And the case appears in the drilldown list for that segment within 60 seconds Given an existing case status changes from Open to Resolved When the status update is saved Then the case is removed from applicable segment counts and risk aggregation within 60 seconds And the case’s SLA timers no longer contribute to breach probability
Accessible Color Palette, Legend, and Interactive Filters
Given the heatmap is visible When the user adjusts filters for time window (Next 24h), brand (e.g., Acme), partner (e.g., RepairCo), and SLA tier (Gold), individually and in combination Then the heatmap updates to reflect the intersection of selected filters within 1 second (P95) And a legend clearly displays color thresholds and numeric risk ranges used for tiles And tile colors and legend text meet WCAG 2.1 contrast ratio ≥ 4.5:1; distinct risk levels remain distinguishable under deuteranopia/protanopia/tritanopia simulation And each tile shows a numeric risk value or badge so risk is not conveyed by color alone And all filters and tiles are keyboard-navigable and have ARIA labels describing segment, counts, and risk
Drilldown to Cases and Incident Annotations
Given a user clicks a hotspot tile with elevated risk When the drilldown opens Then a segment breakdown view appears with a sortable case list default-sorted by time-to-breach ascending And selecting a case opens case details in a panel or new tab without losing context And the user can add an incident annotation with title, description, affected segment, and optional tags And the annotation persists with user ID and timestamp and appears in the segment tooltip and drilldown within 5 seconds of save And only users with Annotate permission can create/edit/delete annotations; others can view only And annotations are included in API responses and scheduled exports for the associated segment
24–72 Hour SLA Breach Forecast Availability
Given forecast mode is enabled When the user selects a 24h, 48h, or 72h horizon Then the heatmap displays predicted at-risk counts and risk scores per segment based on current SLA timers, backlog aging, and breach probabilities And forecast outputs for a reference dataset match the baseline model within MAPE ≤ 5% And computing and rendering the forecast completes in under 2 seconds (P95) for 100,000 active cases And the legend indicates that values are forecasted and shows the horizon selected
Internal API and Scheduled Exports for Heatmap Data
Given an internal client requests the heatmap data API with specified dimensions and filters When the request is processed Then the API returns 200 with JSON containing segment identifiers, risk scores, open/backlog counts, forecast values (24/48/72h), legend thresholds, and a data freshness timestamp And API latency is under 800 ms (P95) for cached queries and under 2 seconds (P95) for uncached aggregations And a scheduled export delivers a CSV or Parquet file with the same fields to configured storage at the top of each hour with ≥ 99% on-time success over a 7-day window And all outputs use SLA definitions and eligibility checks consistent with the global configuration
Click-to-Drilldown Navigation
"As a team lead, I want to click a heatmap hotspot and drill down to the exact cases so that I can take immediate action on the right tickets."
Description

Enable single-click navigation from any heatmap cell to progressively detailed views: segment summary (KPIs, trend sparkline) → filtered case list (sorted by risk/time-to-breach) → individual case detail. Preserve filter context and breadcrumbs, support back/forward and deep links for sharing. Provide batch actions (assign, prioritize) in the segment view and case list to accelerate intervention. Ensure zero additional page loads where feasible via client-side routing to keep interaction under 300ms. Integrates with existing case detail pages and assignment workflows. Outcome: users move from detection to action on the exact at-risk cases without losing context.

Acceptance Criteria
Single-Click Drilldown to Segment Summary from Heatmap Cell
Given the Heatmap view is loaded with SLA risk data and the user has permission to view segments When the user single-clicks a heatmap cell Then the app navigates to the Segment Summary view for that exact segment And applies filters matching the cell's dimensions (queue, channel, product, region, partner) and SLA window And displays KPIs: total cases, at-risk count, breach count, average time-to-breach, and 24–72h SLA forecast And renders a 7-day trend sparkline And shows a breadcrumb "Heatmap > [Segment]"
Navigate from Segment Summary to Filtered Case List Sorted by Risk
Given the Segment Summary view is open with active filters from a heatmap cell When the user activates the "View Cases" drilldown control Then the app navigates to the Case List view And the Case List is filtered identically to the Segment Summary context And the list is sorted primarily by risk score (descending) and secondarily by time-to-breach (ascending) And the total case count equals the number of cases matching the Segment Summary filters for the current data timestamp And pagination is enabled with a default page size of 50
Open Case Detail from Case List with Context Preservation
Given a filtered Case List is displayed When the user single-clicks a case row Then the app opens the existing Case Detail page for that case using the standard case route And the breadcrumb reads "Heatmap > [Segment] > Cases > [CaseID]" And the Case Detail displays SLA timer and assignment controls And when the user clicks the browser Back button, the app returns to the Case List with prior filters, sort, selection, and scroll position preserved
Breadcrumbs and Browser Back/Forward Preserve State Across Drill Path
Given the user has drilled from Heatmap to Segment Summary to Case List to Case Detail When the user uses browser Back/Forward buttons or clicks any breadcrumb link Then navigation occurs without a full page reload And the currently active filters, sort order, selection, and scroll position are preserved on each view And the breadcrumb path updates to reflect the active view and context
Batch Actions in Segment Summary and Case List
Given the user is on Segment Summary or Case List and has permission to manage cases When the user initiates a batch action (Assign or Prioritize) scoped to the current filter or selected rows and confirms Then the system updates the targeted cases accordingly And shows success/failure feedback with the count of affected cases And partial failures are listed with retry options And updates are reflected in KPIs and lists within 5 seconds
Shareable Deep Links Rehydrate Exact Context
Given the user is on Heatmap, Segment Summary, or Case List When the user copies the Share/Deep Link for the current view and an authorized user opens it in a new session Then the exact view loads with identical filters, sort, time window, and breadcrumb context And if the link references unavailable or stale entities, the app loads with a clear message and sensible default fallback filters without erroring
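Rehydrating exact context from a deep link implies the view state is serialized into the URL. A minimal round-trip sketch, assuming an `f.`-prefixed key scheme that is purely illustrative (not ClaimKit's actual URL format):

```python
from urllib.parse import urlencode, parse_qs

def encode_view(filters: dict, sort: str, window: str) -> str:
    """Serialize filters, sort order, and time window into a shareable
    query string. Key names (f.*, sort, window) are assumptions."""
    params = {f"f.{k}": v for k, v in sorted(filters.items())}
    params["sort"] = sort
    params["window"] = window
    return urlencode(params)

def decode_view(query: str) -> dict:
    """Rebuild the view context from a deep-link query string."""
    parsed = {k: v[0] for k, v in parse_qs(query).items()}
    filters = {k[2:]: v for k, v in parsed.items() if k.startswith("f.")}
    return {"filters": filters,
            "sort": parsed.get("sort"),
            "window": parsed.get("window")}
```

Sorting filter keys before encoding makes the link canonical, so two users sharing the same view produce byte-identical URLs.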
Client-Side Routing Performance Under 300ms Without Full Reloads
Given the app is running under standard network conditions on a supported desktop browser When the user drills between Heatmap, Segment Summary, Case List, and Case Detail Then the 95th percentile time-to-interactive for each intra-app transition is less than or equal to 300ms And no full page reloads or HTML document re-requests occur (single-page client-side routing) And performance instrumentation records navigation timing metrics for each drilldown
24–72h SLA Breach Forecasting
"As a support manager, I want a 24–72h forecast of SLA breach risk per segment so that I can plan staffing and reprioritize work proactively."
Description

Produce short‑term forecasts (24/48/72h) of SLA breach risk per segment using historical arrival rates, handling capacity, current backlog, and SLA stage progression. Compute projected breach counts and breach probabilities with confidence intervals, highlighting segments likely to exceed thresholds. Display forecast overlays on the heatmap and include a forecast panel in segment views. Allow what‑if inputs (temporary staffing, priority changes) to simulate impact. Models run incrementally to meet performance targets (<5m refresh) and reuse ClaimKit’s existing timers and status transitions to maintain consistency. Outcome: proactive planning to reallocate staff and reprioritize before breaches occur.
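The core of such a forecast can be illustrated with a deliberately simplified fluid model — inflow minus service capacity over the horizon, capped by the cases actually due in the window. The real model would add SLA stage progression, breach probabilities, and confidence intervals; this sketch only shows the balance-of-flows idea:

```python
def projected_breaches(backlog: int, arrival_rate: float,
                       capacity_rate: float, horizon_h: int,
                       due_within_h: int) -> int:
    """Toy fluid-model forecast (illustrative, not ClaimKit's model).

    backlog       -- open cases now
    arrival_rate  -- expected new cases per hour
    capacity_rate -- cases the team resolves per hour
    horizon_h     -- forecast horizon in hours (24, 48, or 72)
    due_within_h  -- cases among backlog + arrivals whose SLA expires
                     inside the horizon
    """
    inflow = backlog + arrival_rate * horizon_h
    served = capacity_rate * horizon_h
    shortfall = max(0.0, inflow - served)
    # Breaches cannot exceed the number of cases actually due in the window.
    return round(min(shortfall, due_within_h))
```

The what-if simulation then amounts to re-running this with a temporarily raised `capacity_rate` and comparing the delta against baseline.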

Acceptance Criteria
Forecast overlay accuracy for 24/48/72h horizons
Given a 30-day rolling backtest with actual breach outcomes per segment, When generating 24/48/72h forecasts, Then the MAPE of projected breach counts is <= 15% for segments with >= 50 open cases and the MAE is <= 2 for segments with < 50 open cases. Given forecasted breach probabilities per segment, When evaluated on the last 30 days, Then the Brier score is <= 0.20 and the calibration slope is between 0.8 and 1.2. Given computed forecast values, When rendered as heatmap overlays, Then displayed counts and probabilities match computed values to within rounding rules (counts rounded to whole numbers; probabilities shown as percentages to one decimal).
Segment-level breach probability and count calculations
Given segments defined by queue, channel, product, region, and partner, When the forecast job runs, Then each segment has projected_breaches and breach_probability for 24h, 48h, and 72h horizons. Given arrival rates, handling capacity, current backlog, and SLA stage progression from ClaimKit timers, When computing forecasts, Then only these inputs (and their historical values) are used and the computation is timestamped. Given an API request to retrieve a segment forecast, When calling the forecast endpoint, Then the response includes for each horizon: horizon_hours, projected_breaches, breach_probability, confidence_interval_low, confidence_interval_high, generated_at.
Confidence intervals and threshold highlighting on heatmap
Given configured risk thresholds per horizon, When forecasts are generated, Then heatmap cells are colored into Neutral/Warning/Critical bands based on projected_breaches and/or breach_probability thresholds. Given a heatmap cell, When the user hovers, Then the tooltip shows horizon, projected_breaches, breach_probability, and the 80% confidence interval. Given the user switches horizon between 24h, 48h, and 72h, When toggled, Then the heatmap updates within 1 second and the values match the selected horizon.
What‑if staffing and priority simulation impact
Given a user adjusts temporary staffing (+/− agents per shift) and priority weights for a segment, When Run Simulation is clicked, Then recomputed 24h/48h/72h forecasts appear within 5 seconds and include side-by-side deltas versus baseline for projected_breaches and breach_probability. Given invalid simulation inputs (e.g., negative capacity, non-numeric entries), When submitted, Then validation prevents execution and displays an inline error explaining the issue. Given simulation mode, When the user exits without applying, Then no production assignments, timers, or priorities are changed and baseline forecasts remain unchanged.
Incremental model refresh performance and data reuse
Given normal operating load (<= 5,000 claims/month and <= 5 concurrent viewers), When the scheduled incremental forecast refresh runs, Then end-to-end processing completes in under 5 minutes at the 95th percentile and under 10 minutes at worst case. Given ClaimKit SLA timers and status transitions, When deriving SLA stage progression, Then the derived stages match production definitions with per-case stage timing differences <= 1 second. Given no material data changes since the last run, When a refresh triggers, Then the incremental process completes in under 60 seconds by reusing previously computed state.
Forecast panel in segment drilldown with consistent timers
Given a user clicks a heatmap cell, When the segment view opens, Then a Forecast panel is visible and shows for each horizon (24/48/72h): projected_breaches, breach_probability, 80% confidence interval, generated_at timestamp, and last_refresh_duration. Given the Forecast panel, When the user switches the horizon tab, Then metrics update within 500 ms and exactly match the heatmap values for the same segment and horizon. Given the segment’s SLA timers, When rendering remaining time and stages in the forecast panel, Then values use the existing ClaimKit timers and stage labels with no discrepancies in naming and <= 1 second difference in remaining time per case aggregate.
Dimension & Threshold Configuration
"As a system administrator, I want to configure dimensions, thresholds, and SLA definitions so that the heatmap reflects our operations and risk tolerance."
Description

Provide admin controls to choose which dimensions appear on the heatmap (queue, channel, product, region, partner, custom tags), define segment hierarchies, and map dimension values to friendly labels. Allow configuration of SLA risk thresholds (colors, numeric cutoffs, time-to-breach buckets), business hours/holidays, and default time windows. Support saved views (per role/team) and environment-level defaults. Validate configurations and apply safely with versioning and rollback. Integrates with ClaimKit’s SLA policy engine so that changes propagate to alerts and reports consistently. Outcome: the heatmap reflects each organization’s operating model and risk tolerance.

Acceptance Criteria
Admin selects dimensions and hierarchy for heatmap
Given I am an Admin with Configure Heatmap permission When I select dimensions from the allowed set [queue, channel, product, region, partner, custom tags] and arrange a hierarchy (e.g., queue > region > product) And I click Preview Then the preview heatmap reflects the selected dimensions and hierarchy within 2 seconds And dimensions outside the allowed set cannot be added When I save as Environment Default Then the default is applied to all users on next heatmap load And the change is captured in the audit log with user, timestamp, and diff
Admin maps dimension values to friendly labels
Given there are raw dimension values (e.g., region_code = "US-W" and product_sku = "A12-XY") When I create mappings to friendly labels (e.g., "US West", "Model A12 XY") via UI or bulk CSV upload Then the heatmap, drilldowns, tooltips, and exports display the friendly labels And search/filter accepts either raw or friendly values and resolves to the same segment And duplicate or conflicting mappings are rejected with inline errors before save And if a value lacks a mapping, the raw value is displayed without breaking filters And mappings do not modify underlying IDs stored in cases
Admin configures SLA risk thresholds and time-to-breach buckets
Given default risk thresholds and bucket definitions exist When I set numeric cutoffs (e.g., High: 0–2h to breach, Medium: 2–8h, Low: >8h) and assign colors Then buckets are non-overlapping, cover the full range, and validations block overlaps or gaps And the legend updates instantly to show labels, ranges, and colors And historical and forecast classifications in the heatmap recompute using the new thresholds upon Save And invalid inputs (negative times, non-numeric, duplicate labels) are prevented from publishing
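The overlap/gap/duplicate validations above can be sketched as a single check over the sorted bucket list (a hypothetical helper, with buckets as `(label, low_hours, high_hours)` tuples):

```python
def validate_buckets(buckets):
    """Return validation errors for time-to-breach buckets; an empty
    list means the configuration may be published. Illustrative sketch.
    """
    errors = []
    ordered = sorted(buckets, key=lambda b: b[1])  # sort by lower bound
    labels = [b[0] for b in ordered]
    if len(set(labels)) != len(labels):
        errors.append("duplicate labels")
    if ordered and ordered[0][1] != 0:
        errors.append("range does not start at 0")
    # Adjacent buckets must meet exactly: no overlap, no gap.
    for (_, _, hi), (_, lo, _) in zip(ordered, ordered[1:]):
        if lo < hi:
            errors.append("overlapping buckets")
        elif lo > hi:
            errors.append("gap between buckets")
    for label, lo, hi in ordered:
        if lo < 0 or hi <= lo:
            errors.append(f"invalid range for {label}")
    return errors
```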
Admin sets business hours, time zones, and holidays
Given an organization with queues across multiple time zones When I configure business hours and holidays globally and per-region/queue Then SLA timers and time-to-breach calculations use the selected calendars and time zones And changes saved trigger recalculation of heatmap risk states and 24–72h forecast within 5 minutes And overlapping or duplicate holidays are validated and blocked before save And an audit entry records the calendar changes and effective time
Saved views per role/team and environment defaults
Given I have a configured heatmap When I save a view with chosen dimensions, hierarchy, thresholds, calendar, and time window Then I can set it as default for specific roles/teams or keep it personal And applying a saved view updates the heatmap and encodes the view in the URL for shareability And first-time users in a role load their role default; if none, the environment default applies And deleting a default view gracefully falls back to the environment default without error And permissions prevent non-admins from changing environment defaults
Versioning, safe apply, and rollback of configurations
Given configuration versioning is enabled When I save changes to dimensions, mappings, thresholds, calendars, or defaults Then a new version is created with semantic diff, author, timestamp, and a comment And I can Preview Impact against a sampled dataset before publishing And publishing is atomic; either the entire configuration becomes active or nothing changes And I can rollback to a prior version, restoring all settings and reapplying them within 60 seconds And all version changes are visible in the audit trail
Propagation to SLA engine, alerts, and reports
Given active alerts and scheduled reports depend on SLA risk categories When I publish new thresholds or calendars Then open alerts are re-evaluated using the new rules within 60 seconds and updated accordingly And new alert triggers and suppressions use the updated rules immediately after publish And reports generated after the effective time use the updated categories, matching the heatmap counts for the same filters and time window And no inconsistencies exist between heatmap, alerts, and reports for the same dataset and time And failures to propagate are surfaced with an error and no partial state is applied
Role-Based Visibility & Data Governance
"As a compliance-conscious manager, I want heatmap and drilldown visibility to respect roles and privacy so that sensitive information is protected while enabling oversight."
Description

Respect ClaimKit RBAC and data residency rules so users only see segments and cases they are authorized to view (e.g., brand, region, partner scoping). Mask PII in aggregate views and previews; enforce cell-level suppression when sample sizes are below privacy thresholds. Log access and drilldown events for auditability. Ensure shared links inherit or recheck permissions at open time. Provide tenancy isolation for multi-tenant deployments. Outcome: actionable visibility without exposing sensitive data or violating compliance.

Acceptance Criteria
RBAC-Scoped Heatmap and Drilldown Visibility
Given a user with role-based scopes (brands, regions, partners, queues, channels) When they open the Heatmap Drilldowns and apply any filters Then only segments intersecting their scopes are visible and aggregate counts include only authorized cases. Given the user clicks a hotspot to drill into the case list When results render Then only cases within the user's scopes appear and unauthorized cases are excluded. Given the user attempts to access a case or segment outside their scope via URL or search When the request is processed Then the system returns 403 Forbidden without revealing whether the resource exists. Given a user has no access to a dimension value When the heatmap renders Then that value is hidden from filter controls and axis labels.
PII Masking in Aggregate and Preview Views
Given any aggregate heatmap, tooltip, or case-list preview When PII fields (full name, email, phone, street address, serial) would be displayed Then they are masked per policy (e.g., initials, redacted user, last4) unless the user opens an authorized case detail view. Given a user drills into an authorized case and opens the case detail When PII renders Then masking is removed in the detail view only and remains masked in surrounding aggregate components. Given an export, screenshot, or shared view of the heatmap When the content is generated Then the same masking rules are applied and verified in the artifact.
Cell-Level Suppression for Small Aggregates
Given an aggregate cell or metric with sample size n < T (org-configured privacy threshold, default T=10) When rendering the heatmap, tooltips, or previews Then the value is suppressed (displayed as "<T") and drilldown links are disabled. Given filters are added or removed When the resulting sample size remains below T Then suppression persists and no intermediate UI reveals the exact value. Given neighboring or complementary filters When applying any combination Then no series of interactions reveals a precise count below T via differencing; the UI suppresses or buckets as needed. Given an export or shared link When opened Then suppressed cells remain suppressed for all recipients.
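The suppression rule above reduces to a small rendering decision per cell — assuming a `(display_value, drilldown_enabled)` return shape for illustration:

```python
SUPPRESSION_THRESHOLD = 10  # org-configured privacy threshold T (default 10)

def render_cell(count: int, threshold: int = SUPPRESSION_THRESHOLD):
    """Return (display_value, drilldown_enabled) for an aggregate cell.

    Counts below the threshold are shown as "<T" and lose their
    drilldown link, so small samples cannot be re-identified.
    Illustrative sketch of the policy, not the shipped code.
    """
    if count < threshold:
        return (f"<{threshold}", False)
    return (str(count), True)
```

Applying the same function to exports and shared links is what keeps suppressed cells suppressed for all recipients.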
Audit Logging of Heatmap Access and Drilldowns
Given a user opens the heatmap, changes filters, clicks a hotspot, opens a case list, or adds an annotation When the action occurs Then an audit event is written within 2 seconds containing userId, tenantId, timestamp (UTC), action, resource identifiers, applied filters, client IP, and user-agent. Given audit events are stored When queried by an authorized Auditor/Admin Then they are immutable, timestamp-ordered, and retrievable for at least 365 days. Given a transient write failure When logging fails Then the system retries with exponential backoff up to 3 times and emits an operational alert without blocking the user action.
Permission Recheck on Shared Links
Given a user creates a shared link to a heatmap view or case list When the link is opened by any recipient Then current RBAC is evaluated for that recipient and only authorized data is shown; unauthorized recipients receive 403 with no aggregate values or PII. Given the sharer's permissions change after link creation When the link is opened Then the recipient's current permissions are used and the link does not confer the sharer's prior access. Given a shared link is opened across tenants When tenant context does not match Then access is denied and no data is leaked. Given masked or suppressed content in the source view When accessed via shared link Then the same masking and suppression rules are enforced.
Multi-Tenant Isolation in Heatmap and Drilldowns
Given a multi-tenant deployment When a user from tenant A views the heatmap or drills into cases Then queries and caches are scoped to tenant A only and no data from other tenants is returned or rendered. Given a user attempts to access a case ID or segment belonging to another tenant When the request is processed Then the system returns 404 Not Found or 403 without confirming existence. Given application logs and analytics When events are emitted Then tenantId is included and data is partitioned to prevent cross-tenant aggregation. Given background jobs compute forecasts or aggregates When they run Then they compute per-tenant outputs and write to tenant-scoped storage.
Forecast Views Honor RBAC and Privacy
Given the 24–72h SLA-risk forecast heatmap When rendered for a user Then only segments within the user's scopes are included, and counts/risks are computed from authorized data only. Given forecast cells with sample size n < T When displayed Then values are suppressed consistent with privacy threshold rules and PII remains masked. Given a user drills from a forecast hotspot to cases When the list renders Then only authorized cases are shown and all auditing, masking, and suppression rules apply.
Incident Annotations & Tagging
"As a regional director, I want to annotate hotspots with incident context so that the team understands root causes and executives get accurate briefings."
Description

Allow users to add time‑stamped annotations to heatmap cells and segment views to capture incident context (e.g., carrier outage, parts shortage), tag with categories, attach links/files, and @mention teams. Surface annotations in drilldowns and exports, and include them in the audit log. Provide filters by tag and time to correlate annotations with KPI shifts. Notifications inform watchers on creation/updates. Outcome: shared situational context that accelerates root‑cause analysis and executive briefings.

Acceptance Criteria
Add Annotation to Heatmap Cell and Segment View
Given a user with Edit permission is viewing the Heatmap or a segment view When the user selects a cell or segment and clicks "Add Annotation", enters text up to 2000 characters, selects at least one category tag, and saves Then the annotation is created with a server-side UTC timestamp (to the second), the author, and the target cell/segment identifiers, and appears in the annotation timeline within 2 seconds And the annotation counter badge on the targeted cell/segment increments by 1 And an AnnotationCreated entry is written to the audit log capturing actor, timestamp, cell/segment identifiers, text hash, and tags
Tag and Time Filters
Given multiple annotations with different tags and timestamps exist across queues/channels/products/regions/partners When the user applies one or more tag filters and a time range filter Then the heatmap highlights only cells/segments that have annotations matching the active filters and updates counts accordingly within 1 second And the drilldown list shows only annotations and cases within the active filters And clearing filters restores the unfiltered view within 1 second
Attachments and Links
Given the user is creating or editing an annotation When the user attaches up to 10 files (PDF, PNG, JPG, JPEG, GIF, CSV, XLSX, DOCX, TXT), each no larger than 25 MB, and/or adds up to 5 HTTPS URLs Then files are scanned for viruses/malware and rejected with an error message if infected; accepted files are stored and are downloadable by authorized users And URLs are validated for format and reachability (HTTP 2xx/3xx) at save time; invalid URLs block save with a clear error And the saved annotation displays attachment icons/thumbnails and URL titles in drilldowns and detail views
@Mentions and Watcher Notifications
Given watchers are configured for the relevant queue/segment and the user has permission to notify When the user includes one or more @user or @team mentions in a new annotation and saves Then mentioned users/teams and watchers receive an in-app notification immediately and an email or Slack notification (if configured) within 60 seconds containing the annotation summary, tags, and a deep link to the heatmap location And when the annotation is edited, a single update notification per change is sent to the same audience per channel with a diff summary And duplicate notifications for the same event and channel are suppressed
Drilldowns, Exports, and Audit Log Surfacing
Given annotations exist for one or more hotspots When a user drills down from a heatmap hotspot Then an annotations panel is visible and lists annotations in reverse chronological order, honoring any active tag/time filters And when the user exports the drilldown or heatmap with "Include Annotations" enabled Then the export includes for each annotation: annotation_id, timestamp_utc, author, cell/segment identifiers, tags, text, attachment_count, link_count And the audit log contains create and update entries for each annotation action with before/after values, actor, timestamp, and entity identifiers
Editing, Version History, and Deletion
Given an existing annotation that the user is authorized to modify When the user edits the text, tags, attachments, or mentions and saves Then the annotation preserves created_at, updates updated_at (UTC), increments a revision number, and a version history entry is created and viewable And watchers and mentioned users receive an update notification as specified When the user deletes an annotation Then the annotation is soft-deleted, removed from default views, recorded in the audit log with actor and timestamp, and excluded from future exports unless "Include Deleted" is explicitly selected

Nudge Orchestrator

Context‑aware nudges that suggest the next action (call customer, request part, send Step‑Up Proof) via in‑app, Slack, email, or SMS. Bundles similar nudges, respects quiet hours, and measures each nudge's impact on case saves to avoid alert fatigue. Keeps agents moving without micromanagement and rescues borderline cases early.

Requirements

Contextual Trigger Engine
"As an operations lead, I want nudges to be triggered by the true context of a case so that agents always see the most impactful next step at the right moment."
Description

Real-time service that evaluates claim context and operational signals (SLA phase, warranty eligibility, receipt/serial extraction, customer sentiment, parts availability, agent workload) to determine and rank the next best action. Ingests events from ClaimKit’s magic inbox, claims queue, and integrations to generate nudge candidates with a reason code and priority. Provides idempotent decisioning, configurable thresholds, and sub-300ms latency to keep agents in flow. Outputs structured nudge payloads for delivery channels and logs decisions for analytics.

Acceptance Criteria
P95 Latency Under Peak Load
Given the engine receives 200 requests per second with 100 concurrent clients for 5 minutes When decisions are requested with complete contexts Then the end-to-end decision latency is <= 300 ms at p95 and <= 500 ms at p99 And the error rate is < 0.1% and no queue backlog exceeds 1 second
Idempotent Decision on Duplicate Event Payloads
Given two or more identical decision requests share the same dedupeKey or eventId within a 24-hour window When the requests are processed Then the engine returns the same decisionId and result for all duplicates And only one nudge payload is emitted and logged, and subsequent duplicates return a deduplicated=true flag
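Idempotent decisioning keyed on `dedupeKey` can be sketched with a TTL-bound cache — a minimal illustration of the 24-hour window described above (the SHA-256 decision ID is an assumption for the sketch):

```python
import hashlib
import time

class DecisionCache:
    """Duplicate events within the TTL window return the original
    decision with deduplicated=True. Illustrative sketch only."""

    def __init__(self, ttl_s: float = 24 * 3600):
        self.ttl_s = ttl_s
        self._store = {}  # dedupe_key -> (decision_id, expires_at)

    def decide(self, dedupe_key: str, now=None) -> dict:
        now = time.time() if now is None else now
        hit = self._store.get(dedupe_key)
        if hit and hit[1] > now:
            # Duplicate within the window: same decisionId, flagged.
            return {"decisionId": hit[0], "deduplicated": True}
        decision_id = hashlib.sha256(dedupe_key.encode()).hexdigest()[:16]
        self._store[dedupe_key] = (decision_id, now + self.ttl_s)
        return {"decisionId": decision_id, "deduplicated": False}
```

In production this cache would live in a shared store (e.g., Redis) so idempotency holds across engine instances — an in-process dict is shown only for clarity.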
Deterministic Ranking With Reason Codes
Given multiple candidate actions are generated for a claim When the engine ranks candidates Then candidates are sorted by descending priorityScore and ties are broken deterministically using claimId+actionType And every candidate includes a non-empty reasonCode and the top-ranked candidate is non-null
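The deterministic ranking criterion maps directly onto a composite sort key — descending score, then the `claimId`+`actionType` tie-break named above:

```python
def rank_candidates(candidates):
    """Sort candidates by priorityScore descending; break ties
    deterministically on (claimId, actionType). Sketch of the
    criterion above, assuming dict-shaped candidates."""
    return sorted(
        candidates,
        key=lambda c: (-c["priorityScore"], c["claimId"], c["actionType"]),
    )
```

Because the tie-break is total over the key fields, re-running the ranking on the same candidate set always yields the same order, which is what makes duplicate decisions reproducible.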
Configurable Thresholds Apply Without Deploy
Given a new configuration version updates thresholds for sentiment, SLA minutesRemaining, and workload caps When the config is saved to the configuration store Then the engine applies the new configuration within 60 seconds without restart And decision logs include configVersion and decisions reflect the new thresholds And if the new config fails validation, the engine retains the lastKnownGood config and emits a config_error event
Structured Nudge Payload Schema
Given the engine emits a nudge payload When validating against the JSON schema Then required fields exist: nudgeId, claimId, actionType, priority, reasonCode, confidenceScore, channelTargets[], eligibilityState, slaPhase, createdAt, dedupeKey, traceId, configVersion And schema validation passes with no additionalProperties error and optional channelMetadata is permitted
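The required/optional field rules above can be checked without a full JSON Schema library — a minimal sketch that mirrors `required` plus a closed property set:

```python
REQUIRED_FIELDS = {
    "nudgeId", "claimId", "actionType", "priority", "reasonCode",
    "confidenceScore", "channelTargets", "eligibilityState",
    "slaPhase", "createdAt", "dedupeKey", "traceId", "configVersion",
}
OPTIONAL_FIELDS = {"channelMetadata"}

def validate_nudge(payload: dict) -> list:
    """Return schema violations: missing required fields plus any
    property outside required-or-optional (no additionalProperties).
    Illustrative stand-in for a real JSON Schema validator."""
    missing = REQUIRED_FIELDS - payload.keys()
    extra = payload.keys() - (REQUIRED_FIELDS | OPTIONAL_FIELDS)
    return (sorted(f"missing:{f}" for f in missing)
            + sorted(f"unexpected:{f}" for f in extra))
```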
Comprehensive Context Evaluation
Given live values for SLA phase, warranty eligibility, receipt/serial extraction status, customer sentiment, parts availability, and agent workload are available When the engine evaluates a claim Then each signal is read at decision time with freshness <= 60 seconds or marked stale And ineligibility gates suppress inapplicable actions, and unavailable parts suppress order-dependent actions And across the regression test suite, the expected top action matches the oracle in >= 95% of cases
Decision Logging for Analytics and Traceability
Given any decision is produced When writing the decision log Then exactly one log entry is written per decision with fields: decisionId, claimId, eventIds[], inputFeatures (PII masked), candidateList with scores, chosenAction, latencyMs, errorCode (if any), deduplicated flag, configVersion, rulesetVersion or modelVersion, timestamp And the log is queryable in the analytics store within 2 seconds of decision time
Multi-Channel Nudge Delivery
"As a support agent, I want nudges to reach me in my preferred channel with a direct link to act so that I can respond quickly without switching tools."
Description

Deliver nudges as actionable messages across in-app cards, Slack (DM/channel), email, and SMS with per-agent/team channel preferences, fallback routing, and delivery receipts. Supports templated content, deep links back to the claim and action flows, link tracking, and retries with exponential backoff. Ensures consistent formatting and tracking IDs across channels to unify reporting and attribution.

Acceptance Criteria
Agent Preference-Based Channel Routing
Given agent A has channel preferences primary=Slack DM and fallback=Email, and a nudge N for claim C is generated and assigned to agent A When the orchestrator dispatches nudge N Then nudge N is delivered via Slack DM to agent A within 5 seconds of dispatch And the message includes a templated title, claim identifier C, and a tracking_id as a UUIDv4 And the message contains deep links to the claim detail and the specific action flow referenced by the nudge And a delivery receipt with status="Delivered", channel="Slack DM", provider_message_id, and delivered_at timestamp is recorded
Team Defaults With Agent Override
Given agent B has no explicit channel preferences and belongs to team T with defaults primary=Email and fallback=SMS, and nudge N for claim D is generated When the orchestrator dispatches nudge N Then nudge N is sent via Email to agent B's primary email address And the email subject and body are rendered from the selected template and include tracking_id and deep links And a delivery receipt with status="Delivered" and channel="Email" is stored When agent B later sets primary=SMS and a new nudge N2 is generated Then nudge N2 is delivered via SMS to agent B and the delivery receipt reflects channel="SMS"
Fallback Routing With Exponential Backoff Retries
Given agent C's primary channel is Slack DM and fallback is Email, and nudge N is dispatched When the Slack API responds with a transient error or no acknowledgment within 30 seconds Then the orchestrator retries Slack delivery with exponential backoff at 1m, 2m, 4m, and 8m (max 4 attempts) with ±10% jitter And if all Slack attempts fail, the orchestrator routes nudge N to Email within 30 seconds after the final Slack failure And all attempts and outcomes are recorded on the delivery receipt with attempt_number, channel, status, error_code, error_message, and timestamp And once any channel reports Delivered, all remaining scheduled retries are canceled
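The retry schedule above — 1m, 2m, 4m, 8m with ±10% jitter — is standard exponential backoff, sketched here with an injectable RNG for testability:

```python
import random

def backoff_schedule(base_s: float = 60, attempts: int = 4,
                     jitter: float = 0.10, rng=None):
    """Delays (seconds) for retry attempts: base * 2^i, each nudged
    by uniform +/- jitter, matching the criterion above."""
    rng = rng or random.Random()
    return [
        base_s * (2 ** i) * (1 + rng.uniform(-jitter, jitter))
        for i in range(attempts)
    ]
```

Jitter spreads retries from many simultaneously failing deliveries so they do not hammer the provider in lockstep; canceling the remaining schedule on the first Delivered receipt is handled by the orchestrator, not this helper.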
Delivery Receipts API and UI Visibility
Given a tracking_id T for nudge N When querying the Delivery Receipts API by tracking_id T Then the API returns 200 with receipt fields: tracking_id, claim_id, nudge_type, channel, status_history (Queued, Sent, Delivered, Failed), provider_message_id(s), attempt_count, and timestamps And the UI timeline for claim_id shows the same status history within 5 seconds of receipt updates And if no receipt exists for T, the API returns 404
Consistent Formatting and Template Rendering Across Channels
Given template X with placeholders {agent_name}, {claim_id}, {action_link} and payload for claim C and agent A When rendering template X for channels In-App, Slack DM, Email, and SMS Then all outputs include identical core content (title, claim reference, tracking_id) with channel-appropriate formatting And Slack uses Block Kit sections and buttons, Email uses accessible HTML, SMS is <= 320 characters with a single short link, and In-App uses a card component And the same tracking_id value is present in all rendered messages and embedded links across all channels And rendering fails with a clear error if any required placeholder is missing, and the nudge is not dispatched
Deep Links to Claim and Action Flows with Auth
Given a deep link generated for claim C and action "Request Part" with tracking_id T When a logged-in agent clicks the link from any channel Then the app opens to claim C and pre-opens the "Request Part" flow within 2 seconds When a not-logged-in agent clicks the link Then the agent is redirected to SSO and, after successful authentication, returned to claim C with the "Request Part" flow open And deep links expire after 7 days; expired links show a friendly error and no action is taken And all link clicks are attributed to tracking_id T and the originating channel
Link Tracking and Unified Attribution
Given nudge N with tracking_id T is delivered across Slack, Email, and SMS When the recipient clicks any channel link Then a Click event is recorded with fields: tracking_id T, channel, claim_id, nudge_type, timestamp, and user_id And unique clicks for T are deduplicated within a 24-hour window And Email Opens are recorded via tracking pixel when supported and tied to T; Slack, SMS, and In-App track clicks only And the Analytics API returns aggregate metrics per tracking_id T including delivered_count, failed_count, click_count, unique_click_count, and channel_breakdown And events are queryable within 60 seconds of occurrence
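The 24-hour unique-click deduplication could work roughly like this in-memory sketch (class and field names are assumptions; production storage would be durable):

```python
from datetime import datetime, timedelta

class ClickDeduper:
    """Dedupe clicks per (tracking_id, user_id) within a rolling 24-hour window."""
    def __init__(self, window=timedelta(hours=24)):
        self.window = window
        self.last_unique = {}   # (tracking_id, user_id) -> time of last unique click
        self.unique_count = {}  # tracking_id -> unique_click_count

    def record(self, tracking_id, user_id, ts):
        key = (tracking_id, user_id)
        prev = self.last_unique.get(key)
        if prev is not None and ts - prev < self.window:
            return False  # duplicate within the 24-hour window; not counted as unique
        self.last_unique[key] = ts
        self.unique_count[tracking_id] = self.unique_count.get(tracking_id, 0) + 1
        return True
```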
Quiet Hours, Throttling, and Bundling
"As an agent, I want nudges to pause during my quiet hours and arrive in smart bundles so that I stay focused without missing time-sensitive items."
Description

Policy layer that respects per-user time zones, quiet hours, and DND settings; enforces rate limits (e.g., max N nudges per hour) and deduplicates similar prompts. Bundles related nudges into periodic digests with clear prioritization and reasons to reduce alert fatigue. Holds and releases queued nudges after quiet periods while preserving SLA awareness to avoid breaching critical timers.

Acceptance Criteria
Per-User Quiet Hours by Time Zone
Given a user with timezone=America/New_York and quiet_hours=21:00–08:00, When a nudge is generated at 01:30 local, Then it is not dispatched on Slack, email, or SMS and is queued for release at 08:00 local. Given the same user, When a nudge is generated at 07:59 local, Then it remains queued and is released at 08:00±1 minute local. Given quiet hours span midnight, When multiple nudges are generated during the window, Then 0 are dispatched externally and all are queued with original trigger timestamps preserved.
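The midnight-spanning quiet-hours check can be expressed with the standard-library `zoneinfo` module; the function name and defaults are illustrative:

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def in_quiet_hours(now, tz_name="America/New_York",
                   start=time(21, 0), end=time(8, 0)):
    """True if the user's local time falls inside a quiet window, including
    windows that span midnight (e.g., 21:00-08:00)."""
    local = now.astimezone(ZoneInfo(tz_name)).time()
    if start <= end:                       # same-day window, e.g. 13:00-15:00
        return start <= local < end
    return local >= start or local < end   # window spans midnight
```

A nudge generated while this returns True would be queued with its original trigger timestamp and released at quiet-hours end.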
Channel DND Respect and In-App Fallback
Given Slack DND is active for the user, When a nudge is generated, Then it is not delivered via Slack and alternate channels are considered per configured priority (in-app, email, SMS), each respecting its own DND; if none are available, the nudge is queued in-app only. Given all configured channels are in DND or quiet for the user, When a nudge is generated, Then it appears in the in-app queue and is scheduled for the earliest allowed external channel send time. Given a channel’s DND ends before the user’s quiet hours end, When the earliest allowed channel becomes available, Then the queued nudge is delivered via that channel.
Global Per-User Rate Limiting Across Channels
Given max_nudges_per_hour_per_user=5 with a 60-minute rolling window, When 7 nudges are generated within 60 minutes, Then only 5 are delivered and 2 are deferred to the next available window respecting quiet hours. Given a digest bundles 4 nudges, When counting against the rate limit, Then the digest counts as 1 delivery event. Given the rolling window elapses, When more nudges are eligible, Then delivery resumes up to the configured limit.
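A rolling 60-minute window with a cap of 5 deliveries might be sketched as below (names hypothetical); a digest would call `try_deliver` once, so it counts as a single delivery event:

```python
from collections import deque
from datetime import datetime, timedelta

class RollingRateLimiter:
    """At most `limit` deliveries per user per rolling window."""
    def __init__(self, limit=5, window=timedelta(minutes=60)):
        self.limit, self.window = limit, window
        self.sent = {}  # user_id -> deque of delivery timestamps

    def try_deliver(self, user_id, now):
        q = self.sent.setdefault(user_id, deque())
        while q and now - q[0] >= self.window:
            q.popleft()          # drop deliveries that left the rolling window
        if len(q) >= self.limit:
            return False         # defer to the next available window
        q.append(now)
        return True
```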
Similar Nudges Deduplication Window
Given dedup_window=30 minutes and dedup_keys=[case_id, action_type, reason_code], When multiple matching nudges are triggered within the window, Then only one nudge remains active and subsequent matches increment a seen_count on that nudge. Given the same match occurs after the dedup_window, When a new nudge is triggered, Then a new nudge is created. Given two nudges differ by action_type, When triggered within the window, Then both remain and are eligible for bundling.
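The dedup-key matching and seen_count increment described above could look like this sketch (class name and storage are assumptions):

```python
from datetime import datetime, timedelta

class NudgeDeduper:
    """Collapse matching nudges within dedup_window, keyed on
    (case_id, action_type, reason_code); repeated matches bump seen_count."""
    def __init__(self, window=timedelta(minutes=30)):
        self.window = window
        self.active = {}  # dedup key -> [created_at, seen_count]

    def trigger(self, case_id, action_type, reason_code, now):
        key = (case_id, action_type, reason_code)
        entry = self.active.get(key)
        if entry and now - entry[0] < self.window:
            entry[1] += 1   # match within window: increment seen_count, no new nudge
            return False
        self.active[key] = [now, 1]  # new or expired key: create a new nudge
        return True
```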
Periodic Digest Bundling with Prioritization and Reasons
Given digest_frequency=30 minutes and max_items_per_digest=20, When more than 20 nudges are queued, Then a digest is sent with the top 20 and the remainder are kept for the next digest window. Given items have time_to_SLA_breach and impact_score, When building a digest, Then items are ordered by time_to_SLA_breach ascending then impact_score descending and each displays its primary reason_code and recommended_action. Given multiple nudges relate to the same case_id, When bundling into a digest, Then they are grouped under a single case entry showing count and consolidated reasons.
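The digest ordering rule (time_to_SLA_breach ascending, then impact_score descending, capped at max_items_per_digest) reduces to one sort; the overflow is carried to the next window:

```python
def build_digest(items, max_items=20):
    """Order queued nudges by time_to_SLA_breach ascending, then impact_score
    descending; anything beyond max_items stays queued for the next digest."""
    ranked = sorted(items, key=lambda n: (n["time_to_SLA_breach"], -n["impact_score"]))
    return ranked[:max_items], ranked[max_items:]
```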
Quiet-Hours Release With SLA Awareness
Given quiet_hours=22:00–07:00 local and pre_quiet_lead_time=15 minutes, When an item’s SLA_breach_time occurs before quiet end, Then a pre-quiet digest including that item is sent at or before 21:45 local. Given items are queued during quiet hours, When quiet ends at 07:00 local, Then a digest is released by 07:05 local prioritizing items with the least time_to_SLA_breach. Given an item is created during quiet hours with SLA_breach_time before quiet end and pre_quiet_lead_time has passed, When quiet ends, Then the item is marked breached and listed first with a breach indicator.
One-Click Action Cards
"As a support agent, I want to complete the suggested next step with one click from the nudge so that I can resolve cases faster without navigating multiple screens."
Description

Nudges include embedded, context-aware action buttons (e.g., Call Customer, Request Part, Send Step‑Up Proof) that prefill data from the claim, execute workflows, and confirm outcomes inline. Supports role checks, idempotency, and error handling with immediate feedback. Records chosen actions and outcomes back to the claim timeline to maintain a complete, auditable history.

Acceptance Criteria
In-App One-Click Executes Prefilled Workflow
Given an agent views a nudge with a “Request Part” action in-app and the claim has all required fields, When the agent clicks the button, Then the workflow executes using prefilled data from the claim snapshot at click-time and the button shows a processing state within 300 ms. Given the action completes successfully, When the server responds, Then the UI shows an inline success confirmation with reference IDs within 2 seconds (p95) and the button becomes disabled with a “Completed” state. Given any required field is missing, When the agent clicks the button, Then an inline form opens pre-populated with available claim data, validates inputs client-side, and only enables submit when validation passes. Given the action is executed, When the workflow includes downstream updates (e.g., part order created), Then the claim status and related fields are updated atomically and reflected in the in-app view within 5 seconds (p95).
Slack Card Action Executes and Confirms Inline
Given a Slack nudge contains a “Call Customer” or “Send Step‑Up Proof” button, When an authorized agent clicks the button, Then the request is signed and verified server-side and the same workflow as in-app is executed. Given the action is accepted, When Slack receives the response, Then the message is updated or an ephemeral confirmation is posted within 2 seconds (p95) showing outcome and reference IDs. Given the agent is on mobile Slack, When they click the button, Then the action executes successfully with identical behavior and confirmation. Given the agent lacks a valid session, When they click the button, Then a secure one-time deep link prompts authentication and returns them to complete the action without losing context.
Role-Based Visibility and Execution Control
Given the viewer lacks permission for the action by role or scope, When the nudge renders, Then the action button is hidden or disabled with a tooltip explaining “Insufficient permissions”. Given a user without permission attempts to invoke the API, When the request is received, Then the server returns HTTP 403 with an error code and no side effects are performed. Given a user with permission views the nudge, When it renders, Then the action button is enabled and audit metadata includes the actor’s role and scope. Given an admin override policy is configured, When an override is used, Then the action logs the override reason and requires a confirmation step.
Idempotent Handling of Repeated Clicks and Retries
Given a nudge action instance has an idempotency key, When the button is clicked multiple times or a retry occurs within 15 minutes, Then exactly one downstream workflow is executed and subsequent attempts return “Already processed” with the original outcome. Given a network timeout occurs after the server processed the action, When the client retries with the same idempotency key, Then the server returns the prior result without duplicate side effects. Given a third-party webhook retries the callback, When the server receives the duplicate callback, Then the system recognizes the prior completion and does not reapply changes.
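The idempotency contract above (one execution per key; repeats return the prior outcome) can be sketched as follows. In production the key-to-result map would live in durable storage with a TTL; this in-memory version only illustrates the behavior:

```python
class IdempotentExecutor:
    """Run each action at most once per idempotency key."""
    def __init__(self):
        self.results = {}  # idempotency_key -> stored outcome

    def execute(self, key, workflow):
        if key in self.results:
            # Repeated click, client retry, or duplicate webhook callback:
            # return the original outcome, perform no new side effects.
            return {"status": "already_processed", "outcome": self.results[key]}
        outcome = workflow()       # downstream workflow executes exactly once
        self.results[key] = outcome
        return {"status": "processed", "outcome": outcome}
```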
Immediate Error Feedback With Guided Recovery
Given a recoverable error occurs (e.g., upstream 503), When the action fails, Then the UI shows an inline error with human-readable message, error code, and correlation ID within 2 seconds (p95) and offers a Retry button. Given a validation error occurs, When the agent submits, Then specific fields are highlighted with inline messages and the action is not sent to the server until corrected. Given a non-recoverable error occurs, When failure is detected, Then the UI provides a fallback link to open the full workflow in ClaimKit and preserves entered data. Given a retry succeeds, When the action completes, Then the error state clears and the success confirmation is shown with the same idempotency key.
Action and Outcome Logged to Claim Timeline
Given any one-click action is executed, When the workflow completes (success or failure), Then an immutable timeline entry is written within 5 seconds (p95) including action name, actor, role, timestamp (UTC), channel (in-app/Slack/email/SMS), input summary, outcome status, reference IDs, idempotency key, and correlation ID. Given the timeline entry is created, When a user opens the claim, Then the entry is visible and filterable by action type and outcome, and links to any related external artifacts (e.g., part order). Given a timeline entry is audited, When its payload is requested via API, Then the response matches the on-screen details and includes a signature or hash for tamper-evidence.
Context Prefill Accuracy and Staleness Handling
Given a nudge is generated at time T, When the agent clicks an action at time T+n, Then the prefilled payload uses the latest persisted claim values at click-time and not stale snapshot values. Given the claim changed after the nudge was issued, When the action opens, Then the UI refreshes dependent fields or prompts the agent to confirm updated values before submission. Given a required field is missing from the claim, When the action is initiated, Then the inline form pre-populates known values and enforces server-side validation on submit to prevent incomplete workflows.
Impact Measurement & A/B Controls
"As a product manager, I want to A/B test nudges and see their impact on saves and resolution speed so that we can scale only the messages that work and reduce noise."
Description

Analytics and experimentation framework that measures nudge acceptance, time-to-action, resolution time deltas, conversion to save/repair, and downstream CSAT. Supports control groups, variant testing, and attribution models to quantify incremental impact. Provides fatigue scoring and auto-throttling based on performance. Exposes dashboards and exports to BI for continuous optimization.

Acceptance Criteria
A/B Experiment Setup and Randomization Integrity
- Given an experiment with Control and two Variants and a 33/33/34 allocation configured, When the experiment is activated and 1,500 eligible cases enter, Then each arm receives assignments within ±2% of its target allocation and assignment records persist experiment_id, arm_id, case_id, and assigned_at. - Given a case fails eligibility (e.g., quiet hours, channel blocklist), When it reaches the assignment step, Then it is not assigned to any arm and an exclusion_reason and rule_id are logged with a timestamp. - Given a case is assigned but no nudge is delivered (e.g., resolved before send), When exposure status is evaluated, Then the case is flagged "assigned_not_exposed" and excluded from per‑protocol metrics while retained for intent‑to‑treat. - Given an active experiment, When a variant is paused, Then new assignments route only to remaining arms within 60 seconds and allocation percentages auto‑normalize.
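Deterministic hash-bucket assignment is one way to hit a 33/33/34 allocation within tolerance while keeping assignments stable per case; the scheme and names below are an assumption, not the spec's mandated method:

```python
import hashlib

def assign_arm(experiment_id, case_id, arms):
    """Deterministically map a case to an experiment arm by hash bucketing.
    `arms` is a list of (arm_id, weight) pairs with weights summing to 100."""
    digest = hashlib.sha256(f"{experiment_id}:{case_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100       # stable 0-99 bucket per (experiment, case)
    cursor = 0
    for arm_id, weight in arms:
        cursor += weight
        if bucket < cursor:
            return arm_id
    return arms[-1][0]
```

Because assignment depends only on (experiment_id, case_id), re-evaluating a case always yields the same arm, and allocation converges toward the configured weights as eligible cases enter.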
Nudge Acceptance and Time-to-Action Measurement
- Given a nudge is delivered via any channel, When the recipient accepts or performs the target action, Then an acceptance event with nudge_id, channel, experiment_arm, and accepted_at is recorded within 30 seconds and deduplicated per nudge instance. - Given multiple nudges are bundled in one message, When the user accepts any suggestion, Then only the selected suggestion is marked accepted and others are marked "skipped_bundled" for the same message_id. - Given a nudge is delivered, When the first qualifying action is taken, Then time_to_action equals action_at minus delivered_at and is stored with millisecond precision. - Given an acceptance occurs outside the configured attribution window, When metrics are computed, Then the acceptance is excluded from primary KPIs and included in a "late_accept" count.
Resolution Time Delta Computation
- Given the historical baseline is defined as the median resolution_time for matched past cases over the last 90 days, When a treated case resolves, Then delta_resolution_time equals treated_resolution_time minus matched_baseline and is stored with match_method and cohort_id. - Given an experiment has ≥100 resolved cases per arm, When aggregate lift is computed, Then the dashboard shows mean and median delta with 95% CI using Welch's t‑test for means and the Hodges‑Lehmann estimator for medians. - Given SLA timers start at eligibility, When resolution occurs, Then resolution_time is measured from SLA_start to resolved_at and backfilled if SLA_start was delayed.
Conversion to Save/Repair and CSAT Attribution Models
- Given conversion event definitions are configured (save, repair), When events occur, Then they are linked to case_id and nudge_id and counted exactly once per case per definition. - Given attribution model is set to first‑touch, When multiple nudges precede conversion within the attribution window, Then only the earliest exposed nudge receives credit; for last‑touch only the latest exposed nudge; for time‑decay, weights sum to 1.0 with half‑life configurable. - Given CSAT survey responses are ingested with case_id within 30 days of resolution, When attribution metrics are generated, Then CSAT deltas by arm are displayed and exportable with the selected model applied.
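The time-decay model with weights summing to 1.0 and a configurable half-life reduces to a normalized exponential; this is a sketch, with ages measured in hours between each exposure and the conversion:

```python
def time_decay_weights(exposure_ages_hours, half_life_hours=24.0):
    """Time-decay attribution: each exposure's credit halves every half-life,
    and the weights are normalized so they sum to 1.0."""
    raw = [0.5 ** (age / half_life_hours) for age in exposure_ages_hours]
    total = sum(raw)
    return [w / total for w in raw]
```

First-touch and last-touch are the degenerate cases where all credit goes to the earliest or latest exposed nudge respectively.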
Fatigue Scoring and Auto-Throttling Controls
- Given fatigue score is defined as a weighted count of nudges per recipient over 7 days, When the score exceeds threshold T, Then additional nudges are suppressed for that recipient until the score falls below T or an override rule applies, and each suppression is logged. - Given performance of a running experiment drops below a guardrail (e.g., acceptance rate lift < 0% for 1,000 exposures), When auto‑throttling is enabled, Then send rate to underperforming arms is reduced by ≥50% within 5 minutes and a notification is sent to Slack and email. - Given quiet hours are configured, When current time is within quiet hours, Then no nudges are sent and queued messages are delivered within 5 minutes after quiet hours end with original message_id retained.
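The 7-day weighted fatigue score and its threshold check might be sketched as below; the per-channel weights and function names are illustrative:

```python
from datetime import datetime, timedelta

def fatigue_score(nudge_events, now, weights=None, window=timedelta(days=7)):
    """Weighted count of nudges sent to a recipient over the trailing 7 days.
    nudge_events: (timestamp, channel) pairs; weights: per-channel weight
    (default 1.0)."""
    weights = weights or {}
    return sum(weights.get(channel, 1.0)
               for ts, channel in nudge_events
               if now - ts < window)

def should_suppress(score, threshold):
    """Suppress further nudges while the score exceeds the threshold."""
    return score > threshold
```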
Experiment and Impact Dashboards
- Given a user opens the Impact dashboard, When filters for date range, segment, channel, experiment, and arm are applied, Then KPIs (acceptance rate, time‑to‑action, resolution delta, conversion rate, CSAT) update within 2 seconds for datasets up to 1M events. - Given an experiment has active control and variants, When viewing the dashboard, Then lifts vs control and 95% CIs are displayed and statistically significant results are highlighted with a configurable p‑value threshold. - Given a KPI card is clicked, When drilldown is invoked, Then the user can view anonymized row‑level events and export CSV up to 100,000 rows per download.
Data Exports to BI Platforms
- Given a daily export schedule to S3 and BigQuery is configured, When the export runs, Then partitioned Parquet files and BigQuery tables are produced by event_date with schemas versioned and documented, including experiment_id, arm_id, case_id, nudge_id, exposure, acceptance, time_to_action, resolution_time, conversion, csat, fatigue_score. - Given an export completes, When validation runs, Then row counts reconcile within ±0.5% against the source event store and failures trigger up to 3 retries and an alert to Ops. - Given PII handling rules are configured, When exporting, Then PII fields are hashed or excluded as specified, and a data dictionary and change log are included in the export manifest.
Admin Rules & Policies Console
"As an administrator, I want to configure when and how nudges fire across teams and brands so that the system reflects our policies without engineering changes."
Description

Administrator UI to author and version rules that map triggers and conditions to nudge content, channel, schedule, and bundling behavior. Supports segmentation by brand, product line, SLA tier, and customer profile. Offers a safe test mode with simulation against historical claims, along with change reviews and role-based permissions. Integrates with template management for localized content.

Acceptance Criteria
Author and publish a new nudge rule with triggers, conditions, and actions
Given I am an Admin with Rule:Create permission When I create a rule with at least one trigger, at least one condition, and actions specifying nudge content reference, delivery channel, schedule offset, and bundling key And I click Validate Then the console returns zero validation errors within 2 seconds And the rule is saved as version 1 with status Draft When I click Publish Then the rule status changes to Active within 5 seconds And the rule is visible in the Rules list with version=1 and status=Active And an audit record is created capturing actor, timestamp, and rule checksum And the Orchestrator receives the rule payload with a correlationId
Edit, version, and approve a rule change
Given an existing Active rule v1 and I have Rule:Edit permission When I open the rule and make changes Then a new Draft version v2 is created; v1 remains Active and immutable And the console displays a visual diff between v1 and v2 When I submit v2 for review Then a Reviewer approval is required before Publish And all approval/rejection actions are logged with comments When v2 is published Then v2 becomes Active and v1 is archived, with the ability to roll back to v1 And the Rules list shows the current Active version and prior versions with timestamps
Segment targeting by brand, product line, SLA tier, and customer profile
Given I am creating or editing a rule When I add segment filters for brand, product line, SLA tier, and customer profile attributes Then the UI enforces valid selectable values and prevents empty segment definitions And the filter logic supports AND within groups and OR across groups as configured When I run Preview Reach Then the console returns a count of historical claims matching the segment and shows up to 20 sample claim IDs within 5 seconds And Save is blocked if segmentation is required by policy and no segment dimension is defined
Safe test mode and historical simulation
Given a Draft rule and I have Rule:Simulate permission When I toggle Test Mode and run a simulation over a selectable historical date range Then the console executes the rule logic against historical claims without emitting any live nudges And it returns total matches, per-channel schedule previews honoring quiet hours and time zones, and a list of the first 20 matched cases within 60 seconds And the simulation run is recorded in the audit log
Configure quiet hours and bundling policies in rule actions
Given I am defining rule actions When I set quiet hours per channel and select the rule scheduling time zone Then the UI validates that quiet hours are within 24 hours and non-overlapping When I set a bundling window duration and deduplication keys Then the UI validates allowed ranges and required keys And the schedule preview shows the next eligible send time for a sample timestamp considering quiet hours and bundling And the persisted rule payload includes quietHours, timeZone, bundlingWindow, and dedupeKeys
Template management integration and locale coverage
Given I am selecting nudge content for a rule When I search and choose a template key from the Template Library Then the console fetches available locales and placeholders for that template And it validates that all required locales for the selected brand(s) are covered or have defined fallbacks And it validates that all placeholders referenced by the template are supplied by the rule context mapping When validation fails Then Publish is disabled and inline errors identify the missing locales or placeholders And the content preview renders correctly for at least three selected locales before Publish
Role-based permissions and change review enforcement
Given RBAC roles Admin, Editor, Reviewer, and Viewer are configured When a user without the required permission attempts Create, Edit, Publish, or Delete Then the action is blocked with a 403 message and no changes are persisted When an Editor submits a Draft for review Then at least one Reviewer approval is required before Publish (two-person rule optional toggle) And all actions (create, edit, review, publish, rollback, delete) are captured in an audit log with actor and timestamp And deactivated users cannot access the console and any pending approvals by them are invalidated
Audit & Compliance Logging
"As a compliance officer, I want a complete audit of nudge communications and actions so that we can meet regulatory requirements and resolve disputes confidently."
Description

End-to-end audit trail that records nudge generation, content, targeting, delivery status, user actions, and timestamps, with immutable IDs linked to claims. Supports retention policies, PII minimization, and consent tracking for SMS/email with easy opt-out handling. Exposes exportable logs and APIs for compliance reviews and partner audits.

Acceptance Criteria
Immutable Event Logging per Nudge
- Given a nudge is generated for a claim, When the orchestrator creates the nudge, Then an audit event is written with fields: event_id (UUIDv4), event_type="nudge.generated", claim_id, tenant_id, correlation_id, created_at (ISO 8601 UTC), orchestrator_version, actor="system". - Then event_id is immutable and unique; When any update/delete is attempted via API, Then the request is rejected with 409 and no mutation occurs. - When a correction is needed, Then a new append-only event event_type="audit.correction" is created referencing the prior event_id and reason; original event remains unchanged. - Then 100% of generated nudges have a corresponding "nudge.generated" audit event within 200 ms of generation.
Content and Targeting Metadata Capture
- Given a nudge is generated, Then the audit record stores: template_id, variant_id, channel, content_hash (SHA-256), targeting_rule_ids[], model_version, score, decision_explanations[], and contains no full message body. - Then email addresses and phone numbers, if present in metadata, are stored masked (e.g., j***@example.com, +1******1234); no raw PII appears in content fields. - When validating a sample of 100 nudges, Then 100% contain the required metadata fields and 0% contain unmasked PII or full message text.
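The masking and content-hash rules could be implemented roughly as below, following the example formats in the criteria (j***@example.com, +1******1234); the helper names are hypothetical:

```python
import hashlib

def mask_email(email):
    """Mask an email address, e.g. jane@example.com -> j***@example.com."""
    local, _, domain = email.partition("@")
    return f"{local[0]}***@{domain}"

def mask_phone(phone):
    """Mask a phone number, keeping country code and last 4 digits,
    e.g. +15551231234 -> +1******1234."""
    return phone[:2] + "*" * (len(phone) - 6) + phone[-4:]

def content_hash(message_body):
    """SHA-256 of the rendered body, so audit records never store full text."""
    return hashlib.sha256(message_body.encode("utf-8")).hexdigest()
```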
Delivery and User Action Tracking
- Given a nudge is sent via a provider, When provider webhooks (sent, delivered, failed, opened, clicked) arrive, Then audit events ("nudge.sent","nudge.delivered","nudge.failed","nudge.opened","nudge.clicked") are recorded with provider_message_id, status, and timestamp within 5 seconds of webhook receipt. - Then all in-app agent actions (dismissed, snoozed, acted) produce audit events with user_id, action_type, action_context, and timestamp at action time. - Then 100% of outbound sends have a "nudge.sent" event and delivery outcomes correlate by provider_message_id or correlation_id.
Consent and Opt-Out Compliance Logging
- Given channel is SMS or email, When evaluating a send, Then the audit record includes consent_state (opted_in|opted_out|unknown), source, proof_reference, and evaluation timestamp. - When a STOP/UNSUBSCRIBE or unsubscribe link is used, Then an event "consent.revoked" is written within 2 seconds, the recipient is added to suppression, and subsequent send attempts are blocked and logged as "nudge.suppressed_consent" with reason. - When consent_state is opted_out or unknown, Then no "nudge.sent" event is produced; only suppression is logged for the attempted send.
Quiet Hours and Suppression Decision Logging
- Given quiet hours, throttling, or bundling policy applies, When a nudge would be emitted, Then an audit event records suppression_reason (quiet_hours|throttle|bundled), policy_id, policy_version, decision_timestamp, and next_eligible_time. - When bundling merges multiple nudges, Then an event "nudge.bundled" references constituent event_ids[] and the resulting bundle_event_id. - Then 100% of suppressed or bundled decisions are present in the audit within 200 ms of decision.
Retention and Redaction Policy Enforcement
- Given tenant retention is configured (e.g., audit_retention_days=365), When events age beyond TTL, Then a retention job permanently deletes or redacts fields per policy and writes an "audit.deleted" or "audit.redacted" event with policy_id and counts. - Then structural fields (event_id, event_type, claim_id, created_at) remain until TTL; PII-bearing fields (masked previews, contact identifiers) are redacted within 24 hours if pii_minimization=true. - Validation: with retention_days=1 on a test tenant, events older than 25 hours are not retrievable via UI or API; deletion logs show 100% of eligible events processed.
Export and API Access for Audits
- Given an admin selects a date range and filters (tenant_id, claim_id, event_type, channel), When requesting export, Then CSV and NDJSON files are generated within 60 seconds for up to 10,000 events and delivered via a signed URL expiring in 24 hours. - Then export files include schema_version, generated_at, and an HMAC-SHA256 checksum; checksum verification matches the file content. - Then the Audit API supports parameters (from, to, event_type, claim_id, channel, cursor, limit<=500), returns pages within 1 second for <=500 records, enforces 60 req/min rate limit, requires admin scope, and fields match export schema 1:1.
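The HMAC-SHA256 checksum and its verification are straightforward with the standard library; how the secret key is provisioned and rotated is an assumption left to the implementation:

```python
import hashlib
import hmac

def export_checksum(file_bytes, secret_key):
    """HMAC-SHA256 checksum published alongside each export file."""
    return hmac.new(secret_key, file_bytes, hashlib.sha256).hexdigest()

def verify_checksum(file_bytes, secret_key, expected):
    """Constant-time verification that the checksum matches the file content."""
    return hmac.compare_digest(export_checksum(file_bytes, secret_key), expected)
```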

Capacity Sandbox

What‑if modeling to test staffing, shift changes, SLAs, and auto‑approval rules against predicted breach rates and backlog curves. Produces recommended hiring/overtime windows and ROI estimates, shareable as scenario snapshots. Helps strategists and executives choose the cheapest, surest path to fewer breaches.

Requirements

Scenario Composer
"As an operations strategist, I want to compose multiple staffing and policy scenarios quickly so that I can test their impact without changing production settings."
Description

Configurable interface to define what‑if inputs for Capacity Sandbox, including staffing levels by role/skill, shift schedules (daily/weekly patterns, breaks, overtime), queue routing priorities, SLA targets by claim type/channel, and auto‑approval thresholds. Supports demand modifiers (seasonality, promo events, product launches), constraints (budget caps, hiring lead times, overtime policy), and reusable presets/templates. Validates inputs against business rules and highlights conflicts. Integrates with ClaimKit org data (roles, queues, existing SLAs) to prefill options and stores scenarios in a sandbox namespace isolated from production. Enables cloning, labeling, tagging, notes, and assumption fields for transparent scenario setup.

Acceptance Criteria
Org Data Prefill and Options Binding
Given the user’s org has roles, queues, and SLA policies configured in ClaimKit, When the Scenario Composer loads, Then the Staffing Roles, Queue, and SLA Target selectors are pre-populated with the org’s current items and reflect the latest values at time of load. Given the org has no configured item for a selector, When the composer loads, Then the selector shows an empty state with a “Create new” action and the scenario remains saveable. Given the composer has loaded options, When the user changes org data in another session, Then the current composer instance does not mutate and offers a “Refresh org data” action to pull the latest values. Given the composer loads successfully, When network is available, Then all org-driven selectors finish loading within 2 seconds and show a loading skeleton until ready.
Staffing and Shifts Configuration
Given a user adds staffing for a role/skill, When they set daily/weekly patterns, start/end times, and breaks, Then the composer validates that each shift: start < end in the local time zone, duration ≥ 1 hour, total breaks ≤ 30% of shift duration, and no negative values are accepted. Given overlapping shifts are defined for the same role, When schedules overlap on the same day, Then the composer allows overlaps but calculates and displays total concurrent headcount per 30-minute interval. Given overtime is enabled for a role, When weekly scheduled hours exceed the configured base hours, Then the composer marks excess as overtime and validates against the org overtime policy max hours per worker per week; Save is blocked if violated and an inline error explains the limit. Given any validation error exists, When the user attempts to save, Then the save is prevented and a summary of errors is shown, linking to the offending fields.
SLA Targets and Queue Routing Rules
Given claim types and channels are selected, When the user sets SLA targets (first-response and resolution) with units (minutes/hours/days) and calendar vs. business-hours mode, Then the inputs accept only positive integers and the mode per target is stored. Given queue routing priorities are defined, When the user assigns numeric priority weights per queue, Then each weight must be an integer 1–100, unique within the scenario for a given claim type, and the composer prevents circular dependency definitions. Given valid SLA and routing inputs, When the user saves the scenario, Then all values persist and re-load identically on reopen.
Demand Modifiers and Constraints
Given the user adds a seasonality or event modifier, When the user specifies start/end dates and a demand delta (percentage or absolute count), Then the composer validates date order, prevents overlaps that target the same claim type/channel without an explicit stacking choice (stack or replace), and displays the net combined effect preview per day. Given constraints are configured, When the user sets budget caps (currency), hiring lead time (days), and overtime policy toggles/limits, Then the composer validates non-negative values, correct currency format, and ensures hiring start dates in the scenario cannot precede today + lead time. Given any constraint conflicts with entered inputs, When detected, Then the conflicting fields are highlighted with warnings; errors block save while warnings allow save with a confirmation.
Auto-Approval Thresholds and Cross-Policy Validation
Given the user defines auto-approval thresholds by claim type and channel, When values are entered, Then only numeric ranges within 0 to the org’s policy maximum are accepted; values beyond the max are rejected with an inline error. Given a claim type requires mandatory manual review per org policy, When the user attempts to enable auto-approval for that claim type, Then the composer blocks the setting and explains the policy conflict. Given valid thresholds, When saving, Then the thresholds persist and are associated to the correct claim type/channel combination.
Scenario Persistence, Isolation, and Metadata
Given a new scenario is created, When the user saves, Then it is stored in the sandbox namespace, receives a unique scenario ID, and no production org settings or queues are modified. Given an existing scenario, When the user clones it, Then all inputs and metadata are duplicated, the label is suffixed with “Copy” (or incremented number if duplicates exist), and tags/notes/assumptions are carried over. Given labels, tags, notes, and assumptions fields, When the user edits them, Then labels must be 1–120 characters, tags up to 20 unique items (each 1–30 characters), and notes/assumptions up to 5000 characters; all are saved and retrievable on reopen.
Templates and Presets Management
Given a user saves the current inputs as a reusable preset/template, When they provide a unique name, Then the preset is versioned (v1, v2, …) and stored for the org. Given a preset is applied to a scenario, When the user selects it, Then the scenario inputs are replaced or merged according to user choice, and any invalid or missing fields are highlighted for completion before save. Given a preset is deleted, When deletion is confirmed, Then it is removed from the library without altering any scenarios that previously used it.
Baseline Data Sync & Assumptions Manager
"As a strategist, I want credible baselines and explicit assumptions so that scenario outputs are explainable and trusted by stakeholders."
Description

Data service that constructs a trustworthy baseline from ClaimKit history, including arrivals by queue/type/channel, handle time distributions, SLA classes, breach history, auto‑approval rates, and seasonality. Provides recency weighting, holiday/closure calendars, outlier trimming, and data freshness controls (e.g., last 30/90/180 days). Includes an assumptions manager for shrinkage, attrition, training ramp, hiring lead times, and AHT by skill tier, with manual overrides and saved assumption sets. Offers lineage/health checks (last sync, volume coverage) and snapshotting by as‑of date to freeze baselines for reproducible simulations; falls back to industry defaults when data is sparse.

Acceptance Criteria
Baseline Construction Completeness & Fallbacks
Given 180 days of ClaimKit history across queues/types/channels When the baseline sync runs Then the baseline includes per queue/type/channel: arrivals by hour/day, AHT distribution (mean, P50, P90), SLA classes/timers, historical breach rate, auto-approval rate, and weekday/month seasonality indices And at least 98% of eligible claims in the selected window are represented, with coverage% reported per metric And for any metric with fewer than 200 observations in the selected window, industry defaults are applied, the metric is flagged fallback=true, and default source/version are recorded in metadata And the sync completes in under 15 minutes for 500k claims and under 2 minutes for 50k claims And the baseline is stored with a version ID and as_of timestamp
Data Freshness Windows and Recency Weighting Controls
Given freshness window=90 days and recency weighting=off When the baseline is computed Then only records from the last 90 days are included and unweighted aggregates match a reference recomputation within 0.1% Given window=180 days and recency weighting=on with half_life=30 days When the baseline is recomputed Then the weight of an event 30 days older than the most recent equals 0.5±0.01 and weighted aggregates match a reference recomputation within 0.1% When the user switches the window from 90 to 30 days Then the baseline recalculates in under 30 seconds for <=100k claims and freshness metadata updates to reflect the new window
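The half-life criterion above (an event 30 days older receives weight 0.5 when half_life=30) implies exponential decay. A minimal sketch under that assumption, with illustrative function names:

```python
def recency_weight(age_days: float, half_life_days: float) -> float:
    """Exponential decay: an event half_life_days older than the most
    recent event receives weight 0.5."""
    return 0.5 ** (age_days / half_life_days)

def weighted_mean(values, ages, half_life_days):
    """Recency-weighted aggregate used when recency weighting is on."""
    weights = [recency_weight(a, half_life_days) for a in ages]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

With half_life=30, `recency_weight(30, 30)` is exactly 0.5, matching the 0.5±0.01 acceptance bound.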
Holiday/Closure Calendar Application to Working-Time SLAs
Given a closure date configured (e.g., 2025-11-27) for Queue "Warranty" When SLA timers are computed for that queue Then non-working time on the closure date is excluded from business-hour SLA calculations and breach rates reflect working-time only And the applied calendar version ID is stored in baseline metadata Given multiple calendars scoped to different queues When the baseline is recomputed Then only the appropriate calendar is applied per queue, and a change in calendar produces a new baseline version
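Excluding closure dates and weekends from business-hour SLA timers can be sketched as below. The 9:00–17:00 window is a hypothetical business-hours setting, not something the spec fixes:

```python
from datetime import datetime, date, time, timedelta

def business_seconds(start: datetime, end: datetime,
                     closures: set,
                     open_t: time = time(9), close_t: time = time(17)) -> float:
    """Working time between start and end, skipping weekends and any
    closure dates, so SLA timers count working time only."""
    total, day = 0.0, start.date()
    while day <= end.date():
        if day.weekday() < 5 and day not in closures:
            win_start = max(start, datetime.combine(day, open_t))
            win_end = min(end, datetime.combine(day, close_t))
            if win_end > win_start:
                total += (win_end - win_start).total_seconds()
        day += timedelta(days=1)
    return total
```

Using the 2025-11-27 closure from the criterion: two full business days on either side of the closure yield 16 working hours, with the closed Thursday contributing nothing.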
Outlier Trimming for AHT and Arrival Distributions
Given trimming method=percentile with bounds [2,98] When trimming is enabled Then events outside these percentiles are excluded from AHT distribution stats and output includes trimmed_count and trimmed_pct (default trimmed_pct <= 4%) Given trimming method=IQR with k=1.5 When trimming is enabled Then results match an independently computed IQR filter within 0.1% on sample aggregates When trimming is disabled Then distributions match the untrimmed aggregates within 0.1%
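Both trimming methods can be expressed in a few lines of stdlib Python. The linear-interpolation percentile method below is an assumption (percentile definitions vary between implementations):

```python
import statistics

def percentile_trim(values, lo_pct=2, hi_pct=98):
    """Keep events inside the [lo_pct, hi_pct] percentile bounds
    (linear interpolation between closest ranks); also return the
    trimmed count for the trimmed_count/trimmed_pct outputs."""
    s = sorted(values)
    def pct(p):
        k = (len(s) - 1) * p / 100
        f, c = int(k), min(int(k) + 1, len(s) - 1)
        return s[f] + (s[c] - s[f]) * (k - f)
    lo, hi = pct(lo_pct), pct(hi_pct)
    kept = [v for v in values if lo <= v <= hi]
    return kept, len(values) - len(kept)

def iqr_trim(values, k=1.5):
    """Keep events inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]
```

On a uniform 0–99 sample, [2, 98] bounds trim 4 of 100 events, which lines up with the default trimmed_pct ≤ 4% above.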
Data Lineage, Coverage, and Health Checks
Given a completed baseline sync When fetching baseline metadata via UI or API Then metadata includes last_sync_utc, source dataset IDs/versions, row counts per source, coverage_percent per metric, health_status, and error list (if any) And the metadata fetch responds in under 500 ms (p95) for a warm cache Given coverage_percent < 95% for any required metric or missing required fields When health checks run Then health_status=Fail, the baseline is not promotable for simulation use, and the UI presents a blocking alert with remediation hints
As-of Date Snapshotting and Reproducibility
Given as_of date D and settings S When a baseline snapshot is created Then the snapshot receives an immutable snapshot_id and checksum and is read-only thereafter When two simulations use the same snapshot_id Then their input baselines and summary aggregates are identical byte-for-byte When creating a new snapshot with a later as_of date D2 Then previously created snapshots remain unchanged, and attempts to edit any snapshot return 409 Conflict via API and are blocked in UI
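One way to implement the snapshot checksum is to hash a canonical serialization, so identical inputs always produce the same digest regardless of key order. The choice of JSON and SHA-256 below is an assumption, not the spec's mandated format:

```python
import hashlib
import json

def snapshot_checksum(snapshot: dict) -> str:
    """Checksum over canonical JSON (sorted keys, fixed separators):
    two simulations pinned to the same snapshot_id can verify they
    consumed byte-for-byte identical baseline data."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```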
Assumptions Manager: Overrides and Saved Sets
Given a user with edit permissions When creating or editing an assumption set Then the set supports fields: shrinkage% (0–60), attrition% per month (0–20), training_ramp_days (0–180) by role, hiring_lead_time_days (0–120), and AHT_by_skill_tier (30–3600 seconds), with validation errors for out-of-range values When saving an assumption set Then it persists with a unique name, version (semver), owner, timestamp, and can be marked default; cloning creates a new version with incremented patch When applying an assumption set to a scenario Then derived capacity inputs update within 2 seconds and an audit log records user, assumption_set_id/version, timestamp, and changed fields When importing an assumption set via JSON Then the payload is schema-validated; invalid imports are rejected with a field-level error list; valid imports create or update a set without altering existing snapshots
Demand Forecasting & Queue Simulator
"As a support leader, I want to simulate how staffing and routing changes affect breach rates so that I can choose configurations that minimize SLA violations."
Description

Hybrid forecasting engine that combines time‑series models for claim arrivals with stochastic queue simulation (multi‑skill, priority routing) to project backlog, wait times, and breach probability by SLA class under proposed scenarios. Ingests inputs from Scenario Composer and Baseline. Outputs include daily backlog curves, breach rates, service levels, throughput, and agent utilization per queue/product/channel. Supports case aging, preemption rules, and class of service. Non‑functional targets: MAE against backtests ≤10%, p50 runtime ≤30s for 12‑month horizon, support 10 concurrent runs. Integrates tightly with ClaimKit queue/case types and exposes a deterministic seed for reproducible results.

Acceptance Criteria
Backtest Forecast Accuracy ≤10% MAE
Given at least 18 months of cleaned historical claim arrivals per queue/product/channel When the engine runs a rolling-origin backtest with daily granularity and a 28-day forecast horizon across the last 6 months (>=6 folds) Then the volume-weighted MAE of daily arrivals across all queues/products/channels is <= 10% And each of the top-5 volume queues individually has MAE <= 10% And the run metadata returns MAE per series, fold count, and weighting used
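One reading of the volume-weighted MAE criterion is sketched below: per-series MAE is expressed relative to mean daily volume (so a ≤10% target is a relative error), then combined with weights proportional to each series' total volume. This interpretation is an assumption:

```python
def volume_weighted_mae(series):
    """series: list of (actuals, forecasts) pairs, one per
    queue/product/channel. Returns the volume-weighted relative MAE
    plus the per-series values for the run metadata."""
    total_volume, acc, per_series = 0.0, 0.0, []
    for actuals, forecasts in series:
        mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
        rel = mae / (sum(actuals) / len(actuals))  # MAE as share of mean volume
        vol = sum(actuals)
        per_series.append(rel)
        acc += rel * vol
        total_volume += vol
    return acc / total_volume, per_series
```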
12-Month Horizon p50 Runtime ≤30s
Given a scenario requesting a 12-month forecast horizon with default replication count and standard production environment settings When the simulation and forecasting pipeline is executed 30 times under nominal load Then the total wall-clock runtime (submit to completed) has p50 <= 30 seconds And the run metadata includes start/end timestamps and total runtime for each run
Support 10 Concurrent Simulation Runs
Given 10 distinct scenarios submitted concurrently via API When the system executes all runs in parallel Then all 10 runs complete without error (no 5xx or timeouts) And each run produces an isolated result set and metadata without cross-run contamination And no run is rejected due to internal concurrency limits
Output Completeness and Conservation Checks
Given a valid scenario with baseline and composer inputs When the run completes Then outputs include for each day and per queue/product/channel: backlog curve, breach probability by SLA class, service level, throughput, and agent utilization And for each day: prev_backlog + arrivals - completions = next_backlog holds within rounding tolerance And the counted number of breached cases equals the number reported in breach metrics for the same period
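The daily conservation identity can be verified with a check like the following sketch (it assumes integer case counts and exact equality; a tolerance would be needed for averaged replications):

```python
def check_conservation(days):
    """days: list of dicts with prev_backlog, arrivals, completions,
    next_backlog. Returns the indices of days violating the identity
    prev_backlog + arrivals - completions == next_backlog."""
    return [i for i, d in enumerate(days)
            if d["prev_backlog"] + d["arrivals"] - d["completions"]
               != d["next_backlog"]]
```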
Queue Mechanics: Multi-Skill, Priority, Preemption, and Case Aging
Given a synthetic scenario with two skills (A,B), two classes of service (P1>P2), preemption enabled, and an aging threshold of 8 hours When the simulation is executed Then agents with skill A do not serve cases requiring only skill B, and vice versa And P1 cases are always prioritized over P2 in routing And when a P1 arrives while all agents are serving P2, the next assignment honors P1 before any new P2 And any P2 case aged beyond 8 hours is scheduled with P1 priority from that point forward
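The aging rule can be captured as an effective-priority function: an aged P2 is served with P1 priority, and ties break FIFO by arrival time. A sketch with hypothetical names (the 8-hour threshold comes from the scenario above):

```python
AGING_THRESHOLD_H = 8.0  # P2 cases older than this are served as P1

def effective_class(case_class: int, age_hours: float) -> int:
    """P1=1 outranks P2=2; a P2 aged beyond the threshold is promoted."""
    if case_class == 2 and age_hours > AGING_THRESHOLD_H:
        return 1
    return case_class

def next_case(waiting, now):
    """waiting: list of (case_id, case_class, arrival_time_hours).
    Returns the id served next: lowest effective class, then FIFO."""
    return min(waiting,
               key=lambda c: (effective_class(c[1], now - c[2]), c[2]))[0]
```

A full simulator would additionally filter `waiting` to cases the agent's skills cover and handle preemption; this sketch only shows the routing order.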
Deterministic Seed Reproducibility
Given a fixed random seed S and identical inputs When the simulation is run twice Then all time series outputs and summary metrics are byte-for-byte identical across runs And the run metadata records the seed value S used for reproducibility
Integration with Scenario Composer, Baseline, and ClaimKit Types
Given inputs produced by Scenario Composer and Baseline that conform to the published schema When the engine validates and loads inputs Then staffing, shift calendars, SLA targets, auto-approval rules, and arrival forecasts are applied to the simulation And queue/case type mappings align with ClaimKit definitions so that all outputs are segmented by queue/product/channel accordingly And invalid or missing fields produce machine-readable validation errors identifying the offending path and rule
Cost & ROI Optimizer
"As an executive, I want recommended staffing and policy changes with clear ROI so that I can allocate budget confidently."
Description

Optimization module that recommends hiring windows, overtime bands, and auto‑approval thresholds to achieve a target breach rate at minimum cost. Inputs include labor rates by role/region, overtime premiums, contractor rates, training ramp curves, budget constraints, and expected warranty exposure from auto‑approvals. Produces prescriptive outputs: schedule deltas by week, cost vs. service level trade‑off curves, payback periods, and ROI summaries with sensitivity analysis. Supports hard/soft constraints and scenario comparison. Integrates with HRIS/payroll rate tables where available or static admin uploads. Provides a transparent rationale and constraint binding report for each recommendation.

Acceptance Criteria
Optimize to Target Breach at Minimum Cost
Given a target breach rate (e.g., 5%), weekly demand forecast, labor rates, overtime caps, min/max headcount per role, and a weekly budget cap When the optimizer is executed for a 26-week horizon Then it returns weekly schedule deltas (hires, overtime hours, contractor hours) that simulate to a breach rate <= target in at least 95% of Monte Carlo runs and minimize total cost within a 1% optimality gap And the solver status is "Optimal" or "FeasibleWithinGap" and the reported optimality gap <= 1% And no hard constraints are violated (0 violations) and all constraint bounds are respected And the output includes total cost, expected breach distribution, success probability, and an ROI summary with payback period (weeks) and NPV at the provided discount rate
Cost vs Service-Level Trade-off Curve Generation
Given labor inputs by role/region, overtime premiums, training ramp curves, and SLA targets When generating the cost vs service-level frontier Then the system outputs at least 12 Pareto-efficient points spanning from baseline breach to at least 50% improvement And for any adjacent points sorted by decreasing breach rate, total cost is non-decreasing And each point includes: total cost, expected breach rate, hires, overtime hours, contractor hours And the recommended operating point is identified as the minimum-cost point meeting the target breach (if provided) or the knee point (if no target) And the frontier is exportable as CSV and image
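Pareto efficiency and the non-decreasing-cost property above can be illustrated with a small filter. Dominance here is defined as less-or-equal on both cost and breach with strict improvement on at least one axis, which is an assumption consistent with the dominated-scenario definition used in scenario comparison:

```python
def pareto_frontier(points):
    """points: list of (total_cost, breach_rate) tuples. Drops dominated
    points and returns the efficient set sorted by decreasing breach
    rate, along which total cost is non-decreasing."""
    efficient = [p for p in points
                 if not any(q != p and q[0] <= p[0] and q[1] <= p[1]
                            for q in points)]
    return sorted(efficient, key=lambda p: -p[1])
```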
Auto-Approval Threshold Optimization with Risk Costing
Given an expected warranty exposure curve as a function of auto-approval threshold and an optional max exposure cap When the optimizer runs with threshold as a decision variable Then it outputs a threshold value and includes expected exposure cost in the total objective And if a max exposure cap is set as hard, the selected threshold satisfies exposure <= cap; if soft, any exceedance is reported with slack and penalty cost And the ROI summary quantifies manual review savings vs incremental exposure cost And the rationale lists the marginal cost/benefit at the chosen threshold
HRIS/Payroll Rate Table Integration and Fallback
Given a connected HRIS integration and an approved static rate table upload exists When an optimization run is triggered Then the system fetches role-region base rates and overtime multipliers effective over the scenario window from HRIS; on API failure it falls back to the latest approved static upload And the run metadata records the rate source ("HRIS" or "Static"), effective date ranges, and version id And if any required rate is missing or older than 30 days with no HRIS data available, the run is blocked with an error listing missing roles/regions And the outputs display which rate source was used for each role/region
Transparent Rationale and Constraint-Binding Report
Given a completed recommendation When the user opens the rationale report Then the report itemizes objective components (base labor, overtime premium, contractor, exposure penalties) with amounts that sum to total cost within rounding tolerance And enumerates all constraints with binding status and dual/shadow price where available And provides the minimal conflict set with suggested relaxations if the problem was infeasible And includes sensitivity for +/-10% changes in demand, labor rates, and exposure cost with impact on total cost and breach rate And provides a reproducibility hash, solver seed, and input snapshot id; re-running with the same snapshot yields identical outputs within 0.5% tolerance
Scenario Comparison and Shareable Snapshot
Given two or more saved scenarios When the user compares them Then the system shows delta tables by week for hires, overtime hours, contractor hours, total cost, and expected breach rate And identifies dominated scenarios (higher cost and higher breach) and flags them as dominated And generates an immutable, shareable snapshot link containing inputs, outputs, and rationale, with RBAC enforcing org-only access And a reproducibility check confirms rerun results within 0.5% tolerance And the snapshot is timestamped and versioned
Hard vs Soft Constraint Handling and Infeasibility Reporting
Given hard constraints (budget cap, overtime max, headcount bands) and soft constraints (preferred overtime band, hiring freeze) When the problem is feasible Then no hard constraints are violated; any soft constraint violation is reported with slack amounts and penalty cost in the objective When the problem is infeasible under hard constraints Then the run fails fast, reports infeasibility with the minimal conflict set and smallest relaxations needed to regain feasibility, and provides a one-click relax-and-rerun option And all constraint statuses are included in the report and exportable as CSV
SLA & Auto‑Approval Rule Impact Modeler
"As a compliance‑aware strategist, I want to model new SLAs and auto‑approval rules so that I can balance customer experience, risk, and cost."
Description

Dedicated workspace to draft or tweak SLA policies and auto‑approval rules (by product, channel, claim value, serial age, warranty tier) and simulate downstream effects on capacity, breach risk, and cost before deployment. Validates rule syntax and checks for conflicts with compliance/business constraints. Imports current ClaimKit policies for baseline comparison and keeps what‑if rule sets isolated from production. Presents trade‑off analytics (e.g., higher auto‑approval reduces handling time but increases exposure) with guardrail thresholds configurable by compliance.

Acceptance Criteria
Draft SLA Simulation Outputs
Given I am in Capacity Sandbox with baseline policies loaded When I create a draft SLA with a 24-hour response target for Channel = Email and Product = Appliances and click Run Simulation for the next 12 weeks Then the model displays predicted breach rate (%), backlog curve by week, and handling cost projections within 10 seconds And the dashboard shows deltas versus baseline for each metric
Auto-Approval Rule Syntax Validation
Given I open the Auto-Approval Rule Editor When I enter a rule with invalid syntax (e.g., missing closing parenthesis) and attempt to save Then Save Draft is disabled and an inline error is shown indicating the first invalid token position and a corrective example And when I correct the syntax, Save Draft becomes enabled and validation passes with no errors
Compliance Guardrail Enforcement
Given compliance guardrails are configured: Max auto-approval exposure <= $10,000/day (hard) and Serial age <= 5 years (soft) When my draft rules would exceed either guardrail and I run validation Then for the hard guardrail the system blocks simulation and lists each violation with the rule ID and offending parameter And for the soft guardrail the system allows simulation but requires a justification note of at least 20 characters before saving
Conflict Detection With Existing Policies
Given baseline includes an Email channel SLA of 48 hours When I propose a 24-hour SLA for Email and a 36-hour SLA scoped to Warranty Tier = Gold Then the engine detects overlapping scopes, flags a conflict, and lists the conflicting rules and their precedence And after I set precedence to "Warranty Tier overrides Channel" and re-validate, no conflicts are reported
Baseline Import And Comparison
Given I click Import Current ClaimKit Policies When the baseline import completes Then a read-only Baseline rule set appears with timestamp and version ID And toggling Compare shows side-by-side rule differences and metric deltas (% and absolute) against my draft
Isolation From Production
Given I have saved a draft rule set in the sandbox When a new live claim arrives in production Then the draft rules are not applied to the claim and only published production policies are evaluated And the production policy audit log shows no reference to sandbox drafts
Trade-Off Analytics And ROI
Given I change an auto-approval rule to auto-approve 80% of claims with Claim Value <= $150 When I run the simulation Then the analytics panel shows updated average handling time change (minutes), exposure delta ($), predicted breach rate change (pp), and estimated ROI relative to baseline And all metrics update within 10 seconds of the change
Visual Dashboards & Scenario Comparison
"As a stakeholder, I want intuitive visuals that compare scenarios so that we can align quickly on the best plan."
Description

Interactive visualization suite to explore results: backlog curves with baseline overlays, breach heatmaps, utilization histograms, cost‑vs‑service frontiers, and confidence bands. Enables side‑by‑side comparison of up to five scenarios, drill‑downs by queue/product/channel/SLA class, and annotations that surface key drivers and assumptions. Exports to PNG/PDF/CSV with data dictionaries. Performance targets: initial load under 2s on typical datasets, interaction latency under 150ms, and accessible color palettes with keyboard navigation.

Acceptance Criteria
Visualization Coverage & Accuracy Across Chart Types
Given a typical dataset is loaded, when the user opens the dashboards, then the following visualizations render without errors: backlog curves, breach heatmaps, utilization histograms, and cost-vs-service frontier. Given backlog curves are displayed, when the baseline overlay toggle is on by default, then the baseline series appears and can be toggled off/on, and values match backend calculations within ±0.5% for sampled timestamps. Given confidence bands are enabled, when the user toggles bands on, then the median line and 5th–95th percentile shading appear on applicable charts and match backend percentiles within ±0.5% for sampled points. Given a breach heatmap is shown, when a cell is hovered, then the tooltip shows date/period, SLA class, breach %, and sample size; breach % equals backend value within ±0.5 percentage points. Given a utilization histogram is shown, when bins are calculated, then the sum of bin counts equals total observation count and bin boundaries are labeled. Given the cost-vs-service frontier is shown, when the user hovers points, then cost and breach rate readouts match backend values within ±0.5% for sampled points.
Side-by-Side Scenario Comparison (Max Five)
Given multiple scenarios exist, when the user selects scenarios for comparison, then up to five scenarios can be selected and a sixth selection is prevented with a message indicating 'Maximum 5 scenarios'. Given scenarios A–E are selected, when any visualization renders, then each scenario is represented with a unique legend label and distinct pattern/color mapping that is consistent across all charts. Given multiple scenarios are displayed, when the user switches between chart types, then axes (time and value scales) remain synchronized and aligned across scenarios. Given a user hovers a time point on backlog curves, when crosshair sync is enabled, then all visible scenarios highlight the corresponding point and show aligned tooltips.
Drill-Down by Queue/Product/Channel/SLA Class
Given no filters are applied, when the user opens the filter panel, then 'All' is selected for Queue, Product, Channel, and SLA Class by default. Given the user applies multi-select filters on any dimension, when the filters are applied, then all charts and KPI tiles update to reflect only the filtered subset and counts/metrics match backend filtered results for sampled queries. Given filters are active, when the user navigates between dashboard tabs/visualizations, then the active filters persist until cleared. Given filters are active, when the user clicks 'Clear All', then all filters reset to 'All' and visuals revert to the unfiltered state.
Annotations Authoring and Display
Given a chart is visible, when the user adds an annotation with title and description at a specific time/point, then the annotation is saved with author, timestamp, and anchor reference and appears on the chart at the correct location. Given annotations exist, when the user pans/zooms or switches chart types showing the same series, then annotations remain correctly anchored to the underlying data/time and are visible when within the viewport. Given an annotation exists, when the user edits or deletes it, then changes persist and the display updates immediately across all relevant charts for that scenario. Given annotations are present, when the user toggles 'Show annotations', then all annotations show/hide accordingly.
Export to PNG/PDF/CSV with Data Dictionary and Metadata
Given a dashboard view is active with applied filters and annotations, when the user exports to PNG or PDF, then the exported file includes the visible chart(s), legend, active filters, scenario names, annotations, export timestamp (UTC), and page title. Given a dashboard view is active, when the user exports data to CSV, then the CSV contains only the series currently displayed (respecting filters and selected scenarios) with columns for timestamp, scenario_id/name, metric name, value, units, and timezone. Given a CSV export is generated, when the file is downloaded, then a companion DataDictionary CSV is provided listing each column_name, data_type, unit, and description; definitions match the product's data dictionary. Given an export is requested on a typical dataset, when generation starts, then the download begins within 300 ms and completes within 3 s.
Performance and Responsiveness
Given a typical dataset, when the dashboard first loads, then initial render completes within 2 s at or below the 75th percentile over 20 runs in staging. Given any user interaction (hover, filter change, series toggle, zoom/pan), when it is performed, then time to first visual update is under 150 ms at or below the 75th percentile. Given long-running computations are required, when they execute, then UI remains responsive with no main-thread blocks >50 ms and a non-blocking loading indicator appears if work exceeds 400 ms.
Keyboard Navigation and Accessibility
Given the dashboard is focused, when the user navigates via Tab/Shift+Tab, then all interactive elements (filters, legend items, toggles, export buttons, scenario selectors) are reachable in a logical order and have visible focus states. Given a focused control, when the user presses Space/Enter, then the control activates (e.g., toggle overlays, select legend items, open filter menus) and changes are reflected in the charts. Given a chart area is focused, when the user uses keyboard controls (e.g., arrow keys for crosshair, +/- for zoom), then equivalent interactions occur without a mouse and tooltips are accessible via focus. Given charts use color to distinguish scenarios, when rendered, then color choices meet WCAG 2.1 AA contrast requirements against background and non-color encodings (patterns/markers) are provided to differentiate series for color-blind users. Given a user requests an accessible view, when 'View data as table' is activated, then an ARIA-labeled data table of the current chart is presented for screen readers with the same data shown in the visualization.
Snapshot Versioning & Sharing
"As an executive, I want to share and revisit scenario snapshots so that decisions are documented and repeatable."
Description

Scenario snapshot system that saves full inputs, assumptions, model version, outputs, and decision notes as immutable versions with timestamps. Supports links with role‑based permissions (view/comment), team workspaces, and exportable bundles for offline review. Includes audit trails (who created/modified, when), diffing between versions, and governance labels (draft/proposed/approved/archived). Integrates with ClaimKit SSO/roles and ensures snapshots do not alter production settings.

Acceptance Criteria
Create Immutable Snapshot With Full Metadata
Given a completed Capacity Sandbox model run with defined inputs and assumptions When the user selects "Save Snapshot" and enters decision notes Then the system persists an immutable snapshot containing all inputs, assumptions, computed outputs, decision notes, model version identifier and checksum, a unique snapshot ID, and an ISO 8601 UTC timestamp And the snapshot is read-only; any change requires creating a new version And saving the snapshot does not modify production settings or live model parameters
Role-Based Link Sharing (View/Comment)
Given an existing snapshot and the sharer has permission to share When the user generates a share link and assigns Viewer role Then recipients authenticated via ClaimKit SSO can view all snapshot contents but cannot edit, delete, relabel, comment, or change permissions And access by unauthorized or unauthenticated users is denied When the user assigns Commenter role Then recipients authenticated via ClaimKit SSO can add and edit their own comments but cannot change snapshot data, labels, or permissions
Team Workspace Access Control
Given team workspaces configured in ClaimKit When a snapshot is saved to Workspace A Then members of Workspace A with read access can see the snapshot in the workspace list, and non-members cannot access it And the snapshot inherits workspace-level permissions without elevating a user's rights beyond their ClaimKit role And removing a user from the workspace immediately revokes their access to the snapshot
Governance Labels Lifecycle
Given a snapshot labeled Draft When a user with governance permission changes the label to Proposed or Approved or Archived Then the transition is validated against allowed states {Draft -> Proposed -> Approved -> Archived} and recorded in the audit trail with who and when And only users with governance permission can set Approved or Archived And the current label is visibly displayed wherever the snapshot appears
Audit Trail Integrity
Given audit logging is enabled When a snapshot is created, shared, relabeled, permissions changed, or commented on Then the audit trail records actor user ID, action type, target snapshot ID, timestamp (UTC), and before/after values where applicable And snapshot content (inputs, assumptions, outputs) remains unchanged in all audit-recorded events
Version Diffing Between Snapshots
Given two snapshots are selected for comparison When the diff view is opened Then differences in inputs, assumptions, model version, outputs, decision notes, and governance label are displayed with before/after values And a summary count of changes by section is shown And no changes are written to either snapshot during diff
Exportable Offline Bundle
Given an existing snapshot When the user exports an offline bundle Then the system generates a downloadable package containing snapshot inputs, assumptions, model version, outputs, decision notes, comments, and audit metadata in machine-readable and human-readable formats And the export can be opened offline without authentication and matches the snapshot ID and timestamp And exporting does not change the snapshot or its permissions

FitCheck

Eliminates wrong‑part orders by verifying compatibility against model/serial, symptom codes, and OEM supersessions. Auto-suggests approved substitutes with a confidence score and notes any install nuances. Benefit: fewer return trips and RMAs for Field Fixers, faster first‑time fixes for Agents, and lower parts waste for Ops.

Requirements

Model/Serial Normalization & OEM Supersession Graph
"As an Agent, I want model and serial info to be automatically validated and mapped to the correct product lineage so that FitCheck can reliably determine part compatibility."
Description

Ingest model and serial numbers from ClaimKit cases and emails, normalize them using OEM-specific parsing rules, and validate against a unified product catalog. Maintain a directed acyclic graph of part supersessions and equivalence sets per OEM, honoring serial-range, region, and revision constraints. Expose a low-latency service that resolves current valid part identities, tracks provenance, and reconciles duplicate inputs from the magic inbox. Targets: p95 lookup ≤150ms, idempotent updates, and versioned change history to ensure accurate compatibility checks and traceability.

Acceptance Criteria
OEM-Specific Model/Serial Normalization
Given raw model and serial values from cases/emails across at least 5 OEMs with known formatting quirks, When normalization runs using OEM-specific parsing rules, Then the outputs are standardized strings (uppercase, trimmed, OEM rule–defined separators) with extracted tokens (model core, revision, region, serial) and a rule-id:version recorded for each transformation. Given the sample input: OEM=Acme, model="A-1000B Rev.2 EU", serial="SN 123 456 789", When normalized, Then model="A1000B-REV2", region="EU", serial="123456789" and the normalization record stores rule-id "acme-model-v2" and "acme-serial-v1". Given an invalid serial format per OEM rules, When normalization runs, Then the output is rejected with error code NORMALIZATION_INVALID and a human-readable reason including the failed rule-id. Given a curated test corpus of 10,000 known-valid examples, When normalization is executed, Then ≥99.5% normalize successfully and ≤0.1% are misparsed (measured by exact match against ground truth).
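The Acme sample above can be reproduced with illustrative parsing rules. The regex below is one hypothetical encoding of rules "acme-model-v2" and "acme-serial-v1", not the actual OEM rule definitions:

```python
import re

def normalize_acme(model_raw: str, serial_raw: str) -> dict:
    """Hypothetical 'acme' OEM rules: uppercase, strip the model-core
    separator, normalize 'Rev.N' to 'REVN', extract a trailing region
    token, and keep only serial digits."""
    m = re.match(r"([A-Za-z])-?(\w+)\s+Rev\.?(\d+)\s+([A-Z]{2})$",
                 model_raw.strip())
    if not m:
        raise ValueError("NORMALIZATION_INVALID: acme-model-v2")
    prefix, core, rev, region = m.groups()
    serial = re.sub(r"\D", "", serial_raw)
    if not serial:
        raise ValueError("NORMALIZATION_INVALID: acme-serial-v1")
    return {"model": f"{prefix.upper()}{core.upper()}-REV{rev}",
            "region": region,
            "serial": serial,
            "rules": ["acme-model-v2", "acme-serial-v1"]}
```

Feeding the sample input through yields the expected model "A1000B-REV2", region "EU", and serial "123456789", with the applied rule ids recorded alongside.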
Unified Catalog Validation and Error Codes
Given a normalized model and serial, When validated against the unified product catalog, Then a single productId is returned with OEM, modelId, and serial-range bounds that include the provided serial. Given a normalized model that does not exist, When validated, Then response code is NOT_FOUND with fields {oem, normalizedModel, attemptedCatalogs}. Given multiple catalog candidates share the same model but only one includes the provided serial-range, When validated, Then the candidate whose range contains the serial is returned; otherwise response code is AMBIGUOUS with candidateIds[]. Given catalog connectivity issues, When validation is attempted, Then response code is TRANSIENT_ERROR and no partial links are stored.
Supersession Resolution with Serial/Region/Revision Constraints
Given a partId P1 with supersession edges P1->P2->P3 and request context {serial=S7500, region=US, revision=B}, When resolveCurrentPart is called, Then the returned currentPartId is P3 if and only if all constraint predicates on the path are satisfied for {S7500, US, B}; otherwise the latest constraint-satisfying node is returned, or NOT_COMPATIBLE if none. Given a supersession edge P1->P2 with constraint serial>=S5000 and request serial=S4000, When resolveCurrentPart is called, Then P1 is returned and response includes reason "constraint_not_met:P1->P2:serial". Given OEM-defined equivalence set E={Q1,Q2,Q3} for P3, When resolveCurrentPart returns P3, Then equivalenceSetIds includes Q1,Q2,Q3 with equivalenceType and OEM scope. Given no supersession path exists for P1, When resolved, Then P1 is returned with pathLength=0.
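The constraint walk above can be sketched as follows, reusing the P1→P2→P3 example and the serial bound from the criteria. The predicate encoding, serial parsing, and edge data shape are illustrative assumptions, not the service's actual schema.

```python
# Sketch of resolveCurrentPart over a constrained supersession chain.
# edges: {part_id: (next_part_id, constraint_predicate, constraint_label)}
def resolve_current_part(part_id, edges, ctx):
    """Walk supersession edges from part_id, stopping at the first edge
    whose constraint the request context does not satisfy."""
    current, path_len, reason = part_id, 0, None
    while True:
        edge = edges.get(current)
        if edge is None:
            break
        nxt, constraint, label = edge
        if not constraint(ctx):
            reason = f"constraint_not_met:{current}->{nxt}:{label}"
            break
        current, path_len = nxt, path_len + 1
    return {"currentPartId": current, "pathLength": path_len, "reason": reason}

serial_num = lambda s: int(s.lstrip("S"))  # assumed "S7500"-style serials
edges = {
    "P1": ("P2", lambda c: serial_num(c["serial"]) >= 5000, "serial"),
    "P2": ("P3", lambda c: True, "none"),
}
```

With serial S7500 the walk reaches P3; with S4000 it stops at P1 and reports the failed edge, matching the two scenarios above.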
Low-Latency Lookup Service Performance
Given a warmed service instance and a dataset ≥5M parts and ≥20M supersession edges, When executing 10,000 resolve requests with concurrency=32 over 10 minutes, Then p95 end-to-end latency ≤150ms and timeouts=0. Given cold start conditions, When the first 200 requests are executed, Then subsequent steady-state window (requests 201–10,000) still meets p95 ≤150ms. Given valid requests, When executed, Then ≥99.95% return HTTP 2xx with a valid payload schema; ≤0.05% may be 4xx (client errors) and 0% 5xx due to server faults in the test window.
Idempotent Ingestion and Duplicate Reconciliation
Given two identical payloads from the magic inbox with the same sourceMessageId and identical normalized content hash, When ingested within a 30-day window, Then only one mutation is applied and the second returns 200 with {idempotent:true, versionUnchanged:true}. Given the same logical update retried with different delivery ids but same sourceMessageId, When processed, Then the resulting graph/version is identical (checksum match) and exactly one audit entry is created. Given near-duplicate emails that normalize to identical model/serial and part refs, When ingested, Then they are linked to the same case and no duplicate nodes/edges are created (node count and edge count unchanged).
Versioned Change History and Time-Travel Traceability
Given a sequence of updates producing versions v100…v110, When resolve is called with asOfVersion=v104 (or asOfTime=timestamp_v104), Then the returned currentPartId and path reflect the catalog/supersession state at v104 exactly (snapshot hash equals stored v104 hash). Given any mutation is applied, When queried, Then the change history includes {versionId, timestamp, actor/system, sourceMessageId, transformationRuleIds[], before/after diffs} and is immutable. Given an audit request for a resolution, When executed, Then the response includes provenance fields linking the output to specific catalog rows and supersession edges by version ids.
Supersession Graph Acyclicity Enforcement and Safe Rollback
Given an import batch whose edges would introduce a cycle (e.g., P1->P2, P2->P3, P3->P1), When applied, Then the entire batch is rejected atomically, no edges are committed, and error code CYCLE_DETECTED is returned with the minimal cycle path. Given concurrent imports on disjoint subgraphs, When processed, Then no cycles are introduced and commit order is serialized per subgraph; global invariant DAG=true holds (cycle count=0 from periodic validator). Given a partial failure during edge creation, When transaction ends, Then graph state is rolled back to the previous version and a compensating audit entry is recorded with rollbackVersionId.
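Atomic cycle rejection can be sketched with a reachability check per new edge: adding src→dst closes a cycle exactly when dst already reaches src, and a BFS back-path yields the minimal cycle through that edge. The in-memory graph shape is an assumption; real enforcement would run inside the graph store's transaction.

```python
from collections import defaultdict, deque

def apply_batch(graph: dict, batch: list) -> dict:
    """Apply supersession edges all-or-nothing; reject the batch on any cycle.
    graph: {node: set(successors)}; batch: [(src, dst), ...]. Sketch only."""
    trial = defaultdict(set, {n: set(s) for n, s in graph.items()})
    for src, dst in batch:
        trial[src].add(dst)
        cycle = _find_cycle(trial, start=dst, target=src)
        if cycle is not None:
            # Nothing committed: the caller's graph is untouched.
            return {"error": "CYCLE_DETECTED", "cycle": [src] + cycle}
    return dict(trial)  # commit: every edge in the batch, or none

def _find_cycle(g, start, target):
    """BFS from start; a shortest path back to target closes the cycle."""
    parents, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path, cur = [], node
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return list(reversed(path))
        for nxt in g.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

For the criteria's example (existing P1→P2→P3, batch edge P3→P1), the batch is rejected with the full cycle path and the stored graph never changes.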
Symptom-to-Parts Mapping Service
"As a Field Fixer, I want my reported symptoms to translate into likely parts for the specific model so that I can order the right part on the first visit."
Description

Normalize free-text and coded symptoms into a canonical taxonomy and map them to candidate components and specific parts by model family. Incorporate OEM service literature, historical fixes, and observed failure rates to produce probability-weighted suggestions. Provide a versioned API that returns ranked candidates with confidence values and supports incremental updates, rollbacks, and multi-language inputs to drive accurate part selection from diverse intake channels.
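The probability-weighted ranking can be illustrated with a toy combiner. The evidence keys mirror those named in the acceptance criteria (oemDocs, historicalFixRate, observedFailureRate); the weights, rounding, and candidate data are invented for the example.

```python
# Assumed blend weights; the real service would learn or configure these.
EVIDENCE_WEIGHTS = {"oemDocs": 0.5, "historicalFixRate": 0.3, "observedFailureRate": 0.2}

def suggest_parts(candidates: list, top_k: int = 5) -> list:
    """Rank candidate parts by a weighted blend of evidence scores.
    candidates: [{'partNumber': str, 'evidence': {source: score in [0,1]}}].
    Returns up to top_k candidates sorted by confidence descending."""
    ranked = []
    for c in candidates:
        confidence = sum(
            EVIDENCE_WEIGHTS[src] * c["evidence"].get(src, 0.0)
            for src in EVIDENCE_WEIGHTS
        )
        ranked.append({"partNumber": c["partNumber"], "confidence": round(confidence, 3)})
    ranked.sort(key=lambda c: c["confidence"], reverse=True)
    return ranked[:top_k]
```

Because the weights sum to 1 and each evidence score is in [0,1], every confidence stays in [0,1], and the default top_k of 5 matches the criterion below.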

Acceptance Criteria
Normalize Free-Text and Coded Symptoms
Given a labeled corpus of 1,000 mixed free-text and coded symptom entries across EN, ES, and FR and a valid modelFamilyId When POST /v{n}/symptoms/normalize is called for each entry Then the service returns canonicalSymptomCode and canonicalSymptomLabel for at least 95% of entries And macro-F1 >= 0.90 against the gold canonical codes And each response echoes originalText, detectedLanguage, and normalizedText And coded inputs (e.g., E24) map to the same canonicalSymptomCode as their free-text equivalents
Rank Candidate Components and Parts by Model Family
Given one or more canonicalSymptomCodes and a modelFamilyId When GET /v{n}/parts/suggest?topK=10 is called Then the response contains 1–10 candidates each with componentId, partNumber, oem, and confidence in [0,1] And candidates are strictly sorted by confidence descending And default topK is 5 when not specified And sum(confidence) of returned candidates >= 0.80 when at least one candidate exists And each candidate is valid for the given modelFamilyId
Evidence-weighted Suggestions with Attributions
Given the request sets explain=true When GET /v{n}/parts/suggest is called Then each candidate includes evidenceSources with keys oemDocs, historicalFixRate, and observedFailureRate And each evidence item includes a sourceId or documentRef and a weight in [0,1] And across a 200-request QA set, at least 95% of candidates include non-empty evidenceSources And per-candidate confidence is present and >= 0.01
Supersession and Substitute Resolution
Given a candidate whose OEM part has a supersession chain When suggestions are generated Then the latest active partNumber is returned with supersessionChain listed oldest→newest And deprecated partNumbers are not returned as primary suggestions And if an approved substitute exists, a substitute object is included with substitutePartNumber, confidence, and installNotes And all returned partNumbers are unique within the response
Versioned API with Determinism, Incremental Updates, and Rollback
- Given a request with header X-Model-Version set to a valid versionId, When the same request is repeated 100 times, Then responses are byte-for-byte identical excluding responseId and timestamps, And the response includes apiVersion and modelVersion fields.
- Given an incremental update package is applied producing modelVersion V2, When GET /v{n}/admin/versions is called, Then V2 is listed with state=active and V1 with state=deprecated, And specifying modelVersion=V1 yields ranked results identical to pre-update snapshots for a 500-request regression set.
- Given a rollback is initiated to V1, When rollback completes, Then new requests with modelVersion omitted use V1, And the time from rollback start to active version switch is <= 5 minutes.
Multi-language Input Support
Given inputs in EN, ES, FR, and DE with diacritics and common domain slang When POST /v{n}/symptoms/normalize is called without a language parameter Then detectedLanguage is correct for >= 97% of entries on a 500-sample test set And per-language canonicalSymptomCode mapping macro-F1 is >= 0.88 And non-Latin inputs (e.g., JA kana) are rejected with 422 and error.code=UNSUPPORTED_LANGUAGE until that locale is enabled
Performance, Rate Limiting, and Error Handling Under Load
- Given a steady load of 100 RPS per region with typical payloads, When calling /symptoms/normalize and /parts/suggest, Then p95 latency <= 250 ms for normalization and <= 500 ms for suggestion, error rate <= 0.5%, And 99.9% of requests succeed over a 30-day window excluding client 4xx.
- Given a client exceeds its rate limit, When additional requests arrive, Then the service returns 429 with a Retry-After header and no partial results.
- Given an invalid or missing modelFamilyId, When /parts/suggest is called, Then the service returns 400 with error.code=INVALID_MODEL_FAMILY and no candidates.
Compatibility Scoring Engine with Explainability
"As an Ops Lead, I want a clear compatibility score with reasons so that I can set policies to prevent wrong-part orders."
Description

Compute a compatibility score for requested parts by combining normalized model/serial validation, supersession resolution, and symptom-based likelihoods. Enforce hard-fit rules (dimensions, connectors, voltage, serial-range) and soft evidence (historical success) to determine pass/warn/block outcomes. Return transparent reason codes and human-readable explanations, with configurable thresholds by account. Provide a stateless API and SDK with p95 response ≤200ms and structured logs for offline analysis and tuning.
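The pass/warn/block outcome can be sketched as below. Threshold semantics follow the criteria in this section (score ≥ T_pass → pass; T_block ≤ score < T_pass → warn; otherwise block, and hard-fit violations block regardless of soft evidence). The reason-code strings and the ValueError are illustrative; per the account-threshold criteria, the real service rejects an invalid configuration and keeps the last known good one rather than failing the request.

```python
def decide(score: float, t_pass: float, t_block: float, hard_fit_violations: list) -> dict:
    """Map a 0-100 compatibility score plus hard-fit checks to a decision."""
    if t_pass <= t_block:
        # Sketch-only behavior; the service would fall back to the
        # last known good thresholds and emit a config_error reason_code.
        raise ValueError("invalid config: T_pass must exceed T_block")
    if hard_fit_violations:  # soft evidence can never override a hard-fit block
        return {"decision": "block", "reason_codes": hard_fit_violations}
    if score >= t_pass:
        return {"decision": "pass", "reason_codes": []}
    if score >= t_block:
        return {"decision": "warn", "reason_codes": ["SCORE_BELOW_PASS"]}  # assumed code
    return {"decision": "block", "reason_codes": ["SCORE_BELOW_BLOCK"]}  # assumed code
```

Keeping the function pure over its inputs is what makes the statelessness and determinism criteria below straightforward to verify.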

Acceptance Criteria
Unified Compatibility Score and Decision Outcome
- Given a request with account_id, model, serial, part_id, and optional symptom_codes, When the engine evaluates the request, Then it returns compatibility_score in the range 0–100 and decision in {pass, warn, block}.
- Given configured thresholds T_pass and T_block for the account, Then decision = pass when compatibility_score ≥ T_pass, decision = warn when T_block ≤ compatibility_score < T_pass, and decision = block when compatibility_score < T_block.
- Given inputs containing formatting noise (case, whitespace, dashes), When model/serial are processed, Then they are normalized prior to validation and scoring.
Explainability: Reason Codes and Human-Readable Rationale
- Given any evaluation result, When the response is returned, Then it includes reason_codes[] where each item has code, type {hard|soft}, and contribution [-100..+100].
- Given decision ∈ {warn, block}, When the response is returned, Then it includes at least one human_readable_explanation (≤240 chars) per primary reason.
- Given decision = pass, When the response is returned, Then it includes the top 3 positive drivers and any cautions with human_readable_explanation.
- Given the response payload, When validated against the public schema, Then it conforms to a versioned contract including fields: decision, compatibility_score, thresholds_used, reason_codes[], explanations[].
Account-Level Threshold Configuration
- Given an account with custom T_pass and T_block thresholds, When a request includes that account_id, Then the engine uses the account’s thresholds; otherwise defaults are applied.
- Given updated threshold values are saved in the configuration store, When the update is committed, Then new evaluations reflect the change within 60 seconds.
- Given invalid configuration (e.g., T_pass ≤ T_block), When a change is submitted, Then it is rejected and the last known good configuration is retained and used for evaluations.
- Given a rejected configuration change, When an evaluation occurs, Then the response includes a config_error reason_code and the thresholds_used reflect the last known good values.
API/SDK Performance and Statelessness
- Given the scoring API at /v1/fitcheck/score, When subjected to a steady load test of ≥10,000 requests over 15 minutes, Then p95 service latency ≤200ms, p99 ≤400ms, and error rate <0.1% (measured at the service boundary, excluding network).
- Given two identical requests with the same inputs and account_id, When sent in any order or concurrently, Then responses are identical in score, decision, and reasons.
- Given official SDKs for Node, Python, and Java, When used without prior session initialization, Then they can perform a full request/response cycle without retaining mutable global state.
- Given a downstream enrichment timeout, When an evaluation occurs, Then the API still responds within SLA and includes degraded_reason codes per impacted component as configured (fail-open or fail-closed).
Structured Decision Logging
- Given any completed evaluation, When emitting logs, Then a structured log record is produced containing: timestamp, request_id, account_id, part_id, model, serial_hash, symptom_codes, evidence_components[], final_score, decision, thresholds_used, reason_codes[], latency_ms, service_version, config_version, supersession_chain, decision_id.
- Given log delivery to the configured sink, When processing 1,000,000 evaluations in a 24-hour period, Then ≥99.9% of log records are delivered within 5 minutes of evaluation completion.
- Given privacy requirements, When logging identifiers, Then raw serial numbers and PII are not persisted; only hashed or masked forms are stored.
Supersession and Substitute Handling
- Given an OEM supersession chain exists for the requested part, When the requested part is obsolete, Then the engine resolves to the latest valid part and includes supersession_chain in the response.
- Given approved substitute parts with confidence scores and install_notes, When a substitute’s compatibility exceeds the requested part’s score, Then the response includes up to the top 3 substitutes with id, confidence_score, rationale reason_codes, and install_notes; all substitutes must satisfy hard-fit constraints.
- Given a circular or ambiguous supersession graph, When detected during resolution, Then the engine returns decision = block with reason_code = SUPERSESSION_CONFLICT and no substitute recommendations.
Hard-Fit Constraint Enforcement
- Given any hard-fit constraint violation (dimensions, connector type, voltage, serial-range exclusion), When evaluating compatibility, Then decision = block and each violated constraint appears as a blocking reason_code.
- Given unknown or missing hard-fit data for a required field, When soft evidence suggests pass, Then decision is downgraded to warn with reason_code = HARD_FIT_DATA_GAP.
- Given all hard-fit constraints match, When evaluating compatibility, Then hard-fit contributes positively to the score; decision is still determined by thresholds, and no soft evidence can override a hard-fit block.
Approved Substitute Recommendation with Install Notes
"As an Agent, I want FitCheck to suggest approved substitutes with any special install notes so that I can proceed without delaying the repair."
Description

When the requested part is incompatible or low-confidence, suggest pre-approved substitutes drawn from supersession graphs, cross-OEM equivalents, and house-brand catalogs. Include confidence scores, cost/lead-time deltas (when available), and any install nuances such as adapters, wiring changes, firmware steps, or calibration procedures. Respect OEM constraints and account-level policies, and enable one-click application of the substitute to the order with notes attached to the ticket.

Acceptance Criteria
Incompatible Part: Substitute Suggestions Presented
Given a claim ticket with requested part P and model/serial M/S that fails the compatibility check When the system evaluates substitutes for P Then it must display 1–10 pre-approved substitutes, if available, sourced from supersession graphs, cross-OEM equivalents, and house-brand catalogs And each suggestion must show: source_type, confidence_score (0–100%), and a one-line rationale And suggestions must be sorted by descending confidence_score And the suggestion panel must render within 2 seconds at p95 on production-like data
Low-Confidence Match: Threshold Triggers Recommendations
Given the requested part P yields a fit confidence below the account policy threshold T (default 80%) When the agent views the part fit panel Then the system must display substitute recommendations for P And the original part remains selectable but is labeled "Low confidence" with its numeric score and a tooltip explaining the risk And the applied threshold T is read from account policy and captured in telemetry for the event
Policy and OEM Constraint Compliance in Suggestions
Given account-level policies and OEM constraints (e.g., cross-OEM disallowed, region-locked SKUs, warranty-only parts) apply to the ticket When generating the substitute list Then any substitute violating a policy or OEM constraint must be excluded from the displayed list And the UI shows a non-blocking note "X suggestions hidden by policy" with a link to policy details for authorized users And an audit entry records the suppressed substitutes count and reasons
Install Notes Display and Attachment
Given a substitute S includes install nuances (adapters, wiring changes, firmware steps, calibration) When the suggestion card is rendered Then it lists install notes with standardized tags: adapters, wiring, firmware, calibration And each note includes actionable details (required adapter SKUs, wiring diagram link or color map, firmware file reference and version, calibration value/units) And when S is applied to the order, these notes are attached to the ticket and included in work order print/export views And the agent must acknowledge the notes before finalizing the order update
One-Click Apply Substitute Updates Order and Audit
Given the agent clicks "Apply substitute" on suggestion S When the operation completes Then the order updates the part line to S, recalculates total cost and estimated ship/arrival date, and attaches install notes to the ticket And the ticket activity log records user, timestamp, original part -> S mapping, confidence_score, and any policy references And the action is idempotent and offers Undo for up to 5 minutes or until the order is submitted, whichever comes first And on failure, no partial changes persist and a descriptive error with retry option is shown
Cost and Lead-Time Deltas and Freshness
Given cost and lead-time data exist for both the requested part P and substitute S When the suggestion card is shown Then it displays delta_cost and delta_lead_time versus P with sign (+/−) and units And if any metric is unavailable, display "N/A" with tooltip "Supplier data unavailable" and do not block selection And if any metric is older than 24 hours, label it "Stale" and show last-updated timestamp
FitCheck Inline Decision UI (Case & Order Flows)
"As an Agent, I want an inline FitCheck panel in my workflow so that I can make informed part decisions without context switching."
Description

Embed an interactive FitCheck panel within ClaimKit’s case and ordering workflows that displays the verdict, score, explanations, and recommended substitutes. Support inline actions to accept suggestions, view/install notes, edit model/serial, and request overrides. Update results in real time as inputs change, meet accessibility standards, and provide event hooks to trigger or pause SLA timers based on decision outcomes to keep agents in flow and reduce context switching.

Acceptance Criteria
Inline FitCheck Panel Rendering in Case and Order Workflows
Given an agent opens a Case Detail or Order Creation view with model, serial, and symptom populated When the page loads Then the FitCheck panel is visible within the primary workflow layout without opening a new page And the panel displays a verdict (Compatible | Incompatible | Unknown) And the panel displays a confidence score as a percentage 0–100% And the panel displays up to 3 explanation bullets for the decision And the panel lists up to 3 approved substitutes with OEM supersession labels when applicable And if required input is missing, the panel shows an actionable empty state prompting for model/serial/symptom entry
Real-time Recalculation on Input Changes
Given the FitCheck panel is visible And the agent edits model, serial, symptom code, or selected part When the change is committed (field blur or Enter) Then the verdict, score, explanations, and substitutes update to reflect the new inputs And UI update occurs within 500 ms for cached models and within 2 s otherwise, with a loading indicator shown during computation And an event fitcheck.verdict.changed is emitted with caseId/orderId, previousVerdict, newVerdict, previousScore, newScore, timestamp
Accept Suggested Substitute Inline
Given a suggested substitute is displayed with status Approved and confidence >= 80% When the agent clicks Accept Substitute Then the suggested part is added or swapped into the active order/case line item And required install notes are displayed and must be acknowledged before confirmation And the action is recorded on the case timeline with user, timestamp, original part, substitute part, and confidence And an event fitcheck.substitute.accepted is emitted with caseId/orderId and part identifiers
View Install Notes and Nuances
Given a suggested part includes install notes or nuances When the agent selects View Notes Then a modal or side panel opens showing the full notes text (up to 2000 characters) and any OEM bulletin links And the content is selectable and copyable And the component is fully keyboard navigable and screen-reader labeled And closing the component returns focus to the triggering control And an event fitcheck.notes.viewed is emitted with noteId and part identifiers
Edit Model/Serial Inline
Given model and serial were auto-detected When the agent clicks Edit Model/Serial Then inline fields become editable with format validation and mask appropriate to the OEM And invalid entries show inline error messages and prevent save And saving updates the case/order record and triggers FitCheck recalculation And an event fitcheck.input.updated is emitted with changed fields and timestamp
Override Request on Incompatible Verdict
Given the current verdict is Incompatible When the agent clicks Request Override Then a form requires a reason (minimum 15 characters) and allows optional evidence attachment (PDF/JPG/PNG; max 10 MB) And submitting sets decision status to Override Pending, pauses case SLA timers, and notifies the approver group And on approval, the part is marked Approved via Override and SLA timers resume; on rejection, the part remains blocked and timers resume And events fitcheck.override.requested and sla.paused are emitted on submit, and fitcheck.override.resolved and sla.resumed are emitted on decision, each with decision metadata
Accessibility and Keyboard-Only Operation
Given a keyboard-only or assistive technology user is interacting with the FitCheck panel When navigating and activating all panel controls Then all controls are reachable in logical tab order and operable via Enter/Space And roles, names, and states are exposed for screen readers (ARIA) for verdict, score, explanations, substitutes, and action buttons And color contrast meets WCAG 2.1 AA (>= 4.5:1) and focus indicators are visible And pressing Esc closes any FitCheck modal and returns focus to the opener
Override & Guardrails with Audit Trail
"As a Compliance Manager, I want controlled overrides with full audit logs so that we balance speed with quality and accountability."
Description

Implement policy-driven guardrails that block, warn, or allow orders based on compatibility thresholds and account rules. Enable authorized overrides with mandatory reason capture and attach evidence (photos, notes). Log every decision, input, and outcome to an immutable audit trail and expose exports and dashboards for QA, RMA analysis, and coaching. Prevent checkout below the block threshold unless a compliant override is recorded.

Acceptance Criteria
Block Below Threshold Without Override
Given a cart contains a part with a compatibility score below the account’s block threshold and/or violates an account rule, When the user attempts to proceed to checkout via UI or API, Then checkout is blocked, no order ID is created, no payment is authorized, and error FC-BLOCK-001 with human-readable reason is returned/displayed. And the block reason includes the score, applicable threshold, violated rule(s), and policy version identifier. And if an OEM supersession resolves to a compatible substitute at or above the block threshold, Then the substitute is suggested with confidence score and install notes; the original selection remains blocked. And the checkout decision latency from action to response is ≤ 400 ms at p95.
Authorized Override With Mandatory Evidence
Given a block decision is shown for the current selection and the user has the FitCheck.Override permission, When the user selects Override, Then the system requires a reason category, a free-text reason of ≥ 20 characters, and at least one evidence item (JPG/PNG/PDF up to 10 MB or a structured note) before enabling Submit. And high-risk overrides (score below the account’s hard floor threshold) require 2FA confirmation and a second approver with FitCheck.Override.Approve; the approver cannot be the requester. And upon successful submission, the order is unblocked, the override record is created with actor IDs, timestamps, policy version, evidence file hashes, and approver ID(s), and checkout may proceed. And if permission or required inputs are missing, the override is rejected with FC-OVR-403 and the order remains blocked.
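The mandatory-input gate above can be sketched as a validation step. The error code FC-OVR-403, the 20-character minimum, the file types, and the 10 MB limit come from the criteria; the field names are assumptions, and for brevity this sketch accepts file evidence only (the criteria also allow a structured note).

```python
# Assumed request shape: {"reason_category": str, "reason_text": str,
#                         "evidence": [{"name", "mime", "size"}, ...]}
ALLOWED_EVIDENCE_TYPES = {"image/jpeg", "image/png", "application/pdf"}
MAX_EVIDENCE_BYTES = 10 * 1024 * 1024  # 10 MB per the criteria

def validate_override(request: dict) -> dict:
    """Reject an override submission unless all mandatory inputs are present."""
    errors = []
    if not request.get("reason_category"):
        errors.append("reason_category required")
    if len(request.get("reason_text", "").strip()) < 20:
        errors.append("reason_text must be at least 20 characters")
    evidence = request.get("evidence", [])
    if not evidence:
        errors.append("at least one evidence item required")
    for item in evidence:
        if item.get("mime") not in ALLOWED_EVIDENCE_TYPES or item.get("size", 0) > MAX_EVIDENCE_BYTES:
            errors.append(f"unsupported evidence item: {item.get('name')}")
    if errors:
        return {"status": "rejected", "code": "FC-OVR-403", "errors": errors}
    return {"status": "accepted", "code": None, "errors": []}
```

A rejected submission leaves the order blocked, matching the last clause of the criterion above.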
Immutable Audit Trail for Guardrail Decisions
Given any guardrail decision (allow, warn, block, override), When the decision is rendered, Then an audit record is written containing: UTC ISO-8601 timestamp, actor (user/service), decision type, model, serial, symptom code(s), confidence score, thresholds, evaluated rules with pass/fail, policy version, supersession table version, and outcome. And evidence file digests are stored as SHA-256 hashes with filename, size, and MIME type; originals are stored with WORM retention for 7 years. And audit records are hash-chained per order and verifiable via an endpoint that returns chain validity true/false and the last block hash. And read access is role-restricted and supports queries by order ID, account, date range, and decision type with ≤ 1 s p95 response for result sets ≤ 5,000 rows.
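A minimal version of the per-order hash chain might look like this: each record's hash covers its canonicalized payload plus the previous record's hash, so mutating any earlier record breaks verification of everything after it. The genesis value and record shape are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed genesis value for an empty chain

def append_record(chain: list, payload: dict) -> list:
    """Append an audit record whose hash binds it to its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps(payload, sort_keys=True) + prev_hash  # canonical form
    record = {"payload": payload, "prevHash": prev_hash,
              "hash": hashlib.sha256(body.encode()).hexdigest()}
    return chain + [record]

def verify_chain(chain: list) -> bool:
    """Recompute every link; any tampered payload or broken link fails."""
    prev_hash = GENESIS
    for record in chain:
        body = json.dumps(record["payload"], sort_keys=True) + prev_hash
        if record["prevHash"] != prev_hash or \
           record["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = record["hash"]
    return True
```

The verification endpoint in the criterion would run the same recomputation and return the chain validity plus the last hash.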
Export and Dashboard for QA and RMA Analysis
Given a QA Manager filters the audit data by account, date range, and decision type, When viewing the dashboard, Then the system displays metrics: block rate, warn rate, override rate, first-time-fix rate after override, and 30-day RMA rate, each with trend lines. And selecting a metric opens a drill-down table with columns: order ID, part SKU, model, decision, score, threshold, reason/notes, approver (if any), createdAt, resolvedAt. And the QA Manager can export the current drill-down to CSV (≤ 100k rows) in ≤ 15 seconds or schedule an export to S3; each export includes a SHA-256 checksum file. And dashboard and export reflect the policy version effective at decision time and are consistent with the immutable audit trail.
Role-Based Override Permissions and Controls
Given account-level policies define who can request and approve overrides, When a user without FitCheck.Override attempts to initiate an override, Then the action is disabled and the tooltip “Override not permitted for your role” is displayed; API attempts return FC-OVR-401. And when a user with permission initiates an override, scope is limited to their assigned accounts/channels; cross-account overrides are blocked and logged. And high-risk overrides require secondary approval from a user with FitCheck.Override.Approve; approval and request cannot be by the same user. And all permission checks are enforced server-side and logged as security events with user ID, IP, and outcome.
Warn-and-Proceed Flow With Coaching Capture
Given a part has a compatibility score between the account’s warn and block thresholds, When the user proceeds to checkout, Then a warning modal displays the risk, recommended substitutes with confidence scores, and requires explicit acknowledgment to continue. And if the account policy requires a coaching note on warnings, Then a note of ≥ 10 characters is mandatory before proceeding. And proceeding after warning logs the acknowledgment, coaching note (if provided), and the list of substitutes shown; no override record is created. And the order is allowed to proceed and payment may be authorized.
Continuous Learning Feedback & Admin Console
"As an Ops Analyst, I want to maintain and improve FitCheck mappings based on real-world outcomes so that accuracy increases over time."
Description

Provide an admin console to curate symptom taxonomy, compatibility mappings, substitutes, and install notes, with bulk import/export and versioned change history. Ingest feedback from repair outcomes, returns, and technician comments to reconcile incorrect fits and update weights. Surface discrepancy queues and suggested rule changes, schedule periodic retraining, and track KPIs such as wrong-part rate, first-time-fix rate, and override frequency to continuously improve accuracy.

Acceptance Criteria
Admin Curates Symptom Taxonomy with Versioned History
- Given an Admin with edit permissions, when they create, edit, or deactivate a symptom (code, label, parent, synonyms, status), then the system saves a new version with version ID, actor, timestamp, change diff, and requires a change reason. - Given version history exists, when the Admin clicks Rollback on version V, then the current state is replaced by a new version cloned from V, the rollback reason is captured, and an audit entry is created. - Given validation rules, when a user attempts to save duplicate codes, circular parent relationships, or empty required fields, then the save is blocked and field-level errors are shown. - Given an index of ≥10,000 symptom nodes, when a user searches by code/label/synonym, then results return within 300 ms at p95. - Given role-based access, when an Editor proposes changes, then they are saved as Draft; when an Approver publishes a Draft, then it moves to Active; Viewers cannot create or edit.
Bulk Import/Export of Compatibility Mappings and Install Notes
- Given a CSV or JSON file matching the documented schema for compatibility mappings, substitutes, and install notes, when uploaded, then the system performs pre-validation, displays row-level errors (with line and field), and blocks commit if any errors exist. - Given a valid file ≤200,000 rows, when the Admin confirms import, then the import completes within 10 minutes, is atomic (all-or-nothing), reports records inserted/updated/skipped, and creates a new versioned snapshot. - Given existing data, when an Admin requests Export with filters (date range, brand, model, part, status), then the system generates CSV/JSON with checksum and completes within 30 seconds for up to 50,000 rows. - Given prior versions exist, when a bulk import modifies records, then each changed record references the previous version ID and the change reason is stored at the batch level. - Given schema evolution, when optional columns are absent, then defaults are applied; when unknown columns are present, then they are ignored with warnings.
Automated Feedback Ingestion and Weight Updates
- Given connected sources (repair outcomes, RMAs/returns, technician comments), when the hourly ingestion job runs, then 99% of new items are processed within 15 minutes of availability and deduplicated by claim ID/model/serial. - Given an ingested item, when matching to a FitCheck case, then it links using claim ID/model/serial; if no match is found, the item is routed to the Orphan Feedback queue with reason. - Given negative outcomes attributed to wrong-part, when processed, then compatibility weight for the suggested part-model pair is decreased within bounded limits, the change is logged with source signal strength, and confidence recalculated. - Given conflicting signals (positive and negative for the same pair within the window), when detected, then a discrepancy record is created with suggested rule changes and routed to the Discrepancy queue. - Given an ingestion failure, when retries are exhausted, then the system alerts Admins and marks the batch Failed with a retry action available.
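The bounded weight decrease described above can be sketched as a clamped update. The step size, floor, ceiling, and outcome labels are illustrative values, not product-specified ones; the point is that a single feedback signal can never zero out or saturate a part-model pair.

```python
# Assumed bounds; real limits would be tuned per account or model family.
WEIGHT_FLOOR, WEIGHT_CEILING = 0.05, 1.0

def apply_feedback(weight: float, outcome: str, step: float = 0.1) -> float:
    """Nudge a part-model compatibility weight on a repair outcome,
    clamped to [WEIGHT_FLOOR, WEIGHT_CEILING]."""
    if outcome == "wrong_part":
        weight -= step
    elif outcome == "first_time_fix":
        weight += step / 2  # positive signals move more conservatively
    return min(WEIGHT_CEILING, max(WEIGHT_FLOOR, weight))
```

Logging the pre- and post-update weight with the source signal, as the criteria require, would wrap calls to this function in the ingestion job.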
Discrepancy Queue and Suggested Rule Change Review
- Given discrepancy records or low-confidence pairs, when the Admin opens the queue, then items display model/part, signal counts, severity score, and proposed rule/weight change.
- Given queue items, when an Approver takes actions (Accept, Reject, Defer), then Accept applies the change to Staging, creates a new version, and optionally schedules publish; Reject records rationale; Defer sets a review reminder date.
- Given batch operations, when multiple items are selected, then Accept/Reject/Defer applies in bulk with a single rationale recorded and per-item outcomes logged.
- Given publishing controls, when Staging changes are published, then they become Active within 1 minute and a system-wide cache refresh occurs without downtime.
- Given notifications, when high-severity items enter the queue, then subscribed roles receive alerts via email and in-app within 5 minutes.
Retraining Scheduler and Model Governance for FitCheck
- Given Admin access, when a user creates a training schedule (daily/weekly) with a lookback window (e.g., last 90 days) and minimum data size, then jobs are queued accordingly and can also be triggered manually.
- Given a training job, when it runs, then progress states (Queued, Training, Validating, Ready, Failed) are emitted, logs are retained, and failures generate alerts with error summaries.
- Given model validation, when a new model is evaluated on a holdout set, then it must meet or exceed thresholds (e.g., top-1 precision ≥ 0.85, top-3 recall ≥ 0.95) or it is blocked from promotion.
- Given staged and active models, when Admin promotes a model to Active, then a shadow-test option exists to compare live suggestions for 10% of traffic before full rollout; rollback to the previous model is one click and completes within 2 minutes.
- Given data lineage, when a model is promoted, then the artifact, training data snapshot ID, feature version, and approval record are stored and viewable in the console.
KPI Dashboard for Accuracy and Overrides
- Given the dashboard, when a date range and filters (brand, model family, channel) are applied, then Wrong-Part Rate, First-Time-Fix Rate, and Override Frequency are displayed with definitions and trend lines; data freshness is ≤60 minutes.
- Given metric definitions, when calculated, then Wrong-Part Rate = wrong-part RMAs / total part orders; First-Time-Fix Rate = cases resolved without re-dispatch / total cases; Override Frequency = manual overrides / FitCheck suggestions; definitions are accessible via tooltips.
- Given drill-down, when a user clicks a metric point, then a case list opens with case ID, model, part, suggestion confidence, override flag, and outcome; export to CSV completes within 10 seconds for up to 10,000 rows.
- Given alerting rules, when thresholds (e.g., Wrong-Part Rate > 3%) are breached for 3 consecutive days, then alerts are sent to subscribed roles and annotated on the chart.
- Given role-based access, when a Viewer accesses the dashboard, then they can view and export but cannot modify metric definitions or alert thresholds.
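The three metric definitions reduce to simple ratios; a minimal Python sketch (function and field names are illustrative, not part of the spec):

```python
def kpi_metrics(wrong_part_rmas, total_part_orders,
                resolved_first_time, total_cases,
                manual_overrides, fitcheck_suggestions):
    """Compute the three dashboard KPIs as fractions (0..1).

    Returns None for a metric whose denominator is zero rather than
    raising, so an empty filter window renders as "no data".
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "wrong_part_rate": ratio(wrong_part_rmas, total_part_orders),
        "first_time_fix_rate": ratio(resolved_first_time, total_cases),
        "override_frequency": ratio(manual_overrides, fitcheck_suggestions),
    }

# Example: 12 wrong-part RMAs out of 400 part orders -> 3% wrong-part rate
print(kpi_metrics(12, 400, 350, 380, 40, 500))
```

A 3% result here would sit exactly at the example alert threshold above, so the alert fires only once the rate strictly exceeds it for 3 consecutive days.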

Smart Reserve

Auto-reserves the best supplier option based on ETA, total landed cost, and supplier reliability—then holds stock while the claim is approved. Includes fallback reservations and auto-cancel on denials to avoid fees. Benefit: guaranteed parts when you need them without overpaying, with minimal manual coordination.

Requirements

Supplier Scoring & Selection Engine
"As an operations lead, I want the system to automatically select the best supplier based on cost, ETA, and reliability so that we reserve parts optimally without manual comparison."
Description

Compute a weighted score across ETA, total landed cost, and supplier reliability for each candidate supplier, using configurable weights and constraints (preferred vendors, geo/region, warranty program rules). Normalize disparate supplier data (units, currencies, time zones) and evaluate options in the context of each claim (part/SKU, service location, SLA). Output a ranked list, designate the primary supplier, and nominate ordered fallbacks. Integrates with ClaimKit’s case data and decision logs to ensure transparent, repeatable selections.

Acceptance Criteria
Weighted Scoring Calculation with Configurable Weights
Given a claim with part SKU and service location and candidate suppliers providing ETA (in days), total landed cost (in base currency), and reliability (0..1) And configured weights of ETA=0.40, Cost=0.30, Reliability=0.30 When the engine evaluates candidates Then it normalizes ETA and Cost using min-max normalization across the candidate set and inverts lower-is-better attributes so higher normalized is better And it uses the given Reliability as already normalized (0..1) And it computes each supplier's final score as sum(weight_i * normalized_i) rounded to 4 decimals And it returns a ranked list in descending score order And it designates the top-ranked supplier as Primary and the remainder as ordered Fallbacks
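A minimal sketch of the weighted min-max scoring described above, assuming illustrative candidate fields `eta`, `cost`, and `reliability`:

```python
def score_suppliers(candidates, weights=None):
    """Rank suppliers by a weighted score over ETA (days), total landed
    cost (base currency), and reliability (already normalized 0..1).

    ETA and cost are lower-is-better, so their min-max normalization is
    inverted; if all candidates share a value, it normalizes to 1.0.
    """
    weights = weights or {"eta": 0.40, "cost": 0.30, "reliability": 0.30}

    def inv_minmax(values, x):
        lo, hi = min(values), max(values)
        return 1.0 if hi == lo else (hi - x) / (hi - lo)

    etas = [c["eta"] for c in candidates]
    costs = [c["cost"] for c in candidates]
    ranked = []
    for c in candidates:
        score = (weights["eta"] * inv_minmax(etas, c["eta"])
                 + weights["cost"] * inv_minmax(costs, c["cost"])
                 + weights["reliability"] * c["reliability"])
        ranked.append({**c, "score": round(score, 4)})
    ranked.sort(key=lambda c: c["score"], reverse=True)
    return ranked  # ranked[0] is Primary; the rest are ordered Fallbacks

suppliers = [
    {"supplier_id": "A", "eta": 2, "cost": 120.0, "reliability": 0.90},
    {"supplier_id": "B", "eta": 5, "cost": 100.0, "reliability": 0.95},
]
ranked = score_suppliers(suppliers)
```

With the default 0.40/0.30/0.30 weights, supplier A's faster ETA outweighs B's lower cost and higher reliability, so A is designated Primary.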
Normalization of Currencies, Units, and Time Zones
Given suppliers quote prices in mixed currencies and shipping units and provide ETAs in their local time zones And the engine has a configured base currency, distance/weight units, and target time zone (service location) When the engine evaluates candidates at timestamp T Then it converts all monetary amounts to base currency using the configured FX source with a rate timestamp no older than 24 hours And it converts physical units to the configured base units And it converts ETAs to a standardized expected arrival instant at the service location time zone accounting for weekends/holidays per configuration And it logs the conversion metadata (rate timestamp, unit conversions, time zone offsets) alongside normalized values used in scoring
Constraints Enforcement: Preferred Vendors, Geo, Program Rules
Given a claim associated to a warranty program with preferred vendors and disallowed vendors and geographic constraints When the engine evaluates candidates Then it excludes any supplier violating hard constraints (e.g., disallowed list, outside geo radius, program rule mismatches) and records exclusion reasons And it applies configured soft constraints as score adjustments (e.g., +0.05 bonus for preferred vendor) without exceeding score bounds [0,1] And if no eligible suppliers remain Then it returns an empty selection with reason code NoEligibleSuppliers and does not designate a Primary
SLA-Aware Eligibility and Deterministic Tie-Breakers
Given a claim with an SLA requiring part arrival by a specific deadline When the engine evaluates candidate ETAs against the SLA Then it either excludes candidates that cannot meet the SLA or applies the configured SLA penalty to their scores And after scoring, if two or more suppliers have equal final scores within 0.0001 Then it applies tie-breakers in this order: lower total landed cost, higher reliability, shorter ETA, preferred vendor status, then lexicographically lowest supplier_id And it selects the Primary and Fallbacks deterministically based on these rules
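The tie-breaker chain maps naturally onto a compound sort key; a sketch with hypothetical field names (rounding to 4 decimals approximates the 0.0001 tie window):

```python
def selection_order(candidates):
    """Order scored candidates deterministically.

    Scores compare at 4 decimals; ties break on lower total landed
    cost, higher reliability, shorter ETA, preferred-vendor status,
    then lexicographically lowest supplier_id.
    """
    return sorted(
        candidates,
        key=lambda c: (-round(c["score"], 4),      # higher score first
                       c["cost"],                  # then lower cost
                       -c["reliability"],          # then higher reliability
                       c["eta"],                   # then shorter ETA
                       not c.get("preferred", False),  # then preferred vendors
                       c["supplier_id"]),          # then lowest supplier_id
    )

tied = [
    {"supplier_id": "B", "score": 0.7001, "cost": 110.0, "reliability": 0.90, "eta": 3},
    {"supplier_id": "A", "score": 0.7001, "cost": 100.0, "reliability": 0.80, "eta": 4},
]
order = selection_order(tied)
# A wins the tie on lower total landed cost despite B's better reliability and ETA
```

Because every key component is deterministic, replaying the same inputs always yields the same Primary and Fallbacks, which is what the decision-log reproducibility criterion depends on.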
Decision Log Transparency and Reproducibility
Given the engine finalizes a selection When it writes the decision log Then the log contains claim_id, timestamp (UTC), configuration version/hash, candidate supplier_ids, raw inputs, normalized attribute values, weights, per-attribute scores, final scores (4 decimals), applied constraints/penalties, and exclusion reasons And when the same inputs and configuration are replayed through the engine Then it reproduces the identical ranking, Primary, and Fallbacks and emits the same log entries (aside from timestamps)
Handling Missing, Unknown, or Stale Supplier Data
Given a candidate is missing a reliability score When the engine evaluates Then it uses the configured default reliability or excludes the candidate per policy and records the reason
Given a candidate is missing total landed cost When the engine evaluates Then it excludes the candidate unless allow_unknown_cost=true, in which case it assigns the configured worst-case cost for scoring and flags the candidate
Given a candidate's ETA or price data is older than the configured staleness threshold When the engine evaluates Then it applies the staleness penalty or excludes the candidate and records the reason
Performance and Fault Tolerance at Scale
Given a claim with up to 100 candidate suppliers under normal system load When the engine evaluates Then it completes normalization, scoring, ranking, and decision logging within 200 ms at p95 and 500 ms at p99 measured server-side
Given the external FX service is unavailable at evaluation time When the engine evaluates Then it falls back to the most recent cached rates not older than 24 hours; if unavailable, it fails the selection with error code FX_UNAVAILABLE and performs no reservation And all failures and fallbacks emit structured metrics and logs with correlation to claim_id
Landed Cost & ETA Normalization
"As a finance-conscious ops manager, I want ETAs and total landed costs normalized across suppliers so that selections reflect true delivery time and spend."
Description

Calculate total landed cost per option by aggregating item price, shipping methods, taxes, surcharges, and anticipated cancellation/restocking fees. Normalize ETAs across time zones and business vs. calendar days, factoring supplier cutoffs, handling times, and delivery windows. Expose comparable metrics to the scoring engine and persist calculations on the claim for auditability and downstream reporting.

Acceptance Criteria
Compute Total Landed Cost per Supplier Option
Given a supplier option with item price 100.00, shipping cost 15.00, taxes 8.25, surcharges 2.00, anticipated cancellation fee 5.00, and restocking fee 3.00 When landed cost is calculated Then total_landed_cost equals 133.25 and each component and the total are stored with two-decimal precision
Given any cost component is missing When landed cost is calculated Then the missing component is treated as 0.00 and a cost_component_missing flag is recorded with the component name
Given a supplier option When landed cost is calculated Then a cost_breakdown record is persisted including option_id, component amounts, calculation_version, and calculated_at timestamp
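The landed-cost sum above can be sketched with `Decimal` arithmetic (an assumption — the spec only requires two-decimal precision, but decimals avoid binary-float drift in money math):

```python
from decimal import Decimal, ROUND_HALF_UP

def landed_cost(components):
    """Sum the cost components of one supplier option.

    Missing components are treated as 0.00 and returned in `missing`,
    mirroring the cost_component_missing flag described above.
    """
    names = ["item_price", "shipping", "taxes", "surcharges",
             "cancellation_fee", "restocking_fee"]
    missing = [n for n in names if n not in components]
    total = sum(Decimal(str(components.get(n, "0.00"))) for n in names)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP), missing

total, missing = landed_cost({"item_price": "100.00", "shipping": "15.00",
                              "taxes": "8.25", "surcharges": "2.00",
                              "cancellation_fee": "5.00", "restocking_fee": "3.00"})
# total == Decimal("133.25"), missing == []
```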
Normalize ETA Across Time Zones and Day Types
Given a supplier ETA expressed in business days and a destination timezone When normalization runs Then eta_hours_min and eta_hours_max are computed as comparable hour values and stored alongside eta_day_type="business"
Given a supplier ETA expressed in calendar days When normalization runs Then eta_hours_min and eta_hours_max are computed as comparable hour values and stored alongside eta_day_type="calendar"
Given a normalization that spans a daylight saving time change When normalization runs Then eta_earliest_at and eta_latest_at are correct across the DST transition in both UTC and destination local time
Apply Supplier Cutoffs and Handling Times to ETA
Given supplier order cutoff at 17:00 in supplier local time and order placed at 16:30 When normalization runs Then handling_time starts the same business day and contributes to eta_hours_min/max accordingly
Given supplier order cutoff at 17:00 in supplier local time and order placed at 17:01 When normalization runs Then handling_time starts next business day and contributes to eta_hours_min/max accordingly
Given handling time of 1 business day and shipping transit of 2 business days When normalization runs Then total business days considered equals 3 and eta_earliest_at reflects the next valid delivery window
Given carrier delivers Monday–Saturday only When normalization runs Then eta_earliest_at and eta_latest_at are adjusted to the next valid delivery day if they fall on a non-delivery day
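The cutoff rule can be sketched as follows; this naive version treats Mon–Fri as business days and omits the holiday calendars and carrier-specific delivery windows the criteria call for:

```python
from datetime import datetime, date, time, timedelta

def handling_start(order_placed_local: datetime, cutoff: time) -> date:
    """Return the business day on which supplier handling begins.

    Orders placed at or before the cutoff (supplier local time) start
    handling the same business day; later orders roll to the next one.
    """
    day = order_placed_local.date()
    if order_placed_local.time() > cutoff:      # missed the cutoff
        day += timedelta(days=1)
    while day.weekday() >= 5:                   # skip Sat/Sun
        day += timedelta(days=1)
    return day

# Ordered 16:30 vs 17:01 against a 17:00 cutoff (a Tuesday):
print(handling_start(datetime(2024, 6, 4, 16, 30), time(17, 0)))  # 2024-06-04
print(handling_start(datetime(2024, 6, 4, 17, 1), time(17, 0)))   # 2024-06-05
```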
Expose Comparable Metrics to Scoring Engine
Given a supplier option with calculated total_landed_cost and normalized ETA When the scoring engine requests metrics Then the API returns total_landed_cost, eta_hours_min, eta_hours_max, and calculation_version as typed numeric fields
Given metrics are unavailable due to invalid or missing inputs When the scoring engine requests metrics Then the API returns metrics_status="invalid" with reason codes and excludes the option from scoring
Given metrics are returned When validated against the schema Then field names, types, and units match the documented contract and include source option_id
Persist Cost and ETA Calculations on Claim
Given cost and ETA calculations complete for a claim When persisted Then the claim stores an immutable calculation snapshot including inputs, outputs, calculation_version, and calculated_at timestamp
Given a claim has multiple supplier options When persisted Then each option has a distinct snapshot linked by claim_id and option_id
Given an auditor requests the calculation details When retrieving the claim Then the snapshot is accessible via API and UI and includes trigger_source (system/user) and trigger_reason
Recalculate and Version on Input Change
Given any input affecting cost or ETA changes (price, shipping method, taxes, surcharges, fees, cutoffs, handling times, delivery windows) When the change is saved Then a new calculation_version is created, prior versions remain stored, and the latest is marked current=true
Given no inputs have changed since the last calculation When a recalculation is requested Then the previous version is reused and no new version record is created
Given multiple versions exist When querying version history Then versions are ordered by calculated_at descending and calculation_version increments monotonically
Handle Missing or Invalid Supplier Data
Given one or more cost components are missing When calculating total landed cost Then missing components default to 0.00 and a validation warning is stored per missing component
Given ETA data is ambiguous or absent When normalizing ETA Then eta_status="unknown", eta_hours_min and eta_hours_max are null, and the option is excluded from scoring
Given any cost or ETA input is invalid (negative amounts, non-numeric values, impossible dates) When validating inputs Then the calculation is rejected for that option, errors are logged with reason codes, and no new snapshot is marked current
Auto-Reserve Orchestrator with Fallbacks
"As a support agent, I want Smart Reserve to automatically place and manage reservations with primary and fallback suppliers so that parts are secured without manual coordination."
Description

Place a reservation with the top-ranked supplier through API, EDI, or structured email, then create contingent fallback reservations according to configurable strategy (sequential on failure vs. parallel soft holds). Ensure idempotency and deduplication to prevent double-holds, and maintain a single active binding reservation at any time. Handle low-stock race conditions with retry/backoff and atomic checks where supported.

Acceptance Criteria
Top-Ranked Supplier Reservation via Preferred Channel
Given a claim with an eligible part and multiple suppliers with configured ETA, landed_cost, and reliability attributes and a ranking policy When the Auto-Reserve Orchestrator runs for the claim Then it computes a score per supplier using the configured weights and selects the highest-scoring supplier And sends a reservation request via the supplier's configured preferred channel (API > EDI > structured email) with the required payload And receives an ACK/2xx within the configured timeout or records a timeout and proceeds per retry policy And persists supplier selection, reservation reference/ID, hold TTL, quoted ETA, and cost breakdown on the case And marks exactly one reservation as BindingPendingApproval
Sequential Fallback on Primary Failure
Given fallback_strategy=sequential and the top-ranked reservation attempt fails (non-2xx/negative ACK/timeout/out_of_stock) When the orchestrator evaluates fallbacks Then it attempts the next-best supplier in order until one succeeds or all suppliers are exhausted And records failure reason codes, timestamps, and attempt count per supplier And ensures no stock is held with failed suppliers (no active holds remain) And results in at most one reservation in BindingPendingApproval state
Parallel Soft Holds with Single Binding Reservation
Given fallback_strategy=parallel and K (configured) suppliers support soft-hold with TTL When the orchestrator runs Then it places soft-hold reservations in parallel with up to K suppliers And upon the first acceptable binding confirmation from the highest-ranked available supplier, it cancels all other soft holds within the configured cancellation window And updates statuses so only one reservation is Binding/Confirmed and all others are Canceled And records cancellation ACKs or retries until configured max attempts, then escalates
Idempotent Reservation and Deduplication
Given the orchestrator is invoked multiple times for the same claim-part tuple and the same idempotency key within the idempotency window When duplicate triggers or retries occur (including concurrent executions) Then only one supplier reservation is created And subsequent invocations return the existing reservation record without issuing new supplier requests And outbound API/EDI/email requests include an idempotency key or dedup hash to suppress duplicates And the audit log contains exactly one reservation_created event for the claim-part tuple
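One way to satisfy the single-reservation guarantee is a keyed store consulted under a lock; a minimal in-memory sketch (a real system would use a database unique constraint on the same key, and the names here are illustrative):

```python
import threading

class ReservationStore:
    """In-memory sketch of idempotent reservation creation.

    Concurrent or repeated triggers for the same
    (claim_id, part_id, idempotency_key) return the existing record
    without issuing a new supplier request.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._by_key = {}

    def reserve(self, claim_id, part_id, idempotency_key, place_order):
        key = (claim_id, part_id, idempotency_key)
        with self._lock:
            if key in self._by_key:          # duplicate trigger: no new request
                return self._by_key[key], False
            record = place_order(claim_id, part_id)
            self._by_key[key] = record
            return record, True

store = ReservationStore()
calls = []
place = lambda c, p: calls.append((c, p)) or {"reservation_id": f"r-{len(calls)}"}
r1, created1 = store.reserve("claim-1", "part-9", "idem-abc", place)
r2, created2 = store.reserve("claim-1", "part-9", "idem-abc", place)
# r1 is r2, created2 is False, and place_order ran exactly once
```

Holding the lock across `place_order` is a simplification; the point is that exactly one `reservation_created` event can exist per claim-part tuple and key.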
Low-Stock Race Handling with Atomic Checks and Backoff
Given a target supplier has low remaining quantity and supports atomic check-and-hold When the orchestrator attempts to reserve Then it uses the atomic endpoint and either secures the hold or receives a definitive out_of_stock response without partial holds And on out_of_stock or conflict errors, it proceeds to the next supplier following the configured retry/backoff policy (exponential backoff with max retries per config) And the overall reservation decision completes within the configured SLA And if a supplier lacks atomic support, conflicts are retried according to backoff policy without creating duplicate holds
Auto-Cancel on Claim Denial and Hold Expiry Management
Given a claim with an active reservation that transitions to Denied or Canceled When the orchestrator receives the state-change event Then it sends cancellation requests to the binding reservation and all soft holds within the configured window And receives cancellation ACKs or retries up to the configured limit and flags exceptions for manual follow-up if still unacknowledged And no cancellation fees are incurred beyond the configured threshold; if fees are expected, the claim is flagged before cancellation And all reservations for the claim end in Canceled state with release confirmations captured
Single Active Binding Reservation Across Lifecycle
Given any sequence of reservation attempts, fallbacks (sequential or parallel), retries, and claim state changes When reservation states are updated Then at any time there is at most one reservation in Binding or Confirmed state for the claim-part tuple And state transitions follow the allowed state machine: None -> SoftHold|BindingPendingApproval -> Binding -> Confirmed|Canceled|Expired And if a later reservation becomes Binding/Confirmed, any earlier overlapping reservations are auto-canceled within the configured window and reflected in the audit trail And the system prevents promotion to Binding if another Binding/Confirmed exists
Conditional Hold & Auto-Cancel on Denial
"As a claims approver, I want reservations to auto-cancel on denials and convert on approvals so that we avoid fees and ensure timely fulfillment."
Description

Tie reservations to claim approval status and SLA timers so that stock is held during review and automatically released on denial, withdrawal, or expiration. Respect supplier cancellation windows and fee policies, trigger timed cancellations before penalties, and update the claim with outcomes and any fees avoided or incurred. On approval, convert holds to orders when configured.

Acceptance Criteria
Hold Initiation on Pending Review
Given a new claim with required parts and at least one eligible supplier with an open cancellation window When the claim status becomes Pending Review Then create a hold for each required part with the selected supplier within 30 seconds And persist holdReferenceId, supplierId, partNumber, quantity, eta, landedCost, cancellationDeadline, supplierTimeZone on the claim line items And start the Parts Hold SLA timer with dueAt = min(configuredHoldSla, cancellationDeadline minus safetyBuffer) And add a timeline event named Parts hold created with supplier, eta, landedCost, and cancellationDeadline
Auto-Cancel on Denial, Withdrawal, or SLA Expiration
Given a claim with one or more active holds When the claim status transitions to Denied or Withdrawn or the Parts Hold SLA timer expires Then cancel all active holds within 2 minutes And set each hold status to Released with releasedAt timestamp And compute feesAvoided = supplier cancellation fee that would apply after the deadline if cancellation occurs before the fee window, otherwise 0 And record feesIncurred if any per supplier response with feeAmount and reason And add a timeline event named Parts hold canceled for each hold with outcome, feesAvoided, feesIncurred And update claim metrics totalFeesAvoided and totalFeesIncurred
Pre-Deadline Cancellation to Avoid Supplier Fees
Given an active hold with a cancellationDeadline and a nonzero fee after deadline and claim is still in Pending Review When current time reaches cancellationDeadline minus safetyBuffer Then proactively cancel the hold to avoid the fee And add a timeline event named Pre-deadline cancellation to avoid supplier fees And notify the claim owner user via in-app notification and email And update feesAvoided with the fee amount that would have applied after the deadline
Auto-Conversion of Hold to Order on Approval
Given a claim with active holds and setting autoConvertHoldsOnApproval is true When the claim status transitions to Approved Then convert each active hold to a purchase order with the same supplier within 2 minutes And persist orderReferenceId, orderStatus Placed, and orderTotal on the corresponding line items And mark the hold as Converted rather than Released And add a timeline event named Hold converted to order for each line And ensure idempotency such that repeated Approved events do not create duplicate orders
Supplier Policy Enforcement and Fee Handling
Given a supplier cancellation policy with a defined window, fee schedule, and time zone and an attempted cancellation for a hold When evaluating the cancellation request Then determine cutoff and fee using supplierTimeZone and the policy version effective at the hold's createdAt And apply feeAmount 0 if cancellation occurs before cutoff, otherwise apply the correct fee per schedule And include policyVersionId, evaluatedAt, cutoffAt, and appliedFeeAmount in the cancellation record And if cancellation is declined due to closed window, execute retryPolicy as configured and raise an alert to the escalation channel
Claim Updates and Audit Trail for Hold Actions
Given any hold action of created, canceled, released, or converted completes When persisting the result Then append a claim timeline entry containing eventType, actor system, timestamp, supplierId, partNumber, quantity, outcome, appliedFeeAmount, feesAvoided And update claim aggregates activeHoldsCount, ordersPlacedCount, totalFeesIncurred, totalFeesAvoided accordingly And expose these fields in Claim Details UI and via API endpoints GET claim and GET claim events And ensure entries are immutable and corrections are recorded as new events linked via supersedesEventId
Real-Time Supplier Sync & Reliability Metrics
"As a sourcing analyst, I want live supplier data and reliability scores so that selection decisions reflect current stock and proven performance."
Description

Integrate with supplier inventory and logistics endpoints to fetch live availability, ship-from locations, ETA promises, and reservation capabilities. Cache with TTL and fall back to last-known-good data during outages, marking confidence levels. Continuously compute reliability metrics (fill rate, lead time variance, cancellation rate) from historical outcomes to feed the scoring engine.

Acceptance Criteria
Live supplier sync retrieves availability, ETA, ship-from, and reservation flags
Given valid supplier API credentials and a SKU exist When a sync is triggered manually or by the scheduler Then the system fetches availability quantity, ship-from location(s), promised ETA, and reservation_capability for the SKU from the supplier And persists the response with data_freshness_ts and confidence="live" And the internal availability API returns the fetched fields within 2 seconds of receipt And the response validates against schema "supplier_sync.v1" with all required fields present
TTL cache and last-known-good fallback behavior
Given TTL=5 minutes and MaxStaleness=60 minutes are configured and last_success_ts exists When a live fetch fails due to timeout, 5xx, or circuit open Then the system serves last-known-good data if now - last_success_ts <= 60 minutes with confidence="cached" and freshness_age reported in seconds And an event "supplier_sync.fallback_used" is emitted with supplier_id and reason And if now - last_success_ts > 60 minutes, the internal API returns availability_status="unknown", confidence="stale", and prevents auto-reservation by setting can_reserve=false And a retry is scheduled using exponential backoff starting at 1 second up to 5 attempts
Continuous reliability metrics computation (fill rate, lead time variance, cancellation rate)
Given historical order and reservation outcomes exist for suppliers over the last 90 days When a new fulfillment outcome, cancellation, or delivery confirmation event is recorded Then reliability metrics are recomputed and persisted within 15 minutes for supplier and supplier+SKU scopes And fill_rate = fulfilled_qty / ordered_qty over rolling 90 days And lead_time_variance = variance(actual_days - promised_days) over rolling 90 days And cancellation_rate = cancellations_by_supplier / total_orders over rolling 90 days And each metric record includes window_start, window_end, sample_size, computed_at, and low_sample=true when sample_size < 30 And metrics are exposed via GET /reliability-metrics?supplier_id&sku with p95 latency <= 500 ms
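The three reliability formulas can be sketched directly; `orders` and its fields are illustrative stand-ins for the rolling 90-day window of outcomes:

```python
from statistics import pvariance

def reliability_metrics(orders):
    """Compute rolling-window reliability metrics from order outcomes.

    Each order carries ordered_qty, fulfilled_qty, promised_days,
    actual_days (None until delivered), and a cancelled flag.
    """
    total = len(orders)
    ordered = sum(o["ordered_qty"] for o in orders)
    fulfilled = sum(o["fulfilled_qty"] for o in orders)
    deltas = [o["actual_days"] - o["promised_days"]
              for o in orders if o["actual_days"] is not None]
    return {
        "fill_rate": fulfilled / ordered if ordered else None,
        "lead_time_variance": pvariance(deltas) if len(deltas) >= 2 else None,
        "cancellation_rate": sum(o["cancelled"] for o in orders) / total if total else None,
        "sample_size": total,
        "low_sample": total < 30,   # flag thin data per the criteria above
    }

orders = [
    {"ordered_qty": 10, "fulfilled_qty": 10, "promised_days": 3, "actual_days": 3, "cancelled": False},
    {"ordered_qty": 10, "fulfilled_qty": 8, "promised_days": 3, "actual_days": 5, "cancelled": False},
]
m = reliability_metrics(orders)
# fill_rate 0.9, lead_time_variance 1.0, cancellation_rate 0.0, low_sample True
```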
Reliability metrics feed is consumed by Smart Reserve scoring
Given two suppliers A and B with identical cost and ETA for a SKU and distinct reliability metrics When Smart Reserve requests a supplier score for that SKU Then the scoring engine retrieves current reliability metrics for A and B from the metrics API And the ranking places the supplier with higher fill_rate and lower cancellation_rate above the other And when metrics are unavailable for a supplier, default_priors are applied and a flag metrics_available=false is returned in the scoring trace And the scoring trace includes supplier_id, inputs, weights, and final score for audit
Reservation capability detection and auto-cancel on claim denial
Given a supplier with reservation capability=true and a claim in PendingApproval When Smart Reserve selects this supplier as the provisional best option Then the system creates a reservation hold using an idempotency key and stores hold_id and expiration_at And on Claim Denied event the reservation is canceled within 60 seconds and status=cancelled is confirmed from the supplier API And suppliers with reservation capability=false are skipped without API calls and can_reserve=false is surfaced And all reservation attempts, confirmations, and cancellations are audit-logged with supplier response codes and durations
Supplier API rate limiting, retries, and circuit breaker with observability
Given supplier API endpoints may rate-limit or fail When requests return 429 or 5xx Then the client retries up to 5 times with exponential backoff 1s, 2s, 4s, 8s, 16s and honors Retry-After when present And a circuit breaker opens after 5 consecutive failures within 60 seconds and remains open for 60 seconds before a half-open trial And during an open circuit the system serves cached data per TTL rules and emits "supplier_sync.circuit_open" once per minute And for successful calls, 95th percentile latency over a 1-hour window is <= 2.5 seconds And dashboards display error_rate, cache_hit_ratio, freshness_age, and latency with alerts when thresholds are breached
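The retry schedule with `Retry-After` precedence can be sketched as a generator of wait times (a simplification — a real client would also cap total elapsed time and feed failures into the circuit breaker):

```python
def backoff_delays(retry_afters, base=1.0, max_attempts=5):
    """Yield the wait (seconds) before each retry attempt.

    Exponential schedule 1s, 2s, 4s, 8s, 16s; a Retry-After value from
    the server (keyed by attempt index, if present) takes precedence.
    """
    for attempt in range(max_attempts):
        server_hint = retry_afters.get(attempt)
        yield server_hint if server_hint is not None else base * (2 ** attempt)

# No server hints -> pure exponential schedule
print(list(backoff_delays({})))            # [1.0, 2.0, 4.0, 8.0, 16.0]
# Server sent Retry-After: 30 on the second failure
print(list(backoff_delays({1: 30.0})))     # [1.0, 30.0, 4.0, 8.0, 16.0]
```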
Reservation SLA Alignment & Timers
"As a queue manager, I want reservation timers aligned to claim SLAs with reminders so that holds don’t lapse before approval."
Description

Align reservation hold durations with claim SLA policies, showing countdowns in the live queue and triggering reminders/escalations before holds expire. Auto-extend holds where supplier policy allows when approvals are imminent, and record all timer events on the claim timeline for visibility.

Acceptance Criteria
Initial Hold Duration Aligned to Claim SLA
Given a claim with an approval SLA target of S hours and a configured hold buffer of B hours and a supplier max hold of M hours When Smart Reserve creates a reservation Then the reservation hold duration is set to min(S+B, M) hours And a countdown timer starts from that duration And the projected hold end timestamp is stored in UTC and displayed on the claim and in the live queue
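The initial hold duration rule reduces to a one-line min(); a sketch with the S, B, and M values named as parameters:

```python
def initial_hold_hours(sla_hours, buffer_hours, supplier_max_hours):
    """Hold duration = min(SLA target + buffer, supplier max hold)."""
    return min(sla_hours + buffer_hours, supplier_max_hours)

print(initial_hold_hours(24, 4, 72))  # 28: SLA + buffer fits under the supplier cap
print(initial_hold_hours(24, 4, 20))  # 20: capped by the supplier's max hold
```

The second case shows why the countdown can be shorter than the claim SLA: the supplier's maximum hold wins, which is exactly when the reminder and auto-extension criteria below matter most.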
Live Queue Countdown Display and Color Thresholds
Given an active reservation hold with T time remaining When the live queue is rendered Then the countdown displays T rounded down to the nearest minute and updates at least every 60 seconds And the timer color is green when >50% of original duration remains, amber when 25–50%, and red when <25% And hovering or tapping the timer reveals the exact UTC end timestamp and the user’s local time equivalent
Reminder and Escalation Triggers Before Hold Expiry
Given an active reservation with original duration ≥ 8 hours When time remaining equals 2 hours Then Reminder 1 is sent to the claim assignee and watchers and recorded on the timeline And when time remaining equals 30 minutes Then an escalation notification is sent to the escalation group and recorded on the timeline
Given an active reservation with original duration < 8 hours When time remaining crosses 25% of the original duration Then Reminder 1 is sent and recorded on the timeline And when time remaining crosses 5% of the original duration Then an escalation notification is sent and recorded on the timeline
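The two reminder schedules can be sketched as a threshold function (minutes are an assumed unit; the fixed/proportional split follows the 8-hour boundary above):

```python
def reminder_thresholds(original_duration_minutes):
    """Return (reminder_at, escalate_at) as minutes remaining.

    Long holds (>= 8h) use fixed offsets; shorter holds scale with
    the original duration.
    """
    if original_duration_minutes >= 8 * 60:
        return 120, 30                       # 2 hours, then 30 minutes
    return (original_duration_minutes * 0.25,
            original_duration_minutes * 0.05)

print(reminder_thresholds(12 * 60))   # (120, 30)
print(reminder_thresholds(4 * 60))    # (60.0, 12.0)
```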
Auto-Extension When Approval Imminent and Policy Allows
Given an active reservation with time remaining ≤ 30 minutes and claim status = "Awaiting Final Approval" and supplier policy supports extensions and the maximum cumulative hold is not exceeded When an auto-extension is attempted Then the hold is extended by the configured increment or up to the supplier’s maximum, whichever is smaller And the countdown and end timestamp are updated within 5 seconds And a timeline entry records the extension amount and supplier response And if the supplier denies the extension, a high-priority escalation is sent and no further auto-extension attempts are made for 60 minutes
Timer Handling on Claim Status Change
Given an active reservation hold When claim status changes to "Approved" Then the hold timer stops immediately and the reservation is marked ready for conversion to order within the configured conversion window
When claim status changes to "Denied" or "Cancelled" Then the hold timer stops immediately and the reservation is released
When claim status changes to "On Hold - Awaiting Customer" Then the timer continues and an at-risk flag is set if time remaining < 25% of original duration
Comprehensive Timer Event Audit Trail
Given any timer event (start, reminder, escalation, extension, expiry, stop) When the event occurs Then a timeline entry is added containing event type, UTC timestamp, actor (system or user), previous and new end timestamps where applicable, and notification recipients And timeline entries are immutable and can be filtered by event type in the UI and retrieved via API
Audit Trail, Notifications, and Admin Overrides
"As an admin, I want full visibility and override controls for Smart Reserve so that I can audit decisions and intervene when necessary."
Description

Record every scoring factor, supplier option, reservation/cancellation action, and timing event on the claim timeline with immutable entries. Notify stakeholders (ops, approvers, finance) via in-app and email when reservations are placed, nearing expiry, converted, or canceled. Provide role-based overrides to adjust weights, pick a different supplier, or force-cancel/convert, with automatic re-scoring and conflict resolution.

Acceptance Criteria
Immutable Audit Trail of Smart Reserve Decisions
Given a claim enters Smart Reserve scoring with at least one supplier option When scoring completes Then an audit entry is appended to the claim timeline including: event_type=scoring_completed, timestamp (UTC ISO 8601), claim_id, actor=system, algorithm_version, weight_vector, supplier_options[{supplier_id, ETA_days, total_landed_cost, reliability_score}], composite_scores per supplier, selected_supplier_id And the entry is immutable in UI and API; any attempt to edit or delete is rejected with an error and logged as a security event And subsequent reservation, cancellation, conversion, SLA_start, hold_expiry_set events each append audit entries including event_type, timestamp, actor, relevant identifiers (supplier_id, hold_id, po_id), and pre/post state where applicable
Lifecycle Notifications for Reservation Events
Given a reservation is placed by Smart Reserve When the hold is created Then in-app notifications are delivered to Ops and Approver roles within 30 seconds and emails are sent to configured Ops and Finance lists within 2 minutes containing: claim_id, part_number, supplier_name, ETA, total_landed_cost, hold_expiry_at, and a deep link to the claim And notifications are deduplicated so a user receives at most one notification per channel per event within a 10-minute window
Given a hold reaches the configured nearing-expiry threshold When the threshold time occurs Then in-app and email notifications are sent as above indicating remaining time to expiry
Given a reservation is converted or canceled When the conversion or cancellation completes Then notifications are sent as above including outcome and reason
Role-Based Overrides and Permissions
Given a user has role Ops Admin When they adjust weight settings for ETA, cost, or reliability on a claim Then a reason is required, the change is recorded in the audit trail, and re-scoring is triggered
Given a user has role Ops Admin When they manually select a different supplier for a claim Then the system validates availability, places a new reservation with that supplier, cancels any prior active hold, and records both actions in the audit trail
Given a user has role Ops Admin or Approver When they force-cancel or force-convert a reservation Then the action executes and is recorded with reason in the audit trail
Given a user lacks the required role When they attempt any override Then the action is blocked with a 403-style error, no state change occurs, and the attempt is logged
Automatic Re-Scoring and State Transitions After Overrides
Given weights are adjusted or a manual supplier is selected When re-scoring runs Then new composite scores are computed using the updated weight_vector, recorded to the audit trail, and the current best supplier is identified And if the best supplier differs from the active hold, the system places the new hold first, then cancels the old hold, ensuring no gap in coverage and avoiding duplicate fees; both actions are time-stamped and cross-referenced in the audit trail And SLA timers, hold_expiry_at, and supplier references are updated accordingly with separate audit entries And stakeholders receive updated notifications for any new reservations and cancellations
Concurrent Override Conflict Resolution and Idempotency
Given two users attempt conflicting overrides on the same claim within a short interval When the second save occurs Then the system detects a version conflict using optimistic locking and prevents overwriting without refresh; only one set of changes is applied and the other receives a conflict message
Given repeated requests to cancel or convert the same reservation When duplicate requests arrive within 60 seconds Then the operation is idempotent: only one conversion/cancellation occurs, subsequent requests return a no-op response, and only a single outcome audit entry is created
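The two mechanisms named here, optimistic locking and idempotent cancellation, can be illustrated with a toy in-memory store. All class and method names are invented for the sketch; a real implementation would live behind the database with a version column and a persisted request-ID table:

```python
class ConflictError(Exception):
    pass

class ReservationStore:
    """Toy store illustrating optimistic locking and idempotent cancel."""

    def __init__(self):
        self.version = 1
        self.state = "active"
        self._seen_requests: set[str] = set()

    def update(self, expected_version: int, new_state: str) -> int:
        # Optimistic lock: a write based on a stale version is rejected,
        # forcing the second user to refresh before saving.
        if expected_version != self.version:
            raise ConflictError("stale version; refresh and retry")
        self.state = new_state
        self.version += 1
        return self.version

    def cancel(self, request_id: str) -> str:
        # Idempotency: duplicate request IDs (retries within the window)
        # become no-ops, so only one outcome audit entry is created.
        if request_id in self._seen_requests or self.state == "cancelled":
            return "no-op"
        self._seen_requests.add(request_id)
        self.state = "cancelled"
        self.version += 1
        return "cancelled"
```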
Reproducible Scoring Explanation
Given an audit entry for scoring at time T exists When an authorized user requests an explanation for the selection Then the system reproduces the ranking using the recorded inputs and weight_vector and displays supplier composite scores that match the recorded values within 0.01, shows the applied tie-breaker rule if applicable, and includes algorithm_version And if recomputation differs beyond tolerance, the UI flags a drift warning and records a comparison audit entry

ETA Pulse

Predictive delivery windows with live carrier tracking and confidence bands powered by historical supplier performance, cut‑off times, and regional transit. Proactively alerts when an ETA slips and updates SLA timers. Benefit: honest timelines for customers, fewer missed appointments, and fewer SLA surprises for Ops.

Requirements

Carrier & Supplier Data Integration Hub
"As an operations lead, I want ClaimKit to automatically ingest normalized tracking and shipment events from all our carriers and suppliers so that ETA Pulse can produce accurate, real-time delivery windows without manual data wrangling."
Description

Build and maintain connectors to major parcel and freight carriers (e.g., UPS, FedEx, USPS, DHL, regional) and supplier drop-ship systems to ingest tracking numbers, shipment events, and delivery confirmations in real time. Normalize disparate carrier event schemas into a unified shipment event model mapped to ClaimKit cases and orders. Support webhooks, polling fallbacks, and idempotent processing with retries, rate-limit handling, and event deduplication. Securely manage credentials (OAuth/API keys), with encryption at rest/in transit and automated secret rotation. Provide observability (per-connector health, lag, error rates), sandbox environments, and backfill capabilities. Enrich shipments by linking to existing ClaimKit magic inbox data (receipts/serials) to resolve ambiguous shipments. This hub is the foundation for ETA Pulse inputs across all channels.
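Normalizing disparate carrier schemas into one ShipmentEvent model usually comes down to per-carrier field and status maps. The sketch below is purely illustrative: the payload field names and status codes are invented, not real UPS/FedEx API fields:

```python
from datetime import datetime, timezone

# Hypothetical per-carrier field mappings; real connectors are far richer.
FIELD_MAPS = {
    "ups": {"tracking": "trackingNumber", "status": "activityStatus", "time": "gmtDate"},
    "fedex": {"tracking": "trackingNbr", "status": "eventType", "time": "timestamp"},
}

# Hypothetical carrier status codes mapped to the unified event types.
STATUS_MAP = {"PU": "pickup", "IT": "in_transit", "OD": "out_for_delivery", "DL": "delivered"}

def normalize(carrier_code: str, payload: dict) -> dict:
    """Map a raw carrier payload to the unified ShipmentEvent shape."""
    m = FIELD_MAPS[carrier_code]
    return {
        "carrier_code": carrier_code,
        "tracking_number": payload[m["tracking"]],
        # Unknown statuses fall through to "exception" for manual review.
        "event_type": STATUS_MAP.get(payload[m["status"]], "exception"),
        # Event times are normalized to UTC ISO 8601.
        "event_time": datetime.fromisoformat(payload[m["time"]])
                              .astimezone(timezone.utc).isoformat(),
    }
```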

Acceptance Criteria
Real-Time Carrier Webhook Ingestion and Unified Mapping
Given registered webhook subscriptions for UPS, FedEx, USPS, DHL, and at least one regional carrier When the carrier posts shipment events (pickup, in_transit, out_for_delivery, delivered, exception) Then the hub validates signatures, persists raw payloads, maps to a unified ShipmentEvent model (carrier_code, tracking_number, event_type, event_time UTC, location, status_details, proof), and emits an internal event within 5 seconds p95 And the event is associated to the correct shipment by (carrier_code, tracking_number) or a pending shipment stub is created if missing And delivery confirmation events include proof_of_delivery metadata when provided by the carrier
Polling Fallback with Idempotency and Deduplication
Given a carrier without webhooks or with webhook outage When polling runs at the configured interval (≤ 10 minutes) Then new/updated events are fetched and ingested without breaching carrier SLAs
Given duplicate events (same carrier event ID or same tracking_number+event_type+event_time+location) When processed multiple times due to retries Then only one unified event exists and processing is idempotent And on network/5xx errors, retries use exponential backoff up to 5 attempts achieving ≥ 99% daily success with ≤ 0.1% permanently failed events queued for review
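Two small building blocks satisfy this criterion: a capped exponential-backoff schedule for retries, and a deduplication key that prefers the carrier's own event ID and otherwise falls back to the composite key named above. A minimal sketch (function names are my own):

```python
def backoff_schedule(attempts: int = 5, base: float = 1.0, cap: float = 60.0):
    """Exponential retry delays in seconds, capped; jitter would be added in production."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]

def dedup_key(carrier_event_id, tracking_number, event_type, event_time, location):
    """Key under which an event is stored; duplicates collapse to one unified event."""
    if carrier_event_id:
        return ("id", carrier_event_id)
    return ("composite", tracking_number, event_type, event_time, location)
```

Upserting events keyed by `dedup_key` makes reprocessing after a retry naturally idempotent.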
Secure Credential Management and Automated Rotation
Given OAuth tokens and API keys per connector When stored Then secrets are encrypted at rest (AES‑256) and in transit (TLS 1.2+) with access scoped per-tenant
When rotation occurs (scheduled ≤ 90 days or forced) Then tokens/keys update without downtime and audit logs record actor, timestamp, connector, and action
Given a simulated credential compromise in sandbox When revocation is triggered Then access is disabled within 5 minutes and security alerts are sent to the configured channel
Observability Dashboards and Alerting per Connector
Given the hub is operating When viewing Observability Then each connector shows: status (healthy/degraded/down), ingest lag p50/p95, events/min, error rate %, last successful sync And alerts trigger when lag > 15 minutes p95 for 10 consecutive minutes or error rate > 2% over a 5‑minute window And logs/traces include correlation IDs (tracking_number, carrier request ID) with metrics exportable via Prometheus/OpenTelemetry
Sandbox Environments and High-Volume Backfill
Given sandbox credentials per connector When sending test events Then events process end‑to‑end without affecting production data and are labeled as sandbox
Given onboarding a new connector When backfilling 90 days for up to 100k tracking numbers Then throughput sustains ≥ 200 events/second with zero data loss and deduplication rules enforced And backfill jobs are resumable with checkpoints and emit a completion report (processed, deduped, failed) with error export
Shipment Linking and Enrichment from Magic Inbox
Given a shipment event with tracking_number When a ClaimKit case/order with matching tracking_number or order_id is present via magic inbox data Then the shipment links to the correct case/order with ≥ 98% precision
When multiple candidates exist Then deterministic tie‑breakers are applied (exact match > recency > same customer email > same serial) and unresolved items are queued for manual review with top 3 suggestions And upon linking, the case/order timeline displays normalized events within 10 seconds p95
Rate Limit Compliance and Fair-Share Scheduling
Given carrier API rate limits per key/IP When request volume approaches thresholds Then client‑side throttling keeps 429 responses < 1% and honors Retry‑After on 429s And retried requests are rescheduled without data loss
Given multiple tenants on a shared connector When demand is uneven Then fair‑share scheduling ensures no tenant consumes > 50% capacity unless explicitly configured
Predictive ETA Engine
"As a support manager, I want ETA Pulse to predict realistic delivery windows that reflect our suppliers’ true performance so that customers get accurate expectations and my team reduces recontacts and escalations."
Description

Implement a prediction service that computes delivery windows using historical supplier and carrier performance, warehouse cut-off times, pickup windows, service levels, lane-specific transit distributions, holidays, and regional effects. Generate probabilistic ETAs (e.g., P50/P80/P95) and an "honest" customer-facing window, with continuous recalculation as new scan events arrive. Start with robust rules-based heuristics and support pluggable ML models with versioning, feature store, and drift detection. Persist predictions with lineage and justification metadata for auditability. Provide accuracy metrics (MAPE, on-time percentage) and auto-recalibration by lane, supplier, and service level. Offer batch backfill and streaming updates, with SLA-safe defaults for cold starts.

Acceptance Criteria
Compute Probabilistic ETA and Honest Window
Given a shipment with supplier, carrier, service level, origin, destination, warehouse cutoff, pickup window, and holiday calendar inputs When the prediction service is invoked Then it returns P50, P80, and P95 delivery timestamps where now <= P50 <= P80 <= P95 And returns a customer-facing honest window derived from quantiles using config rules: window_start = floor_to_day(P50); window_end = ceil_to_day(P95); min_width_days >= 1; max_width_days <= 7; bounds in customer local time zone And includes uncertainty_band_width_hours = hours(P95 - P50) and sets low_confidence = true if band_width exceeds lane-configured maximum And uses the rules-based baseline when no ML model is active for the lane
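The honest-window derivation in this criterion (floor the P50 to a day, ceil the P95 to a day, then clamp the width) is mechanical enough to sketch directly. This assumes naive datetimes already converted to the customer's local time zone:

```python
from datetime import datetime, timedelta

def honest_window(p50: datetime, p95: datetime,
                  min_width_days: int = 1, max_width_days: int = 7):
    """Customer-facing window: [floor_to_day(P50), ceil_to_day(P95)], width clamped."""
    start = p50.replace(hour=0, minute=0, second=0, microsecond=0)
    end = p95.replace(hour=0, minute=0, second=0, microsecond=0)
    if end < p95:                       # ceil_to_day: round any partial day up
        end += timedelta(days=1)
    width = (end - start).days
    if width < min_width_days:          # never promise a zero-width window
        end = start + timedelta(days=min_width_days)
    elif width > max_width_days:        # cap the spread shown to the customer
        end = start + timedelta(days=max_width_days)
    return start, end
```

The uncertainty band (`P95 - P50` in hours) would be computed alongside this to drive the `low_confidence` flag.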
Continuous Recalculation on New Scan Events
Given an in-transit shipment and a new carrier scan event (pickup, departure, arrival, exception) When the event is ingested Then the ETA is recomputed and persisted within 2 seconds P95 and 5 seconds P99 from event ingestion time And if the recomputed P80 differs by more than 4 hours from the prior P80, an ETA_slipped flag is set true and change_reason contains the triggering event id And when ETA changes, the linked SLA timer is updated within 2 seconds P95 And the previous prediction is retained as a versioned record with superseded_by reference
Prediction Persistence with Lineage and Justification
Given a prediction is generated When it is persisted Then the record includes shipment_id, generated_at, model_type, model_version, feature_store_snapshot_id, training_data_version, quantiles (P50, P80, P95), honest_window_start, honest_window_end, timezone, low_confidence flag, and change_reason (nullable) And includes justification with either top_3_feature_contributions (ML) or matched_rules and rule_weights (rules-based) And predictions are immutable; any update creates a new version with version_id and prior_version_id And re-running the service with identical inputs, model_version, and feature snapshot produces identical quantiles within 1 minute tolerance
Cold Start SLA-Safe Defaults
Given a lane, supplier, and service level with fewer than 50 historical deliveries in the last 90 days When a prediction is requested Then the engine uses SLA-safe defaults: P80 >= contractual_transit_days; P95 >= contractual_transit_days + regional_uplift_days; honest_window_end >= P95 And default variance parameters produce an honest window width between 1 and 7 days (configurable per service level) And low_confidence = true and justification includes cold_start_defaults with applied parameters
Accuracy Metrics and Auto-Recalibration
Given completed deliveries with predicted and actual delivery timestamps When daily metrics are computed Then for each (lane, supplier, service_level) the system stores MAPE, MAE_hours, on_time_percentage (actual <= P80), and calibration_error (distribution of actuals in [P50,P80) and [P80,P95]) And if any last-7-day metric exceeds thresholds (MAPE > 15% or on_time_percentage < 92% or calibration_error outside ±5%), an auto-recalibration job updates baseline parameters and records a calibration_change with before/after values And post-recalibration, the next day's metrics are linked to the calibration_change record
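The daily metrics named here are standard. A minimal sketch of the computation, assuming each completed delivery carries the actual and predicted P80 delivery times as hours-from-order floats (the record shape is my own):

```python
def eta_metrics(records):
    """MAPE, MAE_hours, and on_time_percentage (actual <= P80) over completed deliveries."""
    n = len(records)
    mape = sum(abs(r["actual"] - r["p80"]) / r["actual"] for r in records) / n * 100
    mae_hours = sum(abs(r["actual"] - r["p80"]) for r in records) / n
    on_time_pct = sum(r["actual"] <= r["p80"] for r in records) / n * 100
    return {"MAPE": mape, "MAE_hours": mae_hours, "on_time_percentage": on_time_pct}
```

These would be computed per (lane, supplier, service_level) and compared to the 15% / 92% thresholds to trigger recalibration.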
Batch Backfill and Streaming Throughput
Given a historical dataset of shipments When a backfill job runs Then the system processes at least 2,000 shipments per minute with success_rate >= 99.5% and writes predictions with lineage for each And backfill is idempotent; rerunning with identical parameters does not create duplicate active predictions and preserves versioning
Given a live stream of scan events at 100 events/second When processed Then end-to-end prediction update latency is <= 2 seconds P95 and <= 5 seconds P99 with zero data loss observed in reconciliations
Pluggable Models, Versioning, and Drift Detection
Given a lane configured for ML When a new model version is promoted via the registry Then the engine serves that version with an initial canary traffic share (e.g., 10%) and logs model_version on each prediction And A/B evaluation computes MAE_hours and on_time_percentage deltas versus baseline with 95% confidence before full rollout And feature drift is monitored daily via Population Stability Index; if PSI > 0.2 for any critical feature or if error SLOs are violated, the system automatically reduces canary to 0 and falls back to the prior stable version, recording a drift_incident And model versions with failing canary are blocked from full rollout
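The Population Stability Index used for drift detection compares binned frequencies of a feature between the training baseline and current traffic; values above 0.2 are the conventional "significant shift" threshold this criterion adopts. A minimal sketch with equal-width bins over the baseline's range:

```python
import math

def psi(expected, actual, bins: int = 10) -> float:
    """Population Stability Index between baseline and current feature samples."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # Small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```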
ETA Confidence Bands & Window Selection Logic
"As a customer experience lead, I want ETA Pulse to present a clear delivery window with confidence indicators so that we set honest expectations without overpromising."
Description

Calculate confidence bands around the predicted delivery time and determine the communicated window shown to users based on risk tolerance, customer segment, and product category. Apply rounding and clamping rules (e.g., minimum window widths, max spread), and automatically widen or flag windows when confidence drops. Expose risk indicators and rationale codes (e.g., volatile lane, missed pickup, weather) for transparency. Store the selected window, underlying percentiles, and decision factors for downstream UI, alerts, and analytics.

Acceptance Criteria
Live ETA Tracking UI Components
"As a support agent, I want to see live ETAs with risk status directly in the queue and case view so that I can prioritize outreach and answer customers confidently."
Description

Deliver UI components for the live queue and case detail views that show the predicted window, confidence band, current shipment status, last carrier event, and a deep link to carrier tracking. Provide color-coded risk states, timezone-aware timestamps, and an event timeline overlaying predicted vs. actual progress. Support responsive layouts, accessibility, keyboard navigation, and performance budgets for queues with thousands of cases. Allow agents to copy ETA details, view rationale, and filter/sort by ETA risk and date.

Acceptance Criteria
Proactive ETA Slip Alerts & Notifications
"As an operations lead, I want proactive alerts when an ETA is likely to slip so that we can reschedule appointments and notify customers before issues escalate."
Description

Continuously monitor predicted vs. actual progress and trigger alerts when an ETA window shifts beyond configurable thresholds or is likely to miss. Send internal alerts (in-app, email, Slack) and customer notifications (email/SMS) using role-based, preference-aware templates with localization, quiet hours, throttling, and digesting to prevent alert fatigue. Provide snooze/acknowledge workflows, escalation policies, and webhooks for downstream systems. Log all notifications and outcomes for compliance and analytics.

Acceptance Criteria
SLA Timer Auto-Sync
"As a service operations manager, I want SLA timers to adjust automatically when ETAs move so that my team’s commitments stay accurate without manual recalculation."
Description

Automatically update case SLA timers when ETAs change, adjusting due dates and milestones according to business rules by claim type, region, and service level. Record change reasons and maintain an audit trail. Provide guardrails to prevent thrashing (e.g., minimum movement threshold, freeze windows), and surface SLA risk badges in the queue. Ensure compatibility with existing ClaimKit SLA logic, exports, and reporting, including historical snapshots for compliance.

Acceptance Criteria
Exception Handling, Manual Overrides, and Fallbacks
"As a senior agent, I want safe manual controls and smart fallbacks when data is incomplete so that I can keep cases moving and maintain reliable SLAs."
Description

Provide robust fallbacks when tracking data is missing, delayed, or ambiguous: derive ETAs from static SLAs, historical averages, or supplier promises. Allow authorized users to manually set or override an ETA with justification and expiry, with supervisory approval workflows and full audit logging. Detect and flag data quality issues (unknown carrier, duplicate tracking, inconsistent events) and route to remediation. Expose clear error states in the UI and continue to update SLAs safely under degraded conditions.

Acceptance Criteria

Local Pickup

Surfaces counter stock at nearby distributors and OEM depots with same‑day pickup slots. Provides reservation codes and QR pickup passes and syncs with technician routing. Benefit: get urgent parts today, cut downtime between visits, and keep jobs on schedule without overnight shipping costs.

Requirements

Real-time Distributor Inventory Sync
"As a dispatcher, I want to see in-stock parts at nearby counters in real time so that I can reserve and dispatch technicians without waiting for shipments."
Description

Integrate ClaimKit with distributor and OEM inventory systems to surface counter stock availability in real time. Support multiple data exchange methods (REST APIs, webhooks, EDI/SFTP CSV drops) with part number normalization, account/price list mapping, and stock status (on-hand, reserved, backordered). Include nearest-counter flagging, pickup eligibility rules, and cache with freshness TTL and graceful fallback when real-time data is unavailable. Enforce authentication, rate limiting, and tenant isolation. Provide error handling, monitoring, and retriable jobs to maintain accurate availability for same-day pickup decisions.
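The "cache with freshness TTL and graceful fallback" part of this requirement can be sketched as a small wrapper: stale entries are still served but flagged, so the UI can show availability as "last known" rather than failing when the distributor API is down. Class and field names are illustrative; the injectable clock just makes the behavior testable:

```python
import time

class AvailabilityCache:
    """Counter-stock cache with a freshness TTL and stale-read fallback."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}   # part_number -> (stock_record, fetched_at)

    def put(self, part_number: str, record: dict):
        self._entries[part_number] = (record, self.clock())

    def get(self, part_number: str):
        """Return (record, is_fresh); a stale record is served rather than dropped."""
        entry = self._entries.get(part_number)
        if entry is None:
            return None, False
        record, fetched_at = entry
        return record, (self.clock() - fetched_at) <= self.ttl
```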

Acceptance Criteria
Proximity Matching & Radius Search
"As a support agent, I want the system to auto-suggest the nearest counters with same-day pickup so that I can choose the fastest option to keep the job on schedule."
Description

Determine the closest eligible counters and OEM depots to the job site or technician location using geocoding and driving-time estimates. Enable configurable radius, service hours constraints, and filters (OEM brand, distributor account, will-call availability). Present results inline in the ClaimKit case view with distance/ETA, cutoff times, and earliest pickup windows. Support map and list views, location permissions, and mobile responsiveness. Respect technician territories and customer preferences while providing fallback locations when none meet constraints.
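A first-pass radius search typically filters by great-circle (haversine) distance before the more expensive driving-time lookup. A self-contained sketch, with the job/counter record shapes invented for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def counters_in_radius(job, counters, radius_km):
    """Counters within the radius, nearest first; driving-time estimates
    from a routing service would refine this ordering in production."""
    hits = []
    for c in counters:
        d = haversine_km(job["lat"], job["lon"], c["lat"], c["lon"])
        if d <= radius_km:
            hits.append({**c, "distance_km": round(d, 1)})
    return sorted(hits, key=lambda c: c["distance_km"])
```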

Acceptance Criteria
Pickup Slot Reservation & Hold Management
"As a parts coordinator, I want to reserve a part and pickup window so that the technician can reliably collect the item without stock-outs or counter delays."
Description

Allow users to reserve parts for same-day pickup with selectable time windows, honoring distributor rules (hold durations, limits, identity requirements). On reservation, decrement or soft-hold inventory, generate a confirmation code, and set an expiration timer with automated reminders and extensions. Provide cancellation and rebooking flows with proper inventory reconciliation. Ensure idempotent reservation creation, conflict resolution, and webhook callbacks to record confirmations from distributors. Expose reservation state in the case timeline and technician view.

Acceptance Criteria
QR Pickup Pass Generation & Validation
"As a technician, I want a scannable pickup pass so that the counter can quickly verify my reservation and release the part without manual lookup."
Description

Generate secure, single-use QR pickup passes containing reservation ID, part identifiers, and expiration metadata. Support offline human-readable fallback codes and printable PDFs. Sign payloads (e.g., JWT) to prevent tampering, enforce TTLs, and enable immediate revocation on cancellation. Provide a lightweight verifier endpoint and reference scan workflow for distributor counters to validate and mark items as released. Log scan events for audit and update the associated ClaimKit case and inventory status in real time.
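The signed-payload idea (the spec suggests JWT) can be shown with a stdlib HMAC token: the QR encodes a base64 payload plus a signature, and the verifier rejects tampered, expired, or revoked passes. This is a sketch, not a JWT implementation; the secret would come from the per-tenant secret manager, and revocation checks would hit a shared store:

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-secret"  # illustrative only

def issue_pass(reservation_id: str, part_number: str, ttl_seconds: int, now=None) -> str:
    payload = {"rid": reservation_id, "part": part_number,
               "exp": (now if now is not None else time.time()) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def verify_pass(token: str, revoked: set, now=None):
    """Return the payload if valid; None if tampered, revoked, or expired."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                      # tampered
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["rid"] in revoked:
        return None                      # revoked on cancellation
    if payload["exp"] < (now if now is not None else time.time()):
        return None                      # TTL enforced
    return payload
```

Marking a pass single-use would additionally record the reservation ID at first successful scan.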

Acceptance Criteria
Technician Route Sync & Calendar Integration
"As a dispatch lead, I want pickup stops added and optimized in the tech’s route so that the day’s schedule stays on track with minimal detours."
Description

Insert pickup stops into the technician’s daily route and calendar, optimizing sequence with existing jobs and accounting for travel time and pickup windows. Provide deep links to navigation apps and integrations with external field service platforms where applicable. Automatically adjust the route and notify stakeholders when reservation details change. Capture actual pickup timestamps to improve ETA predictions and post-job analytics. Respect shift boundaries, SLAs, and configurable buffers for counter wait times.

Acceptance Criteria
Pickup Notifications & Case Updates
"As a field technician, I want clear notifications with pickup details and reminders so that I arrive on time and avoid missed windows."
Description

Send timely notifications to technicians and coordinators with reservation details, QR codes, counter address/hours, and driving directions. Schedule reminders before the pickup window, alert on imminent expirations, and confirm successful pickup or cancellation. Write all events to the ClaimKit case timeline, adjust SLA timers where configured, and expose status badges in the live queue. Provide templated, localized messages over email, SMS, and in-app push with rate limiting and user preferences.

Acceptance Criteria

Kit Builder

Builds recommended part kits for common repairs (primary part, gaskets, clips, consumables) based on device model and historical fix data. One‑click add to order with auto‑substitutions when items are out of stock. Benefit: prevents missing pieces, reduces second truck rolls, and shortens repair cycle time.

Requirements

Unified Model & Parts Catalog Sync
"As a parts manager, I want ClaimKit to maintain a normalized, up-to-date parts catalog mapped to device models so that kit recommendations and substitutions are accurate and ready to order."
Description

Ingest and normalize supplier catalogs and internal SKUs, mapping parts to a device model taxonomy and repair types. Deduplicate SKUs, reconcile OEM versus aftermarket equivalents, and enrich items with compatibility attributes (model/serial ranges, dimensions, connectors) and required consumables. Maintain lifecycle status (active/discontinued), pricing tiers, and real-time availability via inventory and supplier APIs. Provide scheduled full and delta syncs, conflict resolution, audit logs, and a query layer for the Kit Builder to ensure accurate kit composition and substitution decisions.

Acceptance Criteria
Repair-Kit Recommendation Engine
"As an operations lead, I want the system to propose complete repair kits per case so that technicians have everything needed on the first visit."
Description

Generate recommended repair kits per case by combining the device model, reported issue, and historical fix data to select the primary part plus all required gaskets, clips, and consumables. Score recommendations by first-time fix success, return rates, and regional availability, and present confidence, rationale, and required versus optional components. Preselect kits on case creation using model/serial data parsed by the magic inbox. Implement a rules-plus-ML approach with explainable logging and a rules fallback when data is sparse.

Acceptance Criteria
Auto Substitution & Compatibility Rules
"As a dispatcher, I want ClaimKit to automatically replace out-of-stock kit items with compatible approved substitutes so that orders ship immediately without risking fit or warranty violations."
Description

When a recommended kit item is out of stock or restricted, automatically apply approved substitution policies (OEM-to-OEM, OEM-to-certified aftermarket, bundle split, pack-size changes) while validating fit by model/serial ranges and key attributes. Enforce warranty and payer rules, price and margin thresholds, and show substitution notes and risks to the user for acknowledgment. Integrate with inventory and supplier APIs to confirm real-time availability and provide graceful fallbacks such as partial kits with backorder ETAs.

Acceptance Criteria
One-Click Kit-to-Order
"As a repair coordinator, I want to add the recommended kit to an order with one click so that I can place accurate orders quickly and avoid missed parts."
Description

Surface the recommended kit in the case sidebar with a single action to add all items to the connected order system (ERP/ecommerce), handling account selection, ship-to, taxes, PO numbers, and cost centers from case context. Support item deselection, quantity edits, and order notes. Validate pricing and availability at commit, then write back order IDs, line mappings, ETAs, and tracking to the case. Enforce role-based permissions and maintain a complete audit trail of ordering actions.

Acceptance Criteria
SLA- and Lead-Time-Aware Kit Planning
"As a support lead, I want kit recommendations that account for lead times and SLAs so that we meet service commitments without rescheduling visits."
Description

Evaluate kit options and substitutions against SLA timers and shipping lead times to prioritize SKUs and suppliers that meet case deadlines and technician schedules. Factor warehouse cutoffs, carrier transit times, drop-ship options, and geography. Display ETAs and SLA risk indicators prior to ordering, allow policy-governed overrides with reasons, split shipments when needed, and update case SLA risk based on chosen fulfillment paths.

Acceptance Criteria
Kit Versioning, Governance, and Overrides
"As a service engineering manager, I want controlled kit curation with version history so that field teams get consistent, validated kits and we can improve them safely."
Description

Provide admin tools to author and curate kits by model and issue with versioning, approvals, effective date ranges, and rollback. Designate required and optional items, attach diagrams and notes, and enforce safety checks that prevent removal of critical consumables. Support regional or account-level overrides and controlled A/B testing or phased rollouts so updates can be validated before broad deployment.

Acceptance Criteria
Outcomes Analytics & Feedback Loop
"As a product operations analyst, I want visibility into kit performance and a feedback loop so that recommendations continuously improve and reduce second truck rolls."
Description

Capture kit utilization, first-time fix rate, returns, technician feedback, and cost outcomes per model and issue. Provide dashboards and exports, highlight missing-item patterns, and measure the impact on second truck rolls and SLA adherence. Feed outcomes back into the recommendation engine for retraining and rule tuning, and collect structured feedback during case close to continuously refine kit contents.

Acceptance Criteria

Supplier Swap

Automatically re-sources an order when a supplier backorders or misses a milestone—repricing across vendors and switching to the fastest viable option after rules-based approval. Keeps the audit trail and notifies techs and customers. Benefit: no more stalled jobs due to supplier surprises.

Requirements

Real-time Supplier Event Monitoring
"As an operations manager, I want the system to automatically detect supplier disruptions so that at-risk orders are flagged and evaluated for swap before jobs stall."
Description

Continuously ingest supplier status signals (backorders, allocation changes, missed ship/receive milestones) from EDI/API connections and the ClaimKit magic inbox (parsed emails/PDFs). Normalize and map events to SKUs, orders, and claims, then evaluate significance via configurable thresholds (e.g., lead-time delta, reliability score) to avoid false positives. Persist event history with timestamps and vendor metadata. When a disruption is confirmed, emit a deterministic trigger to the Supplier Swap pipeline with idempotency keys to ensure exactly-once processing across retries. Integrates with the live queue to surface at-risk cases and initiates the swap evaluation automatically.

Acceptance Criteria
Landed-Cost and ETA Comparator
"As a parts coordinator, I want a ranked shortlist of viable vendors by cost and speed so that I can select the best alternative instantly with confidence."
Description

Query and aggregate multi-vendor availability, price, lead time, shipping options, taxes, and location-based transit times to compute true landed cost and projected delivery ETA per vendor. Rank alternatives using configurable weighting (speed vs. cost vs. reliability) and enforce constraints (authorized vendors, warranty terms, geographic coverage, min margin). Include historical fulfillment performance to bias rankings. Return a machine-readable shortlist and a human-friendly explanation (why vendor X is best) for transparency. Embed directly in the claim/order view within ClaimKit for quick review and action.
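The configurable weighting described here is typically a weighted sum over min-max-normalized factors, so cost, speed, and reliability become comparable. A minimal sketch (record shape and weight names are illustrative; constraint filtering would run before scoring):

```python
def rank_vendors(options, w_speed=0.4, w_cost=0.4, w_reliability=0.2):
    """Rank vendor options; lower cost/ETA and higher reliability score better."""
    costs = [o["landed_cost"] for o in options]
    etas = [o["eta_days"] for o in options]

    def norm(v, values):  # 1.0 = best in the set, 0.0 = worst
        lo, hi = min(values), max(values)
        return 1.0 if hi == lo else (hi - v) / (hi - lo)

    scored = [{**o, "score": round(
        w_speed * norm(o["eta_days"], etas)
        + w_cost * norm(o["landed_cost"], costs)
        + w_reliability * o["reliability"], 4)} for o in options]
    return sorted(scored, key=lambda o: o["score"], reverse=True)
```

The per-factor contributions also give the "why vendor X is best" explanation almost for free.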

Acceptance Criteria
Rules-Based Auto-Approval Engine
"As a compliance lead, I want policy-driven auto-approvals with clear audit logs so that swaps happen fast without violating financial or warranty rules."
Description

Provide a policy engine to auto-approve swaps under defined business rules: max price delta, SLA risk tolerance, customer tier, warranty coverage, product category, spend caps, and vendor eligibility. Support per-brand/tenant policies, effective dates, and exception lists. Log which rule fired, the evaluation inputs, and the decision outcome for auditability. Allow simulation mode to test policies on historical data. Route exceptions to approvers with inline context and one-click approve/deny. Integrates with role-based access controls in ClaimKit.
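The "which rule fired" audit requirement suggests evaluating rules in a fixed order and returning both the decision and the rule that produced it. A minimal sketch with illustrative rule names and fields (not ClaimKit's actual schema):

```python
def evaluate_swap(swap, policy):
    """Evaluate a proposed swap against policy rules in order and return
    (decision, fired_rule) so the audit log can record which rule fired."""
    if swap["vendor"] in policy["blocked_vendors"]:
        return ("deny", "vendor_eligibility")
    if swap["new_cost"] - swap["old_cost"] > policy["max_price_delta"]:
        return ("route_to_approver", "max_price_delta")
    if swap["new_cost"] > policy["spend_cap"]:
        return ("route_to_approver", "spend_cap")
    return ("auto_approve", None)
```

Simulation mode falls out of this design for free: replay historical swaps through `evaluate_swap` with a draft policy and compare decisions, without executing anything.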

Acceptance Criteria
One-Click Re-Sourcing Execution & Audit Trail
"As a support agent, I want the swap to execute in one step with a complete audit trail so that I can keep work moving and answer customer questions confidently."
Description

Execute the chosen swap atomically: cancel or amend the original PO (when supported), place a new PO with the selected vendor, update order lines, and re-link the item to the new fulfillment source. Copy relevant artifacts (receipts, serials, warranties) and maintain a tamper-evident audit trail tying the original and replacement suppliers, timestamps, approver/rule details, and before/after costs/ETAs. Provide idempotent operations and rollback on partial failures. Reflect changes in the live queue and the associated claim/ticket so all stakeholders see the current source of truth.
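The rollback-on-partial-failure requirement is essentially a saga: each step carries a compensating action that is run in reverse order if a later step fails. A minimal sketch, assuming steps are supplied as (action, compensation) pairs:

```python
def execute_swap(steps):
    """Run (action, compensation) pairs in order; on any failure, run
    compensations for the completed steps in reverse so a partial swap
    is rolled back rather than left half-applied."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True
```

In practice each step (cancel PO, place PO, update order lines, re-link fulfillment) would also need to be idempotent, so that a retried execution after a crash does not duplicate a purchase order.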

Acceptance Criteria
Stakeholder Notifications & Messaging Templates
"As a customer, I want clear and timely updates when my order is re-sourced so that I understand the new timeline and don’t need to chase support."
Description

Automatically notify technicians, customers, and internal teams when a swap is initiated, approved, or executed. Deliver messages via email, SMS, and in-app, using brand-configurable templates with dynamic fields (reason for swap, new ETA, price impact, tracking). Support localization, quiet hours, opt-out rules, and throttling to avoid notification fatigue. Thread notifications back into the claim/ticket timeline for a single source of truth. Include deep links for recipients to view updated order details or acknowledge changes.

Acceptance Criteria
SLA Continuity & Metrics Adjustment
"As an operations leader, I want SLA timers and reports to stay accurate through supplier swaps so that escalations and performance metrics remain trustworthy."
Description

Recalculate and maintain SLA timers across the swap lifecycle. Pause timers during approval windows per policy, then adjust promised-by dates using the selected vendor’s ETA and shipping method. Attribute delays to root causes (supplier, policy wait, shipping) for fair metrics. Update dashboards and alerts so leadership, agents, and partners see accurate SLA risk and performance after the swap. Expose change logs for analytics and postmortems.

Acceptance Criteria

Price Guard

Enforces price caps, preferred vendor tiers, and tax/ship policies while showing true landed cost and savings vs. list. Flags exceptions for quick approval with justification capture. Benefit: protects margins and ensures consistent, compliant purchasing without slowing down the repair flow.

Requirements

Price Cap Rules Engine
"As an operations leader, I want automatic enforcement of price caps on parts and labor so that every purchase aligns with margin targets without manual checks."
Description

Implements a real-time rules engine that validates proposed part and labor prices against configurable caps at evaluation time (e.g., by SKU, category, brand, warranty tier, claim type, vendor tier, geography). When a purchase quote or catalog price is attached to a claim, the engine computes the allowable maximum, compares it to the quoted price, and blocks or warns accordingly (hard vs. soft caps). The engine integrates with ClaimKit’s case creation and live queue so validations occur when emails/PDFs are parsed by the magic inbox, when agents add items to a claim, and when POs are generated. Supports bundles/kits, multi-currency, unit conversions, and promotions. Emits structured outcomes (pass, warn, fail) with reason codes, feeds SLA timers, and logs results for audit. Designed for millisecond evaluation to avoid slowing the repair flow.
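The hard-vs-soft cap distinction maps naturally onto the structured pass/warn/fail outcomes with reason codes. A minimal sketch of that comparison (names illustrative):

```python
def check_price(quoted, cap, cap_type="hard"):
    """Compare a quoted price to its computed cap and emit a structured
    outcome with a reason code: pass, warn (soft cap), fail (hard cap)."""
    if quoted <= cap:
        return ("pass", None)
    if cap_type == "soft":
        return ("warn", "over_soft_cap")
    return ("fail", "over_hard_cap")
```

The expensive part of the engine is computing the allowable maximum from SKU, category, vendor tier, and geography; once that cap is resolved, the evaluation itself is a constant-time comparison, which is how millisecond evaluation stays achievable.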

Acceptance Criteria
Preferred Vendor Tiering & Auto-Routing
"As a buyer, I want the system to default to preferred vendors and guide me with availability and lead times so that I can purchase consistently and quickly while honoring partner agreements."
Description

Maintains configurable vendor tiers (preferred, secondary, prohibited) by category, brand, geography, and SLA profile, then automatically routes sourcing suggestions and PO creation toward preferred vendors. When a claim needs a part, the system presents tiered vendor options with availability, lead time, and expected landed cost, defaulting to preferred partners and flagging any selection of non-preferred vendors. Includes fallback logic when preferred vendors are out of stock, capture of reason codes for non-compliance, and compliance rate tracking. Seamlessly surfaces within the ClaimKit claim workspace to minimize clicks for agents.

Acceptance Criteria
Landed Cost Calculator & Savings Delta
"As a support lead, I want to see the true landed cost and savings versus list before I approve a purchase so that I can protect margins and choose the best option quickly."
Description

Calculates true landed cost per line and per claim by combining unit price, taxes, shipping charges, fuel surcharges, duties, core/return credits, restocking fees, and discounts. Integrates with tax engines and carrier rate APIs to fetch real-time rates and applies configured policies (e.g., default to ground, block premium shipping without approval). Displays savings versus list (and versus caps) at the moment of decision and stores both calculated and quoted values on the claim for reporting. Supports multi-currency conversion, what-if comparisons across vendors and shipping options, and caching for performance. Results are exposed in the live queue, claim detail, and PO summaries.
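The landed-cost arithmetic itself is a signed sum of the components named above. A sketch with illustrative field names, where credits and discounts subtract from the total:

```python
def landed_cost(line):
    """Sum the cost components: unit price, taxes, shipping, duties,
    surcharges, minus discounts and core/return credits."""
    return (line["unit_price"] * line["qty"]
            + line.get("tax", 0) + line.get("shipping", 0)
            + line.get("duties", 0) + line.get("surcharge", 0)
            - line.get("discount", 0) - line.get("core_credit", 0))

def savings_vs_list(line, list_price):
    # the savings delta shown at the moment of decision
    return list_price * line["qty"] - landed_cost(line)
```

Storing both the calculated landed cost and the raw quoted values on the claim, as the description requires, is what lets reporting later distinguish negotiated savings from tax or shipping variance.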

Acceptance Criteria
Tax & Shipping Policy Enforcement
"As a compliance manager, I want tax and shipping rules enforced at purchase time so that orders remain compliant without requiring manual audits later."
Description

Enforces tax and shipping policies during purchasing: validates ship-to addresses, applies tax-exempt statuses where applicable, enforces approved shipping methods and accounts, and blocks policy-violating choices unless an exception is approved. Integrates with address validation, tax calculation, and carrier systems to ensure accuracy. Policies can vary by company, location, claim type, or vendor tier and are evaluated inline during quote review and PO creation. Violations create inline warnings, required reason codes, and are logged to the claim timeline for audit.

Acceptance Criteria
Policy Configuration & Versioning
"As an administrator, I want a safe, versioned way to configure pricing and purchasing policies so that updates are accurate, auditable, and deploy without disrupting operations."
Description

Provides an admin console to configure price caps, vendor tiers, tax/ship policies, and reason codes with effective dates, environment scoping (test vs. production), and granular targeting (by brand, category, vendor, geography, claim type). Supports draft/publish workflows, change history with diff view, and rollback to prior versions. Includes validation checks and a sandbox tester to run sample claims against policies before publishing. All policy changes are captured with who/when and are immediately consumed by the Price Guard engine without downtime.

Acceptance Criteria
Exception Workflow & Justification Capture
"As a purchasing approver, I want flagged exceptions with clear context and justifications so that I can make fast, auditable decisions without slowing repairs."
Description

Introduces a lightweight approval workflow for over-cap prices, non-preferred vendors, or restricted shipping methods. Exceptions are auto-flagged with severity, assigned approvers based on configurable rules, and tracked with SLA timers in the live queue. Requires structured justification (reason codes, free-text notes, attachments like competing quotes), supports one-click approve/deny, and records a complete audit trail on the claim and PO. Sends notifications in-app and via email, and fails safe to maintain repair velocity (e.g., escalation routing if SLAs are breached).

Acceptance Criteria
Savings, Compliance & Audit Reporting
"As a finance leader, I want clear reporting on savings and compliance so that I can validate ROI, tune policies, and prepare for audits with confidence."
Description

Delivers dashboards and exports that quantify savings versus list, prevented over-cap spend, vendor tier compliance, average landed cost by category/vendor, exception volumes and approval times, and margin impact by brand or location. Provides drill-down from summary to claim-level evidence and includes scheduled email reports and CSV export. Ensures data lineage by linking metrics to policy versions and specific validation events for audit readiness.

Acceptance Criteria

Rail Router

Intelligently selects the fastest, lowest‑cost payout method per claim—ACH, RTP, push‑to‑debit, virtual card, PayPal, checks, or international wire—based on amount, geography, recipient preference, and risk. Includes bank account verification, automatic fallbacks on failures, and fee/speed comparisons. Benefit: faster, cheaper, and more reliable refunds without manual routing.

Requirements

Dynamic Rail Selection Engine
"As an operations lead, I want the system to automatically choose the best payout method per claim so that refunds are delivered fast at the lowest cost without manual routing."
Description

Implements a policy‑driven decision engine that selects the optimal payout rail (ACH, RTP, push‑to‑debit, virtual card, PayPal, paper check, international wire) per claim using inputs such as amount, currency, geography, recipient preference, SLA targets, provider availability, risk score, and cut‑off calendars. Provides configurable rules and weights, deterministic and idempotent decisions for a given policy version, and returns the chosen rail with rationale, estimated fee and delivery window. Integrates with ClaimKit’s case workflow to trigger disbursement, logs the decision to the case timeline, and exposes an internal API for simulation and batch routing. Supports constraints like RTP limits, weekend/holiday schedules, and international eligibility, with versioned policies and full audit logs.
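The core of such an engine is an eligibility filter (amount limits, geography) followed by a deterministic tie-break on speed or cost. The sketch below uses an invented rail table with made-up fee and ETA numbers purely for illustration:

```python
RAILS = [
    # illustrative eligibility/fee table, not real provider data
    {"rail": "rtp",       "max": 100_000, "domestic_only": True,  "fee": 0.50, "eta_h": 1},
    {"rail": "ach",       "max": None,    "domestic_only": True,  "fee": 0.25, "eta_h": 48},
    {"rail": "intl_wire", "max": None,    "domestic_only": False, "fee": 15.0, "eta_h": 72},
]

def select_rail(amount, domestic, prefer_speed=True):
    """Filter rails by amount limits and geography, then pick the fastest
    (or cheapest) eligible one; deterministic for a given table and inputs."""
    eligible = [r for r in RAILS
                if (r["max"] is None or amount <= r["max"])
                and (domestic or not r["domestic_only"])]
    if prefer_speed:
        return min(eligible, key=lambda r: (r["eta_h"], r["fee"]))["rail"]
    return min(eligible, key=lambda r: (r["fee"], r["eta_h"]))["rail"]
```

Because the decision is a pure function of the policy table and inputs, the same policy version always yields the same rail, which is what makes decisions replayable for simulation and audit.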

Acceptance Criteria
Account Verification & Tokenization
"As a finance and compliance manager, I want recipient accounts verified and tokenized so that disbursements are secure, compliant, and reusable without storing sensitive data."
Description

Adds recipient account verification for ACH and push‑to‑debit, including instant bank account verification (via third‑party providers) with configurable fallback to micro‑deposits, and debit card PAN tokenization through network token services. Validates account ownership, account status, and routing data, producing reusable tokens with minimal PCI scope by storing sensitive data in a vault. Manages verification lifecycle states, expirations, and re‑verification triggers. Integrates with Rail Router to enforce rail eligibility and with ClaimKit profiles to reuse verified accounts across claims. Includes error handling, secure data transport, and detailed audit trails.

Acceptance Criteria
Automatic Fallback Orchestration
"As an operations lead, I want failed or slow payouts to auto‑fallback to the next best rail so that SLAs are met and customer impact is minimized."
Description

Orchestrates automatic fallback to the next best eligible rail upon failure, timeout, or provider degradation while preserving idempotency and preventing duplicate payouts. Monitors webhook and polling signals from providers, applies configurable retry/backoff policies, and escalates to manual review when thresholds are exceeded. Is SLA‑aware, selecting alternatives that still meet delivery targets and fee caps. Writes all attempts and outcomes to the case timeline, notifies recipients when a method changes, and emits metrics for success rate, time‑to‑cash, and fallback frequency.
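The retry-then-fallback loop can be sketched as: retry soft failures on the same rail a bounded number of times, break immediately on hard failures, and move to the next ranked rail. `send` below is a stand-in for the provider call; real orchestration would also apply backoff delays and idempotency keys:

```python
def attempt_payout(rails, send, max_retries=1):
    """Walk the ranked rail list; `send` returns 'ok', 'soft_fail', or
    'hard_fail'. Returns the successful rail, or None when every rail
    is exhausted (i.e., escalate to manual review)."""
    for rail in rails:
        for _ in range(max_retries + 1):
            status = send(rail)
            if status == "ok":
                return rail
            if status == "hard_fail":
                break  # no point retrying this rail; fall back to the next
    return None
```

Each attempt in the loop corresponds to a timeline entry and a metrics event (success rate, fallback frequency), so the orchestration record doubles as the audit trail.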

Acceptance Criteria
Fee and Speed Estimator
"As an operations lead, I want to compare fees and delivery times across rails for a claim so that I can justify and audit the chosen route."
Description

Calculates and surfaces per‑rail estimated fees and delivery windows using provider quotes, fee tables, currency conversion rates, and cut‑off calendars. Exposes a scoring payload consumed by the Selection Engine and a human‑readable summary for Ops review and audits. Captures actuals versus estimates to continuously improve accuracy and to support cost/speed reporting by SKU, channel, and geography. Handles multi‑currency normalization, provider surcharges, and tiered pricing, with caching and freshness controls to balance accuracy and performance.

Acceptance Criteria
Recipient Preference Portal
"As a claimant, I want to select my preferred payout method via a secure link so that I get my refund in the way that works best for me."
Description

Provides a secure, mobile‑friendly portal where recipients can select a preferred payout method and supply required details (bank account, debit card, PayPal, mailing address) via a time‑bound link sent from ClaimKit. Presents clear fee and speed expectations, captures consent to terms, supports localization and accessibility, and verifies device/geo where required. Stores preferences at the recipient profile level with per‑claim overrides and expiration policies. Integrates with Verification to validate inputs and with the Selection Engine to honor preferences within compliance and eligibility constraints.

Acceptance Criteria
Risk and Compliance Screening
"As a risk analyst, I want payouts screened for sanctions and fraud risk so that we prevent prohibited or suspicious transactions and reduce chargebacks."
Description

Performs pre‑disbursement risk and compliance checks including sanctions (e.g., OFAC, EU), watchlists, velocity and duplicate detection across claims, device/IP anomalies, geography restrictions, and amount‑based policy gates that trigger manual review. Enforces per‑rail eligibility rules (e.g., RTP domestic limits) and captures reason codes for accept/deny/route decisions. Integrates with external KYC/AML providers where available and writes a tamper‑evident audit log. Provides configurable thresholds and exceptions with approver workflows and exports for regulatory reporting.

Acceptance Criteria
Provider Integrations and Reconciliation
"As a finance lead, I want robust integrations and reconciliation across all payout providers so that our ledger stays accurate and we can resolve exceptions quickly."
Description

Implements integrations with payout providers for ACH/RTP (via bank partner), push‑to‑debit (Visa Direct/Mastercard Send), virtual card issuance, PayPal Payouts, check printing/mailing, and international wire partners. Normalizes APIs and status codes into a canonical model, handles webhooks and polling, and ensures idempotent requests with per‑claim payout keys. Adds circuit breakers, health checks, sandbox support, and secrets rotation. Provides reconciliation pipelines using provider reports and webhooks to align ClaimKit’s ledger with actual disbursements, handle returns (e.g., ACH R‑codes), reversals, and fee postings, and annotate the case timeline with final outcomes.

Acceptance Criteria

Repair Wallet

Issue restricted virtual repair cards in seconds with MCC/vendor locks, per‑transaction and total caps, geofencing to service locations, and single‑use or timed expiry. Auto‑ingest receipts and line‑itemize spend back to the case. Benefit: funds get used only for approved repairs and parts, eliminating reimbursements and shrink while speeding resolutions.

Requirements

Instant Card Issuance (UI + API)
"As a claims agent, I want to issue a virtual repair card from a case so that the technician can begin repairs immediately without waiting for reimbursements."
Description

Provide UI and API to issue virtual repair cards directly from the ClaimKit case view within seconds. Pre-populate amount, vendor, and rule templates from case data (diagnosis, approved estimate, policy). Provision cards via the card processor in real time, attach card token and metadata to the case, and surface controls and current balance to agents. Support funding from a central wallet or per-card funding, with audit logs for creation, updates, and closures. Enable lifecycle actions (suspend, close, refill) with role-based permissions and webhooks to keep the case timeline in sync.

Acceptance Criteria
MCC and Merchant Whitelisting
"As a finance admin, I want to restrict cards to approved merchants and MCCs so that funds can only be used for authorized repairs and parts."
Description

Enforce merchant acceptance controls at authorization using MCC whitelists/blacklists and permitted merchant IDs. Allow brand- or program-level templates and per-case overrides. Sync the vendor directory from ClaimKit to mark approved service centers and parts suppliers, and deny transactions from unapproved or risky merchants. Provide an admin UI to manage lists with versioning, change history, and bulk import/export. Return standardized decline reasons and notify agents on policy violations.

Acceptance Criteria
Spend Caps and Rules Engine
"As an operations manager, I want to set per-transaction and total spend caps so that repair costs stay within the approved budget for each claim."
Description

Implement granular spend controls: per-transaction maximum, daily/weekly limits, total card cap, transaction count limits, and optional line-item category caps (labor vs parts). Attach rules via templates at issuance and allow authorized roles to adjust with audit trails. Support hard and soft declines with clear reason codes, agent/vendor notifications, and appeal/override workflow. Emit webhooks/events to update case budgets and remaining allowances in real time.
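At authorization time, these controls reduce to a sequence of checks with a standardized decline reason. A sketch covering three of the controls named above (per-transaction maximum, total cap, transaction count); the real engine would add daily/weekly windows and category caps:

```python
class RepairCard:
    """Authorization-time spend checks with structured decline reasons."""

    def __init__(self, per_txn_max, total_cap, max_txns):
        self.per_txn_max = per_txn_max
        self.total_cap = total_cap
        self.max_txns = max_txns
        self.spent = 0.0
        self.txns = 0

    def authorize(self, amount):
        if amount > self.per_txn_max:
            return ("decline", "over_per_txn_max")
        if self.spent + amount > self.total_cap:
            return ("decline", "over_total_cap")
        if self.txns + 1 > self.max_txns:
            return ("decline", "txn_count_exceeded")
        self.spent += amount
        self.txns += 1
        return ("approve", None)
```

Returning a reason code rather than a bare decline is what enables the notification and appeal/override workflow the description calls for.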

Acceptance Criteria
Geofenced Authorization Controls
"As a fraud analyst, I want card authorizations limited to defined service locations so that cards cannot be used outside the repair site."
Description

Restrict card usage to authorized service locations using geofences: specific merchant addresses, defined radii around customer or job site, and region/country constraints. Resolve merchant locations using network data and vendor profiles. Provide policy-based fallbacks when location cannot be verified, plus a manual override workflow with justification capture. Log all location checks for auditability and expose pass/fail signals on the case timeline.
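The radius-style geofence reduces to a great-circle distance check between the merchant's resolved coordinates and the fence center. A sketch using the haversine formula (coordinates assumed already resolved from network data and vendor profiles):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two coordinates."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_geofence(merchant, fence):
    """Pass/fail location check against a radius fence around the job site."""
    d = haversine_km(merchant["lat"], merchant["lon"], fence["lat"], fence["lon"])
    return d <= fence["radius_km"]
```

The pass/fail boolean is the signal surfaced on the case timeline; the policy-based fallback applies when the merchant's location cannot be resolved at all, not when this check fails.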

Acceptance Criteria
Single-use and Timed Expiry Management
"As a claims agent, I want to issue single-use or time-limited cards so that risk exposure and misuse are minimized after the repair is completed."
Description

Support single-use cards that automatically close after the first successful authorization and time-limited cards with configurable start/end dates and grace periods. Handle timezone edge cases and daylight-saving adjustments. Send proactive reminders to agents and vendors before expiry and auto-suspend on expiry, releasing unspent funds back to the funding source. Expose expiry state in the case UI and via webhooks for downstream systems.

Acceptance Criteria
Auto-Receipt Ingestion and Line-Item Reconciliation
"As an accountant, I want receipts auto-ingested and line-itemized back to the case so that reconciliation is fast and accurate without manual data entry."
Description

Automatically collect receipts via email ingestion, file upload, and merchant portal links. Use OCR/ML to parse merchant, date, tax, labor, and parts line items; normalize SKUs and map to approved estimates and policy coverage. Reconcile transactions against caps and rules, flag discrepancies (e.g., overage, unapproved parts), and create exception tasks. Update the case ledger and remaining budget in real time and export structured data to accounting/ERP.

Acceptance Criteria
Case-Linked Ledger and SLA Reporting
"As a support leader, I want a transaction ledger and SLA reporting tied to each claim so that I can audit spend and measure the speed and quality of resolutions."
Description

Maintain a secure, case-linked ledger of card events (authorizations, captures, reversals, refunds) with timestamps, reason codes, and actor context. Display the ledger in the case timeline and expose aggregates (approved spend, remaining budget) in dashboards. Tie payment milestones to SLA timers (e.g., time to first payment, time to completion) and provide exports and APIs for BI tools. Ensure PCI-safe tokenization and redaction of sensitive data throughout.

Acceptance Criteria

Return Gate

Hold‑and‑release logic that ties payouts to proof events like carrier scan, depot intake, photo evidence, or device deactivation. Supports partial releases, auto‑cancel on missed deadlines, and instant release on verified conditions. Benefit: protects against return fraud and ensures value is recovered before cash goes out—without extra agent follow‑up.

Requirements

Configurable Hold & Release Rules Engine
"As an operations lead, I want to define payout hold and release rules by product and channel so that cash only goes out after the right proof events occur without agents micromanaging each case."
Description

Provide a policy-driven rules engine to tie payouts to configurable proof events (e.g., carrier first scan, depot intake, photo evidence, device deactivation). Support per-brand, channel, SKU, and claim-type rules; order-level vs line-level granularity; percentage-based and fixed-amount releases; sequencing and dependencies between milestones; deadlines and grace periods; exception handling; and simulation/sandbox mode. Integrate with ClaimKit cases so rules evaluate automatically on event arrival and emit release actions to payments and case workflow. Include versioning, change history, and safe rollouts with idempotent evaluations to prevent double releases.

Acceptance Criteria
Multi-Source Proof Event Ingestion
"As a systems integrator, I want Return Gate to automatically receive and validate proof events from all channels so that releases happen instantly when conditions are truly met."
Description

Implement resilient connectors to ingest and verify proof events from carriers (UPS, FedEx, USPS, DHL), depot/WMS/RMA intake systems, device deactivation/MDM and OEM APIs, email/PDF parsing (from the magic inbox), and customer evidence portals. Normalize disparate payloads into a unified event schema with signed webhook verification, polling fallbacks, deduplication, sequencing across partial shipments, and idempotency. Provide latency SLOs and health metrics, and map each event to its originating ClaimKit case and line items.

Acceptance Criteria
Partial Payouts Calculator
"As a finance manager, I want accurate partial releases computed per milestone so that refunds reconcile cleanly with our payment processors and general ledger."
Description

Create a finance-grade calculation service that determines release amounts per milestone, supporting multiple tenders (credit card, wallet, store credit), multi-currency with FX locking, taxes and restocking fees, shipping label charges, and line-item partial returns. Apply rounding rules, caps, and minimums; ensure atomic coordination with payment gateways; and provide rollback/compensation on failures. Expose clear breakdowns within ClaimKit cases for agent and customer visibility.
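One concrete rounding rule a finance-grade calculator needs: when a payout is split across percentage-based milestones, the rounded parts must sum exactly to the total. The largest-remainder method is a standard way to do this; the sketch below works in integer cents:

```python
def allocate_cents(total_cents, percentages):
    """Split an amount across milestones so the rounded parts sum exactly
    to the total (largest-remainder method), avoiding off-by-a-cent drift."""
    raw = [total_cents * p / 100 for p in percentages]
    parts = [int(x) for x in raw]  # floor each share
    shortfall = total_cents - sum(parts)
    # hand the leftover cents to the largest fractional remainders
    order = sorted(range(len(raw)), key=lambda i: raw[i] - parts[i], reverse=True)
    for i in order[:shortfall]:
        parts[i] += 1
    return parts
```

Naive per-line rounding of a 33.33/33.33/33.34 split of $1.01 loses or gains a cent; the largest-remainder pass guarantees the ledger lines reconcile to the gateway charge.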

Acceptance Criteria
SLA Deadlines & Auto-Cancel Enforcement
"As a support manager, I want missed return deadlines to automatically block or cancel payouts so that agents don’t have to manually police exceptions and revenue leakage is prevented."
Description

Enforce configurable timelines for expected proof events (e.g., first carrier scan within 7 days, depot intake within 21 days). Start timers when RMAs are issued, pause/resume for approved exceptions, and automatically cancel or revert pending releases when deadlines are missed. Trigger notifications to customers and agents, escalate high-risk cases, and synchronize with ClaimKit’s existing SLA engine and calendars (time zones, holidays).

Acceptance Criteria
Evidence Capture & Validation
"As a fraud analyst, I want high-quality, validated evidence tied to each case so that release decisions are reliable and defensible."
Description

Provide a secure evidence capture experience (links, portal, and API) for photos/videos and device data with automatic metadata extraction (EXIF, timestamp, geolocation), serial/IMEI OCR and case matching, duplicate/stock-image detection, and basic fraud scoring. Store evidence with retention policies, access controls, and tamper-evident hashes. Feed validation results into the rules engine for instant releases when criteria are satisfied.

Acceptance Criteria
Agent Override with Dual Control
"As a senior agent, I want to override Return Gate decisions in controlled ways so that we can resolve legitimate customer issues without breaking policy."
Description

Enable authorized users to bypass or adjust holds with reason codes, attachment of supporting evidence, and optional second-approver workflows above configurable thresholds. Log all overrides with before/after state, user identity, and timestamps. Provide granular permissions, bulk actions for incident responses, and automatic recalculation of remaining hold milestones to preserve policy integrity.

Acceptance Criteria
Audit Trail, Webhooks & Reconciliation
"As a finance operations lead, I want end-to-end auditability and automated reconciliations with our ERP so that we can prove control effectiveness and quickly resolve discrepancies."
Description

Record immutable, time-ordered logs of rule versions, inputs, evaluations, events received, decisions made, releases executed, and overrides. Publish idempotent webhooks and files to finance/ERP/payment systems; include retries, signing, and replay. Provide reconciliation reports that compare expected vs actual payouts by case, event, gateway, and accounting period, highlighting discrepancies for fast resolution and compliance reporting.

Acceptance Criteria

PayTrack

Real‑time payout tracking with predicted deposit ETA, status updates (initiated, clearing, deposited), and proactive SMS/email notifications to customers. Auto‑retries on soft failures with reason codes and escalates stuck payments to the right owner. Benefit: fewer “where’s my refund?” contacts and higher CSAT through clear, honest timelines.

Requirements

Real-time Payment Provider Integration
"As an operations leader, I want ClaimKit to automatically ingest and normalize payout events from all our payment providers so that I can see accurate, real-time status in one place."
Description

Real-time integrations with payment processors (e.g., Stripe, Adyen, PayPal, Shopify Payments, ACH gateways) to ingest payout and refund events via webhooks and APIs. Map transaction identifiers to ClaimKit cases and customers, normalize statuses (initiated, clearing, deposited, failed), and persist amounts, currencies, and settlement accounts. Ensure secure secret management, OAuth where applicable, idempotent processing, exponential backoff on webhook retries, and rate-limit handling. Provide a unified adapter interface to add new providers without impacting downstream systems.

Acceptance Criteria
Predicted Deposit ETA Engine
"As a customer awaiting a refund, I want a clear predicted deposit date and time so that I know when to expect my money."
Description

A rules- and data-driven engine that predicts deposit times based on payment method, bank rails, weekends and holidays, cut-off windows, risk holds, currency, and geography. Surface ETA with confidence scores, continuously update predictions as new events arrive, and adjust for provider-specific behaviors. Expose ETA via API and UI, store historical prediction versus actual for accuracy tuning, and localize to the customer’s time zone.
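The calendar portion of such an engine (weekends, holidays, cut-off windows) is straightforward to sketch; this illustrates only that math, not the confidence scoring or provider-specific behavior:

```python
from datetime import date, timedelta

def predict_deposit(initiated, business_days, holidays=frozenset(), cutoff_missed=False):
    """Roll the initiation date forward N business days, skipping weekends
    and holidays; missing the provider cut-off window costs one extra day."""
    if cutoff_missed:
        business_days += 1
    d = initiated
    while business_days > 0:
        d += timedelta(days=1)
        if d.weekday() < 5 and d not in holidays:  # Mon-Fri, not a holiday
            business_days -= 1
    return d
```

Storing predicted-vs-actual pairs, as the description requires, then lets the per-method day counts be tuned from history instead of hard-coded.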

Acceptance Criteria
Payout Status Timeline UI
"As a support agent, I want a clear payout timeline in the case view so that I can confidently answer where my refund is without switching tools."
Description

A case-level and customer-facing timeline showing payout progression (initiated, clearing, deposited, failed) with timestamps, amounts, payment method, bank descriptor, reason codes for delays, and last update source. Include visual indicators for ETA, confidence, and SLA timers, with responsive design and WCAG-compliant accessibility. Embed in the ClaimKit agent console and self-service portal, and support deep links from notifications.

Acceptance Criteria
Proactive Status Notifications
"As a customer, I want timely updates about my payout status so that I don’t need to contact support for updates."
Description

Trigger SMS and email notifications to customers on key payout events (initiated, ETA available or changed, clearing, deposited, failed) with localized templates, dynamic fields (amount, ETA, case link), and brand-specific sender profiles. Enforce compliance with opt-in and opt-out preferences and applicable regulations, throttle frequency to avoid spam, and provide delivery and engagement metrics. Support retry and fallback between channels if delivery fails.

Acceptance Criteria
Auto-Retry and Escalation Rules
"As a payments analyst, I want the system to auto-retry recoverable failures and escalate stuck payouts so that I can focus on true exceptions."
Description

Automatically retry payouts on soft failures using provider-specific reason codes with configurable backoff, and halt on hard failures with clear guidance. Detect stuck payouts based on inactivity thresholds or missed SLAs and route escalations to the correct owner or queue with context, including reason codes, last provider response, and next best action. Log all actions for auditing and analytics.

Acceptance Criteria
Reconciliation and Audit Ledger
"As a finance manager, I want a reconciled ledger and audit trail of payouts so that our books and compliance checks are accurate and defensible."
Description

An immutable payout ledger that reconciles ClaimKit case payouts with provider settlements and bank deposits. Generate daily reconciliation reports, detect discrepancies, and support CSV and API export for finance workflows. Capture every state change, notification, retry, and escalation with actor, timestamp, and payload hashes to satisfy audit and compliance requirements.

Acceptance Criteria

Split Settle

Disburse a claim into multiple recipients and ledger lines in one action—e.g., partial cash to customer, labor stipend to technician, and parts credit to supplier. Mix rails per recipient, apply store‑credit offsets, and enforce per‑line approvals. Benefit: mirrors real‑world claim outcomes, cuts duplicate transactions, and keeps the audit trail pristine.

Requirements

Multi-Recipient Allocation UI
"As an operations manager, I want to allocate a claim across multiple recipients and line types in one screen so that I can mirror real-world settlements without creating duplicate transactions."
Description

Provide an in-claim composer to allocate a single claim payout across multiple recipients and ledger lines in one action. Users can add recipients (customer, technician, supplier), specify amounts or percentages by line (cash refund, labor stipend, parts credit), set currencies, and define categories/tags. The UI auto-validates totals against policy limits and claim eligibility, shows real-time remaining balance, and prevents duplicate lines. It supports templates, inline notes, attachment references, and a single "Commit Split" action that triggers approvals, postings, and disbursements. Integrates with ClaimKit’s claim detail view, SLA timers, and policy rules to mirror real-world outcomes while minimizing clicks and errors.
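The auto-validation step described above (totals, duplicates, policy limits) can be sketched as a pure check run before "Commit Split"; field names and the error strings are illustrative:

```python
def validate_split(claim_total, lines, max_per_line=None):
    """Validate a proposed split: amounts positive, no duplicate
    recipient+category lines, optional per-line policy cap, and the
    lines must sum exactly to the claim total. Returns (ok, errors)."""
    errors, seen = [], set()
    for ln in lines:
        key = (ln["recipient"], ln["category"])
        if key in seen:
            errors.append(f"duplicate line: {key}")
        seen.add(key)
        if ln["amount"] <= 0:
            errors.append(f"non-positive amount: {key}")
        if max_per_line is not None and ln["amount"] > max_per_line:
            errors.append(f"over policy limit: {key}")
    if round(sum(ln["amount"] for ln in lines), 2) != round(claim_total, 2):
        errors.append("lines do not sum to claim total")
    return (not errors, errors)
```

Collecting every violation rather than failing on the first lets the composer show all problems inline at once, which is what keeps the one-action commit flow fast.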

Acceptance Criteria
Payment Rail Selection per Recipient
"As a finance analyst, I want to choose the optimal payment rail for each recipient so that funds arrive reliably with predictable timing and costs."
Description

Enable per-recipient payment rail selection and configuration within a split settlement. Supported rails include ACH, card push-to-debit, check, digital wallet, supplier credit, and store credit. The system validates required payee data (KYC, routing, account, tax forms), displays estimated settlement times and fees, and applies fallback rails per policy if a primary rail is unavailable. Batches payments where applicable and exposes rail-specific metadata for reconciliation. Integrates with existing payout providers and honors geographic and currency constraints.

Acceptance Criteria
Store Credit Offset Application
"As a support lead, I want to apply store credits before cash payouts so that we minimize cash outflows while honoring customer benefits."
Description

Allow application of existing store-credit balances to reduce cash payouts for eligible recipients. Pulls real-time balances from connected commerce systems, applies offsets per policy (e.g., minimum cash payout, non-refundable rules), and creates new credit memos when needed. The UI shows before/after balances and remaining cash to disburse, and ledger mapping ensures credits and debits post to correct accounts. Works alongside other rails in the same settlement and logs offsets on the audit trail.

Acceptance Criteria
Per-Line Approval Rules & Thresholds
"As a compliance manager, I want line-level approvals with thresholds so that high-risk disbursements receive appropriate oversight without slowing routine settlements."
Description

Introduce configurable approval workflows at the line level based on amount, category, recipient type, and policy. Supports single or multi-step approvals, role-based approvers, dollar thresholds, and exception routing. Blocks line execution until approvals are satisfied, records timestamps and approver identities, and escalates per SLA with notifications. Pre-approves routine lines via policy automation to reduce friction while ensuring oversight on high-risk items.

Acceptance Criteria
Unified Ledger Posting & Immutable Audit Trail
"As a controller, I want detailed ledger postings and an immutable audit trail so that our financials reconcile and pass audits without manual spreadsheets."
Description

On commit, generate granular ledger entries per line with unique disbursement IDs, cross-references to claim, policy, and attachments. Maintain a write-once audit log capturing creator, edits, approvals, rail choices, and timestamps. Support exports/syncs to accounting systems (e.g., QuickBooks, Xero, NetSuite) with correct account mapping for refunds, stipends, parts credits, and offsets. Allow reversals/voids via compensating entries while preserving audit integrity. Provide search and filters for finance and audit teams.

Acceptance Criteria
Validation & Reconciliation Safeguards
"As a claims supervisor, I want automated validations and reconciliation so that split settlements are accurate and recover gracefully from failures."
Description

Add preflight validations (eligibility, policy limits, duplicate detection, tax/compliance checks, payee verification) before commit, and post-settlement reconciliation with payout providers. Provide real-time statuses per line (queued, sent, failed, settled), auto-retries with backoff, and a manual exception queue with resolution actions. Offer guardrails to prevent overpayment across multiple settlements on the same claim and alerts for mismatches. Log all outcomes to the audit trail and update SLA metrics.

Acceptance Criteria
Split Settlement API and Webhooks
"As an integration engineer, I want APIs and webhooks for split settlements so that external systems can programmatically orchestrate and track disbursements."
Description

Expose REST endpoints to create, validate, and commit split settlements, with idempotency keys and policy enforcement. Provide webhooks for events such as approval_required, committed, disbursement_sent, disbursement_failed, and reconciled. Include granular scopes, rate limits, and detailed error codes for partner integrations. Support external ID mapping to claims, recipients, and accounting entities to enable orchestration from ERPs or commerce systems.
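One way the commit endpoint's idempotency keys could work is sketched below. The endpoint name and response shape are illustrative; in production the key store would be durable, not in-memory:

```python
# In-memory stand-in for a durable idempotency-key store.
_responses = {}  # idempotency_key -> cached response

def create_split_settlement(idempotency_key, payload, commit):
    """Replaying the same key returns the original response without
    re-running approvals, postings, or disbursements."""
    if idempotency_key in _responses:
        return _responses[idempotency_key]
    response = commit(payload)  # validate + post + disburse exactly once
    _responses[idempotency_key] = response
    return response
```

This is what lets an ERP safely retry a timed-out POST without double-paying.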

Acceptance Criteria

Ledger Bridge

Two‑way sync to NetSuite, QuickBooks, and Xero with GL mapping by refund type, cost center, and tax treatment. Auto‑create journals on payout, reversals on clawbacks, and monthly reconciliation views tied to case IDs. Export audit bundles on demand. Benefit: closes the books faster with clean, explainable entries and zero spreadsheet gymnastics.

Requirements

Two-Way Ledger Sync Engine
"As a finance ops lead, I want ClaimKit to sync with our accounting system in real time so that journals and master data stay consistent without manual entry."
Description

Implements secure, OAuth-based connectors for NetSuite, QuickBooks Online, and Xero to enable bi-directional synchronization of master data (chart of accounts, classes/departments/locations, tax codes, currencies) and transactional data (journal entries, exchange rates) with ClaimKit events. Translates ClaimKit case events—payouts, refunds, parts/labor costs, fees, and clawbacks—into platform-specific accounting operations using webhooks for near real-time updates with batched fallback. Ensures idempotency via case ID and event sequence keys, supports per-tenant configuration, multi-entity/subsidiary routing, sandbox vs production environments, pagination, backfill jobs, and rate-limit aware scheduling. Delivers consistent, up-to-date ledgers without manual re-entry and forms the foundation for all downstream Ledger Bridge capabilities.
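The case ID plus event-sequence idempotency described above can be sketched as a high-water-mark check, so webhook replays and out-of-order deliveries become no-ops (field names are illustrative):

```python
def should_apply(event, last_seq_by_case):
    """Apply a sync event only if its sequence number advances the case's
    high-water mark; duplicates and stale replays are skipped."""
    case_id, seq = event["case_id"], event["seq"]
    if seq <= last_seq_by_case.get(case_id, 0):
        return False  # already applied (or superseded): no duplicate journal
    last_seq_by_case[case_id] = seq
    return True
```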

Acceptance Criteria
GL Mapping Rules & Tax Treatment
"As a controller, I want to define GL mappings by refund type, cost center, and tax treatment so that entries post correctly and are explainable during close and audit."
Description

Provides a no-code, versioned rules engine to map refund types, cost centers, payment methods, channels, and regions to specific GL accounts and dimensions per target system, including tax-inclusive/exclusive settings and cross-border VAT/GST handling. Validates mappings against the live remote chart of accounts and tax codes, enforces required dimensions, and blocks posting when mappings are incomplete or stale. Supports precedence and scoping (global > brand > channel > reason), effective dating, drafts with review/approval, and change logs. Includes multi-currency translation options (spot rate on event, monthly average, or month-end) and configurable rounding rules. Ensures clean, explainable entries that align with each organization’s accounting policies.
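One reading of the scoping chain (global > brand > channel > reason, narrowing from broadest to most specific) can be sketched as a most-specific-wins lookup; the rule shape and account numbers are illustrative:

```python
PRECEDENCE = ["reason", "channel", "brand", "global"]  # most specific first

def resolve_gl_account(rules, refund):
    """Return the GL account from the most specific matching rule, or None
    (block posting) when the mapping is incomplete."""
    for level in PRECEDENCE:
        for rule in rules:
            if rule["scope"] != level:
                continue
            if level == "global" or rule["match"] == refund.get(level):
                return rule["gl_account"]
    return None  # incomplete mapping: block rather than guess an account
```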

Acceptance Criteria
Auto Journal & Clawback Reversal Posting
"As an accounting manager, I want journals and reversals to auto-post on payouts and clawbacks so that we close faster and avoid error-prone spreadsheet work."
Description

Automatically creates and posts journal entries on payout-related events from ClaimKit, including refunds, parts purchase costs, labor reimbursements, taxes, and fees, with correct debits/credits per the active mapping configuration. Attaches persistent references (case ID, ticket ID, customer identifiers) into memo or custom fields and links related entries for traceability. On clawbacks, generates precise reversals or adjusting entries that reference the original posting and preserve audit trails. Supports posting modes (accrual vs cash), batch posting windows, preview/dry-run, and safe retries. Minimizes manual bookkeeping and accelerates period close while maintaining high fidelity to operational events.
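The "correct debits/credits" and clawback-reversal behavior can be sketched as balanced entries with compensating reversals; the account names stand in for the tenant's GL mapping:

```python
from decimal import Decimal

def payout_journal(case_id, refund, fees):
    """Balanced entry for a cash refund plus processing fees, with the
    case ID carried in the memo for traceability."""
    lines = [
        ("Warranty Expense", "debit", refund),
        ("Payout Fees", "debit", fees),
        ("Cash", "credit", refund + fees),
    ]
    debits = sum(a for _, side, a in lines if side == "debit")
    credits = sum(a for _, side, a in lines if side == "credit")
    assert debits == credits, "journal entry must balance"
    return {"memo": f"case:{case_id}", "lines": lines}

def clawback_reversal(entry):
    """Compensating entry: flip each side and reference the original memo,
    preserving the audit trail instead of deleting the posting."""
    flip = {"debit": "credit", "credit": "debit"}
    return {"memo": entry["memo"] + " reversal",
            "lines": [(acct, flip[side], amt) for acct, side, amt in entry["lines"]]}
```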

Acceptance Criteria
Case-Linked Monthly Reconciliation
"As a revenue accountant, I want a reconciliation view tied to case IDs so that I can explain variances and complete monthly close with confidence."
Description

Delivers a reconciliation workspace that aggregates postings by month, account, cost center, and refund type, with drill-through to ClaimKit case IDs and original documents (receipts, serial numbers, emails/PDFs). Highlights posted vs expected amounts, timing differences, unposted events, duplicates, and out-of-balance conditions with suggested fixes. Provides preparer and reviewer workflows, sign-offs, variance explanations, and export to CSV and BI tools. Anchors financial totals to operational evidence, enabling an explainable, faster month-end close.

Acceptance Criteria
Audit Bundle Export
"As a compliance lead, I want exportable audit bundles with supporting evidence so that audits require minimal preparation and disruption to the team."
Description

Generates on-demand, audit-ready bundles for a selected period or case containing journal entry exports, mapping snapshots and versions, configuration approvals, reconciliation reports, exception logs, and linked source documents. Produces cryptographically signed archives with checksums for integrity, supports redaction of PII, and allows delivery via secure download or push to external storage (e.g., S3, Google Drive). Provides a consistent, repeatable package that reduces audit preparation time and back-and-forth with auditors.

Acceptance Criteria
Error Handling, Alerts & Idempotent Retry
"As an operations engineer, I want robust error handling with alerts and idempotent retries so that issues are resolved quickly without duplicates or data loss."
Description

Introduces a standardized error taxonomy, structured logs with correlation IDs, and a health dashboard across connectors, mapping, posting, and reconciliation flows. Implements idempotent, exponential backoff retries, dead-letter queues for manual remediation, and safe rollback/void workflows for mis-posted entries with full audit trails. Provides configurable alerting via email and Slack, including threshold-based notifications and daily digests. Ensures reliability and observability so that financial data integrity is preserved even under external API failures or configuration drift.

Acceptance Criteria

Compliance Shield

Built‑in tax and risk controls: W‑9/W‑8 collection and TIN validation, 1099/1099‑K threshold tracking, VAT/GST handling, OFAC/sanctions screening, and optional KYC for high‑value payouts. Enforce policy limits and approvals with full rationale capture. Benefit: reduce audit and regulatory risk while keeping legitimate customers moving.

Requirements

Dynamic Tax Form Collection (W‑9/W‑8)
"As a finance ops manager, I want the correct tax form to be collected automatically at payout time so that payments proceed without manual chasing and remain compliant."
Description

Provide a self-serve, mobile-friendly tax form capture flow that automatically requests the correct form (W‑9, W‑8BEN, W‑8BEN‑E, etc.) at payout initiation or claim approval. Support e-signature, field validation, conditional questions, and multi-language. Encrypt PII at rest and in transit, restrict access via roles, and version each submission with expiry tracking and auto-reminders for renewals. Store forms on the payee profile and link them to claims/tickets in the live queue. Expose webhook events and API endpoints for create/read/update operations. Block payouts until a valid form is on file, with override gates requiring approval and rationale. Provide an admin dashboard for form templates, branding, and jurisdiction-specific variations.

Acceptance Criteria
Real‑Time TIN/Name Validation
"As a compliance analyst, I want TINs to be validated automatically so that we catch mismatches early and avoid IRS penalties."
Description

Automatically validate TIN/Name combinations for W‑9 submissions using IRS TIN Matching (real-time where available, batch as fallback) with queue-based retries, throttling, and alerting. For W‑8s, validate foreign TIN formats and, when applicable, GIIN/FATCA status. Persist validation status, timestamp, response codes, and evidence artifacts. Surface pass/fail indicators in the claim view and block payouts on hard failures. Support exception handling with approval and rationale capture, plus audit logging of overrides. Provide monitoring dashboards and webhooks for validation events.
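A cheap format-only pre-check can run locally before the IRS TIN Matching call, catching obvious typos without burning an API request. It accepts bare nine digits, SSN grouping (xxx-xx-xxxx), or EIN grouping (xx-xxxxxxx):

```python
import re

def valid_tin_format(tin: str) -> bool:
    """Format check only; the IRS TIN Matching service remains the
    authority on whether the TIN/Name pair is actually valid."""
    return bool(re.fullmatch(r"\d{9}|\d{3}-\d{2}-\d{4}|\d{2}-\d{7}", tin.strip()))
```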

Acceptance Criteria
1099/1099‑K Threshold Tracking & Year‑End Reporting
"As a controller, I want ClaimKit to track reportable amounts and produce filing-ready outputs so that we meet deadlines with minimal manual work and errors."
Description

Aggregate gross payouts per payee across all channels and entities to track 1099‑NEC/MISC and 1099‑K obligations with configurable federal and state thresholds, exemptions (e.g., corporations), and backup withholding flags. Provide real-time threshold indicators in the live queue and alerts when payees approach or cross thresholds. Lock year-end totals, support TIN corrections and amended filings, and generate recipient copies plus e-file-ready exports (CSV/XML per IRS/state schemas). Offer reconciliation reports, audit trails, and APIs for data extraction. Respect timezone, entity, and currency considerations with consistent rounding rules.
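The "approaching or crossing thresholds" indicator can be sketched like this. The $600 figure matches current federal 1099-NEC guidance, but both the threshold and the alert band should come from configuration, not code:

```python
from decimal import Decimal

def threshold_status(ytd_total, payout, threshold=Decimal("600")):
    """Status of a payee after adding `payout` to their year-to-date total."""
    total = Decimal(str(ytd_total)) + Decimal(str(payout))
    if total >= threshold:
        return "reportable"
    if total >= threshold * Decimal("0.8"):  # illustrative early-warning band
        return "approaching"
    return "below"
```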

Acceptance Criteria
VAT/GST Determination & Validation
"As an international operations lead, I want VAT/GST handled correctly so that cross-border payouts remain compliant and auditable."
Description

Capture and validate VAT/GST identifiers for international payees (e.g., EU VIES, UK HMRC, AU ABN) and determine correct tax treatment (reverse charge vs. tax collection) based on payee status, service type, and locations. Store evidence of business status and validation results with timestamps and proof snapshots. Include tax lines on payout invoices/credit memos and support currency conversion and local rounding rules. Alert when IDs are invalid or expiring and block payouts that require valid IDs unless an approved exception with rationale is present. Provide exports and APIs for indirect tax reporting.

Acceptance Criteria
OFAC & Sanctions Screening
"As a risk manager, I want sanctions screening integrated into payouts so that prohibited transactions are blocked and our diligence is documented."
Description

Screen payee names, aliases, addresses, and bank details against OFAC SDN, EU, UK, and other global sanctions/PEP/watchlists at onboarding and prior to each payout. Use fuzzy matching with configurable thresholds and list versioning. Trigger automated payout holds on potential or confirmed matches and route to a disposition workflow with tiered reviewers. Maintain complete audit trails, result snapshots, and reasoning for clear auditability. Rescreen payees on list updates and provide monitoring dashboards, alerts, and APIs for screening decisions and evidence retrieval.
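The fuzzy matching with configurable thresholds can be sketched with a similarity ratio; `difflib` here stands in for a production matcher, which would also handle aliases, transliteration, and token reordering:

```python
from difflib import SequenceMatcher

def screen_name(name, watchlist, threshold=0.85):
    """Return potential matches above the similarity threshold.
    Any hit places the payout on hold and routes to disposition."""
    hits = []
    for entry in watchlist:
        score = SequenceMatcher(None, name.lower(), entry.lower()).ratio()
        if score >= threshold:
            hits.append({"entry": entry, "score": round(score, 2)})
    return hits
```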

Acceptance Criteria
Risk‑Based KYC for High‑Value Payouts
"As a payments lead, I want enhanced KYC to auto-trigger for risky payouts so that we reduce fraud and regulatory exposure without slowing routine claims."
Description

Implement tiered KYC triggers based on payout amount, velocity, geography, sanctions risk, and historical behavior. Integrate with third-party KYC providers for document verification, biometric liveness, and database checks. Match verified identity to payout destination (name-on-account checks) and record user consent. Store verification artifacts, decisions, and expirations; provide a manual review queue with SLAs and escalation. Allow configurable bypass rules requiring approval and rationale. Expose outcomes in the claim view and via APIs/webhooks to orchestrate payout holds/releases.

Acceptance Criteria
Policy Limit Enforcement & Approval Workflow with Rationale
"As an operations manager, I want policy limits enforced with clear approvals so that we control leakage while keeping legitimate customers moving."
Description

Define and enforce policy limits by product, plan tier, claim type, and region, calculating remaining entitlements and applying caps during claim adjudication and payout creation. Route over-limit or non-standard payouts to multi-level approvers based on configurable rules. Require structured rationale, attachments, and policy references for all overrides. Block payouts until approvals are complete, display timers and escalations in the live queue, and record immutable audit logs. Provide analytics on exception rates, leakage, and turnaround times, plus APIs for rule management and event streaming.

Acceptance Criteria

Role Blueprint Library

Start from vetted, least‑privilege templates matched to each ClaimKit user type and industry. Versioned blueprints include brand/region/queue scopes and recommended permissions, with diff views against your custom roles. Map blueprints to SSO groups in one click and safely roll out updates with change previews. Outcome: faster, safer role setup, less role sprawl, and easier audits.

Requirements

Blueprint Catalog & Versioning
"As an operations admin, I want to browse and select versioned role blueprints matched to my industry so that I can quickly set up secure roles with confidence."
Description

Provide a curated library of vetted role blueprints organized by ClaimKit user type (support agent, repair coordinator, ops lead, auditor, integrator) and industry segment, with semantic versioning, release notes, and deprecation timelines. Enable search, filtering by brand/region/queue scope needs, and the ability to pin or auto-track the latest compatible version per tenant. Store blueprints as structured, machine-readable definitions that include permissions, scope constraints, and recommended defaults, ensuring backward/forward compatibility with ClaimKit’s RBAC and claims/queue/SLA domains. Seed tenants with a default set and allow safe customization while retaining an upgrade path.

Acceptance Criteria
Least-Privilege Permission Templates
"As a security-conscious admin, I want least-privilege templates for each user type so that users have only the access required to do their jobs."
Description

Deliver least‑privilege templates that map real-world tasks in ClaimKit (e.g., triage claim, adjust SLA timer, escalate repair ticket, export PII) to the minimal underlying permissions required. Include guardrails that prevent overly broad grants, standardized permission naming, and automated validation against over-privilege. Ensure full coverage across claims, tickets, queues, SLA controls, documents/attachments, integrations, and reporting, with unit and security tests. Expose a rationale for each permission to support audits and reviews.

Acceptance Criteria
Scope Binding & SSO Group Mapping
"As an IT admin, I want to map blueprint roles to my SSO groups with scoped access so that access is automatically granted and limited to the correct teams."
Description

Allow administrators to bind blueprint roles to granular scopes (brand, region, store, channel, queue, and data classification) and map them to IdP/SSO groups in one click. Support SAML/OIDC group claims and SCIM provisioning for providers such as Okta, Azure AD, and Google Workspace. Validate scope boundaries at assignment time and at runtime, preview impacted users, and handle drift detection with remediation prompts. Integrate with existing ClaimKit SSO and RBAC services to keep role grants synchronized and scoped correctly.

Acceptance Criteria
Role Diff & Change Preview
"As a compliance lead, I want a clear diff between my roles and a blueprint update so that I can understand and approve changes before rollout."
Description

Provide visual and API-based diffs between current custom roles and selected blueprint versions, highlighting added/removed/modified permissions and scope changes with risk annotations. Offer side-by-side and inline views, tenant-specific override visibility, and exportable artifacts (PDF/CSV/JSON). Include preflight checks that flag breaking changes, PII exposure risks, and SLA control alterations before applying updates. Integrate approvals and reviewer attribution to support change management.

Acceptance Criteria
Staged Rollout & One-Click Rollback
"As a role owner, I want to roll out blueprint updates safely with the ability to rollback so that I minimize disruption to operations."
Description

Enable phased deployment of blueprint assignments and updates via pilots, percentage rollouts, and organizational segments, with scheduled maintenance windows. Instrument metrics for access denials, error rates, and claim/queue handling impact to detect regressions. Provide automated and manual one‑click rollback to last known good state, maintaining full change history and notifications to stakeholders. Enforce approval workflows and guardrails for high-risk permission changes.

Acceptance Criteria
Audit Trail & Compliance Reports
"As an auditor, I want complete, exportable records of role blueprint changes so that I can verify access controls and compliance."
Description

Capture immutable, timestamped logs of blueprint selection, customization, approvals, SSO mappings, rollouts, rollbacks, and access grants/revocations, with actor identity and rationale. Provide configurable retention, tamper-evident storage, and exportable reports aligned with SOC 2/ISO 27001 needs. Offer on-demand reports showing who has access to claims, queues, and PII-related actions, along with evidence of least‑privilege adherence and change review outcomes. Integrate with ClaimKit’s existing audit subsystem and reporting UI.

Acceptance Criteria
Role Migration Assistant
"As a platform admin, I want help migrating my existing roles to blueprints so that I can reduce role sprawl without breaking workflows."
Description

Offer guided migration from existing custom roles to closest-matching blueprints, including similarity scoring, proposed permission/scope adjustments, and user impact simulation. Support batch migrations with dry-run mode, communication templates to affected users, and remediation steps for edge-case permissions. Integrate with diff, staged rollout, and rollback features to ensure safe transitions and reduced role sprawl without disrupting claim and repair workflows.

Acceptance Criteria

Scope Rules

Granular, condition‑based scoping that limits access by brand, region, queue, store, amount thresholds, or case attributes. Add time‑boxed access for shifts and on‑call windows, plus emergency overrides with auto‑expire. Build reusable policies without code and apply them across roles. Benefit: tighter least‑privilege control with fewer manual exceptions and reduced cross‑brand bleed‑through.

Requirements

No-Code Policy Builder
"As an operations admin, I want to build reusable, condition-based access policies without code so that teams only see the cases they should."
Description

Provide a visual, no-code policy builder to define granular, condition-based scopes using case and organization attributes (e.g., brand, region, queue, store ID, purchase date, warranty status, amount thresholds, SKU/category, channel). Support nested AND/OR groups, reusable policy templates, versioning with change history, and real-time previews that show matched case counts and sample records. Allow selection of permitted actions (view, edit, assign, approve, transition, export, comment) per policy. Validate policies against a JSON schema, prevent conflicting expressions, and expose a consistent policy format for API and UI enforcement. Integrate with ClaimKit roles and queues so policies can be attached and reused across teams without code.
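The nested AND/OR groups the builder produces could evaluate recursively as sketched below; the expression shape is an illustrative stand-in for the actual JSON schema:

```python
def evaluate(expr, case):
    """Evaluate a nested AND/OR expression against case attributes.
    Leaves look like {"attr": ..., "op": ..., "value": ...}."""
    if "all" in expr:
        return all(evaluate(e, case) for e in expr["all"])  # AND group
    if "any" in expr:
        return any(evaluate(e, case) for e in expr["any"])  # OR group
    left, op, val = case.get(expr["attr"]), expr["op"], expr["value"]
    ops = {"eq": lambda: left == val,
           "in": lambda: left in val,
           "gte": lambda: left is not None and left >= val}
    return ops[op]()
```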

Acceptance Criteria
Real-Time ABAC Enforcement Engine
"As a security admin, I want all actions to be evaluated against attribute-based policies in real time so that cross-brand bleed-through is prevented."
Description

Implement a server-side attribute-based access control engine that evaluates every user action and data fetch against active policies, defaulting to deny on no match. Ensure multi-tenant isolation and prevent cross-brand bleed-through across the web app, APIs, search, exports, automations, notifications, and magic-inbox ingestion flows. Provide decision caching and batching to meet performance targets (p99 < 50 ms per decision) without sacrificing consistency. Record the policy and conditions that produced each allow/deny decision for auditability and debugging.
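The default-deny rule plus decision recording can be sketched like this; the policy and decision shapes are illustrative:

```python
def decide(policies, user, case, action):
    """Default-deny ABAC: allow only when some active policy grants the
    action, recording which policy produced the decision so every
    allow/deny can be explained later."""
    for policy in policies:
        if action in policy["actions"] and policy["match"](user, case):
            return {"effect": "allow", "policy": policy["id"]}
    return {"effect": "deny", "policy": None}  # no match -> deny
```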

Acceptance Criteria
Time-Boxed Access Windows
"As a support manager, I want to grant access only during scheduled shifts and on-call windows so that off-hours access is automatically restricted."
Description

Enable temporary, schedule-based access windows that can be attached to users, roles, or policies. Support absolute start/end times, recurring schedules for shifts (e.g., weekdays 09:00–17:00), time zone awareness, and automatic expiration with immediate revocation. Provide a calendar UI to visualize upcoming grants, APIs to manage schedules, and safeguards to prevent overlapping or orphaned windows. Integrate with on-call rotations and SLA timers to align elevated access with operational needs.
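The recurring, time-zone-aware window check can be sketched as follows; the weekday 09:00–17:00 values are the example from the text, not defaults the product mandates:

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

def in_shift_window(now_utc, tz="America/New_York",
                    weekdays=range(0, 5), start=time(9, 0), end=time(17, 0)):
    """True when now_utc falls inside a recurring weekday window,
    evaluated in the grant's local time zone (Monday=0)."""
    local = now_utc.astimezone(ZoneInfo(tz))
    return local.weekday() in weekdays and start <= local.time() < end
```

Evaluating in local time keeps the grant aligned with the shift even across daylight-saving transitions.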

Acceptance Criteria
Break-Glass Override with Auto-Expire
"As an on-call lead, I want an emergency override that auto-expires and is fully audited so that incidents can be resolved quickly without compromising compliance."
Description

Add an emergency override flow that allows time-limited access beyond normal scope with least-privilege constraints. Require justification, optional two-person approval, and step-up authentication (MFA) before activation. Limit overrides to specific brands, queues, cases, or actions, set a TTL (e.g., 15–120 minutes), and auto-expire with forced logout of elevated sessions. Generate immediate notifications to security and managers, and produce a post-incident report detailing who accessed what, when, and why.

Acceptance Criteria
Policy Assignment & Inheritance
"As an org admin, I want to assign and inherit policies across roles, teams, and users with clear precedence so that access is consistent and maintainable."
Description

Provide a flexible model to assign policies to roles, teams/groups, and individual users with clear precedence and conflict resolution (explicit deny overrides allow, most-specific wins). Support hierarchical inheritance across organization, brand, and region levels, plus environment scoping (production vs. sandbox). Offer mapping to existing ClaimKit roles and queues, bulk assignment tools, and change-safe previews before applying updates to large user sets.
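One reasonable reading of the stated precedence rules (explicit deny overrides allow, most-specific wins) is sketched below: the most specific assignment level wins, and a deny beats an allow when both apply at that same specificity:

```python
LEVELS = {"organization": 0, "brand": 1, "region": 2, "user": 3}

def resolve(assignments):
    """assignments: list of (level, effect) pairs that matched the request.
    No matching assignment means deny by default."""
    if not assignments:
        return "deny"
    best = max(LEVELS[level] for level, _ in assignments)
    effects = {effect for level, effect in assignments if LEVELS[level] == best}
    return "deny" if "deny" in effects else "allow"
```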

Acceptance Criteria
Policy Simulator & Safe Rollout
"As a product owner, I want to simulate policy changes and roll them out safely so that I can reduce risk of accidental access loss or overexposure."
Description

Provide a simulator that can evaluate proposed policies against representative users and cases, showing allows/denies and differences from current policy sets. Include dry-run mode that logs would-be decisions without enforcing them, staged rollout (shadow mode per role, percentage-based enablement), and one-click rollback to prior versions. Surface impact metrics (users affected, cases newly hidden/exposed) and guardrails that block rollout if exposure exceeds configurable thresholds.

Acceptance Criteria
Audit Trails & Anomaly Alerts
"As a compliance officer, I want comprehensive audit logs and alerts so that we can prove least-privilege and detect misuse."
Description

Create comprehensive, immutable audit logs for policy changes, assignments, overrides, and per-request enforcement decisions. Support export to CSV and SIEM via webhook/stream, configurable retention policies, and privacy controls for sensitive data. Provide dashboards and alerts for anomalous access patterns (e.g., after-hours spikes, cross-brand queries, excessive denials), with drill-down to the underlying policies and users. Enable scheduled compliance reports to attest to least-privilege and access review outcomes.

Acceptance Criteria

JIT Elevate

Just‑in‑time privilege elevation for sensitive tasks—request a temporary permission, get approver sign‑off, and auto‑revoke on expiry. Supports step‑up MFA, session‑bound tokens, and break‑glass flows with post‑incident review. Prebuilt elevation packs (e.g., high‑value payout, policy override) keep teams moving without standing admin rights. Result: minimized risk, maximum agility.

Requirements

Inline Elevation Prompt and Scoped Request
"As a support agent, I want an inline prompt to request just‑in‑time access for a restricted action so that I can complete the task without waiting on a separate admin process."
Description

Intercept sensitive ClaimKit actions (e.g., high‑value payout, policy override, serial edit, SLA pause) to present an inline elevation prompt that pre-fills action context, required scope (object, action), and suggested duration. Collect business justification, requested duration, and pack selection, then create a scoped elevation request tied to the originating claim or ticket. The UI must be non-disruptive (modal/sheet) with a clear countdown and retry behavior, support API-first flows for headless integrations, and persist state so users can resume after approval. Ensures least privilege by scoping requests to the specific resource and operation while keeping agents in flow.

Acceptance Criteria
Approver Routing and SLA Timers
"As an operations lead, I want elevation requests routed with clear SLAs and escalations so that sensitive actions are approved quickly by the right people."
Description

Route elevation requests to approvers based on pack, business unit, claim amount, and risk conditions, supporting single or multi‑approver policies and quorum thresholds. Start SLA timers upon submission, with escalations, reassignment, and auto-expiry if not approved within policy. Deliver actionable notifications in Slack, email, and in‑app with one‑click approve/deny, capture approver rationale, and write all events to the case timeline and audit log.

Acceptance Criteria
Step‑up MFA and Identity Binding
"As a security admin, I want step‑up MFA enforced on sensitive elevations so that only verified users can request or approve temporary privileges."
Description

Enforce step‑up MFA at request and/or approval time per pack policy using TOTP, WebAuthn, or SMS fallback. Bind the elevation to the authenticated user session and device fingerprint, record MFA attestation with the request, and block approvals if step‑up fails or is stale. Configurable per environment (prod/sandbox) with grace periods and recovery paths that never bypass auditing.

Acceptance Criteria
Session‑bound Ephemeral Permissions
"As an engineer, I want elevated permissions to be temporary and scoped to the task so that risk is minimized if a session is compromised."
Description

Upon approval, mint a session‑bound, least‑privilege token that authorizes only the approved action(s) on the specified resource(s) for a fixed duration and idle timeout. Bind the token to the user, device, IP/ASN constraints, and ClaimKit session; revoke on logout, network change, case closure, or manual revoke. Prevent scope creep by rejecting API/UI calls outside the approved scope; expose introspection and revoke endpoints for observability and control.
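The session binding, fixed scope, and expiry can be sketched like this; field names are illustrative, and a real implementation would also bind device and IP/ASN constraints as the text describes:

```python
import time

def mint_grant(user, session, scope, ttl_s=900, now=None):
    """Ephemeral grant bound to a user and session, authorizing only the
    approved (action, resource) pairs until expiry or revocation."""
    issued = time.time() if now is None else now
    return {"user": user, "session": session, "scope": set(scope),
            "expires": issued + ttl_s, "revoked": False}

def authorize(grant, user, session, action, resource, now=None):
    """Reject anything outside the approved scope, session, or window."""
    now = time.time() if now is None else now
    return (not grant["revoked"]
            and grant["user"] == user and grant["session"] == session
            and now < grant["expires"]
            and (action, resource) in grant["scope"])
```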

Acceptance Criteria
Break‑glass Emergency Access with Post‑Use Controls
"As an incident commander, I want a controlled break‑glass option so that I can unblock critical operations while preserving accountability."
Description

Provide an emergency self-approval path for time‑critical incidents with minimal friction but strict guardrails: mandatory justification, short maximum duration, automatic paging/alerts to on‑call and security, expanded logging, and immediate creation of a post‑incident review task. Limit availability by role and time, and require retroactive approval to retain any changes.

Acceptance Criteria
Post‑Elevation Audit and Review Workflow
"As a compliance manager, I want complete audit trails and mandatory reviews so that we can demonstrate control over privileged actions."
Description

Generate an immutable, searchable audit trail for every elevation, including request context, approvers, MFA attestations, token issuance, and all actions performed during the elevated window with before/after field diffs. Provide a reviewer inbox with required sign‑off, comment threads, and remediation tasks; support exports to SIEM via webhook and CSV, and enforce retention policies aligned to compliance requirements.

Acceptance Criteria
Prebuilt Elevation Packs Catalog and Admin UX
"As a platform admin, I want reusable elevation packs so that teams can request the right access quickly without granting standing admin rights."
Description

Ship a catalog of preconfigured elevation packs (High‑Value Payout, Policy Override, SLA Timer Pause, Serial/IMEI Edit) mapping directly to ClaimKit permissions and objects. Allow admins to create, version, test, and publish packs with scope definitions, approver rules, MFA policies, durations, idle timeouts, and risk conditions. Provide staging/sandbox support, change previews, and rollback to ensure safe rollout.

Acceptance Criteria

Risk Gates

Policy‑driven approvals for high‑risk actions such as large payouts, denials, data exports, and role edits. Configure thresholds by amount, channel, geography, or Fraud Score; require single or dual control; and route to the right approver tier. Capture rationale and evidence inline for a complete audit trail. Outcome: fewer costly mistakes and consistent governance without inbox ping‑pong.

Requirements

Policy Builder & Versioning
"As a risk admin, I want to define and publish policy rules for high‑risk actions so that approvals are enforced consistently across all channels without disrupting operations."
Description

Provide a guided UI and backend engine to create, edit, test, and publish policy rules that gate high‑risk actions (e.g., large payouts, denials, data exports, role edits). Support conditions on amount, channel, geography, action type, customer segment, and external Fraud Score with nested AND/OR logic, operators, and time windows. Enable policy scoping by brand, store, and environment; include draft, publish, effective date, and rollback with full version history. Validate conflicts and overlaps at publish time, guarantee deterministic evaluation order, and deliver sub‑100ms evaluation latency at ClaimKit’s peak volumes without impacting the live queue. Integrate with existing case model and magic inbox outputs to reference extracted fields. Zero‑downtime deployment for policy updates.
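The nested AND/OR evaluation described above can be sketched as a small recursive walk over a condition tree. The rule shape and operator names below are illustrative, not ClaimKit's actual policy schema:

```python
# Minimal sketch of nested AND/OR policy condition evaluation.
OPERATORS = {
    "eq":  lambda a, b: a == b,
    "gt":  lambda a, b: a > b,
    "gte": lambda a, b: a >= b,
    "in":  lambda a, b: a in b,
}

def evaluate(node, ctx):
    """Recursively evaluate a condition tree against case context `ctx`."""
    if "all" in node:                       # AND: every child must pass
        return all(evaluate(c, ctx) for c in node["all"])
    if "any" in node:                       # OR: at least one child passes
        return any(evaluate(c, ctx) for c in node["any"])
    field, op, value = node["field"], node["op"], node["value"]
    return OPERATORS[op](ctx.get(field), value)

# Example: gate payouts over $500 in flagged geographies, or any case
# whose external Fraud Score is 80 or higher.
rule = {"any": [
    {"all": [
        {"field": "amount", "op": "gt", "value": 500},
        {"field": "geo", "op": "in", "value": {"XX", "YY"}},
    ]},
    {"field": "fraud_score", "op": "gte", "value": 80},
]}
```

Because the tree is evaluated depth-first in declaration order, the result is deterministic for a given policy version and case context, which is what makes publish-time conflict validation and rollback tractable.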

Acceptance Criteria
Multi‑Tier Approval Workflow
"As an operations leader, I want high‑risk actions to require the right number and level of approvals so that costly mistakes are prevented and governance is consistent."
Description

Allow policies to require single or dual control with configurable tiers (e.g., Agent → Supervisor → Finance) and thresholds. Support sequential or parallel approvals, quorum rules, and explicit separation‑of‑duties (no self‑approval, no peer approval within same shift). Block execution of the gated action until completion, with ability to retract if case data changes materially. Provide in‑context approval UI on claim/ticket views plus a mobile‑friendly modal. Persist decision outcome, approver identity, timestamp, and policy version applied. Enforce SLA timers per step with auto‑reminders and reopen on expiry.
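The separation-of-duties and quorum rules above can be sketched as pure checks applied before recording each approval. The record shapes (`requester_id`, `shift`, `tier`, `quorum`) are assumptions for illustration:

```python
def can_approve(request, approver, completed_approvals):
    """Separation-of-duties check for one approval step (sketch)."""
    if approver["id"] == request["requester_id"]:
        return False, "no self-approval"
    if any(a["id"] == approver["id"] for a in completed_approvals):
        return False, "already approved once"
    if any(a["shift"] == approver["shift"] for a in completed_approvals):
        return False, "no peer approval within same shift"
    if approver["tier"] < request["required_tier"]:
        return False, "insufficient tier"
    return True, "ok"

def is_fully_approved(request, completed_approvals):
    """Dual control: the gated action stays blocked until quorum is met."""
    return len(completed_approvals) >= request["quorum"]
```

Returning a reason code alongside the boolean gives the approval UI and the audit record the same explanation for why a given approver was rejected.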

Acceptance Criteria
Dynamic Approver Routing & Escalation
"As a compliance manager, I want approval requests sent to the right people and escalated on time so that risky actions are reviewed promptly without manual triage."
Description

Route approval requests to the correct approver group based on policy, geography, store, amount band, product line, and current workload. Support round‑robin within pools, out‑of‑office calendars, backups, and on‑call schedules. Provide escalation paths when SLA thresholds are breached (e.g., escalate to next tier after 2 hours) with configurable quiet hours. Send actionable notifications via email and Slack with deep links; track delivery, open, and response. Auto‑expire stale requests and requeue per policy rules with full traceability.
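The round-robin-within-pools behavior can be sketched in a few lines. Real routing would also weigh workload, on-call schedules, and escalation tiers; this minimal version only shows the rotation and out-of-office skip, with illustrative shapes:

```python
def route(pool, out_of_office, cursor):
    """Round-robin routing within an approver pool, skipping OOO members.

    Returns (approver, new_cursor). Returns (None, cursor) when the pool
    is empty so the caller can escalate per policy.
    """
    available = [m for m in pool if m not in out_of_office]
    if not available:
        return None, cursor
    approver = available[cursor % len(available)]
    return approver, cursor + 1
```

Persisting the cursor per pool keeps assignments evenly spread even as members drop in and out of availability.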

Acceptance Criteria
Inline Rationale & Evidence Capture
"As an auditor, I want each approval to include clear rationale and supporting evidence so that decisions can be justified during reviews and investigations."
Description

Require approvers to provide structured rationale and attach evidence when approving or rejecting gated actions. Offer templates and required fields by action type (e.g., export justification, payout proof, denial basis) with validation. Allow attaching files, linking to case artifacts (receipts, serials), and referencing third‑party checks. Automatically redact PII from notes where configured; encrypt stored evidence; enforce size/type limits. Store rationale and evidence with the approval record for downstream audit and reporting.

Acceptance Criteria
Immutable Audit Log & Reporting
"As a risk and audit lead, I want a complete, immutable record of approvals and policy evaluations so that we meet compliance requirements and can investigate anomalies quickly."
Description

Record a tamper‑evident event trail for every policy evaluation and approval decision, including inputs used, policy version, actors, timestamps, and outcomes (allow/deny/pending). Provide searchable, filterable views by action type, user, brand, and time range with export to CSV and scheduled reports. Support retention policies by tenant and legal hold. Integrate with ClaimKit’s existing audit framework and expose a read‑only API endpoint for compliance systems.
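One common way to make a trail tamper-evident is a hash chain, where each record commits to its predecessor. This is a minimal sketch of that idea, not ClaimKit's actual audit store:

```python
import hashlib
import json

class AuditLog:
    """Append-only trail: each record hashes its predecessor, so any
    rewrite of history breaks verification downstream."""

    def __init__(self):
        self.records = []

    def append(self, event):
        prev = self.records[-1]["hash"] if self.records else "genesis"
        payload = json.dumps(event, sort_keys=True)
        h = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.records.append({"event": event, "prev": prev, "hash": h})
        return h

    def verify(self):
        prev = "genesis"
        for r in self.records:
            payload = json.dumps(r["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if r["prev"] != prev or r["hash"] != expected:
                return False
            prev = r["hash"]
        return True
```

Exporting the head hash to an external system (SIEM, scheduled report) lets auditors confirm that nothing was altered or deleted since the export.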

Acceptance Criteria
External Risk Score Integration
"As a fraud analyst, I want policies to leverage real‑time Fraud Scores so that high‑risk actions receive additional scrutiny and low‑risk actions can proceed faster."
Description

Integrate with external fraud/risk providers to fetch a Fraud Score at decision time with secure credential storage, request timeouts, retries, and circuit breakers. Map returned scores and reason codes into policy conditions with configurable thresholds and normalization across providers. Cache results per case with TTL, surface failures gracefully in the UI, and define safe default behaviors when the provider is unavailable. Persist scores and reasons for later audit and analytics.
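The cache-TTL and fail-safe behavior can be sketched as below. The provider interface, thresholds, and the "pending" default are assumptions; a production circuit breaker would also add a half-open recovery state:

```python
import time

class FraudScoreClient:
    """Decision-time score fetch with a per-case TTL cache, failure
    counting, and a safe default when the provider is unavailable."""

    def __init__(self, provider, ttl=300, max_failures=3, safe_default="pending"):
        self.provider, self.ttl = provider, ttl
        self.max_failures, self.safe_default = max_failures, safe_default
        self.cache, self.failures = {}, 0

    def score(self, case_id):
        hit = self.cache.get(case_id)
        if hit and time.time() - hit[1] < self.ttl:
            return hit[0]                        # fresh cached result
        if self.failures >= self.max_failures:
            return self.safe_default             # circuit open: fail safe
        try:
            value = self.provider(case_id)
        except Exception:
            self.failures += 1
            return self.safe_default
        self.failures = 0
        self.cache[case_id] = (value, time.time())
        return value
```

Returning "pending" rather than raising keeps the live queue moving: the policy engine can route the case to human review instead of blocking on a down provider.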

Acceptance Criteria
API & Webhook Enforcement
"As a platform engineer, I want a consistent API to enforce Risk Gates across all channels so that no high‑risk action can bypass policy controls."
Description

Expose an internal service that all high‑risk actions must call to obtain an allow/deny/pending decision before execution, ensuring consistent enforcement across UI, bulk jobs, and integrations. Provide idempotent endpoints with clear error and status codes, synchronous fast‑path for auto‑approvals, and asynchronous callbacks via webhooks when human approval is required. Support gating for bulk exports and mass updates with chunking, preview, and partial application controls. Include rate limits, audit correlation IDs, and backward‑compatible versioning.

Acceptance Criteria

SCIM Watch

Live monitoring and drift detection for SSO/SCIM provisioning. Validate that group‑to‑role mappings and scopes match your source of truth, auto‑remediate on mismatch, and alert on failures. Test changes in a sandbox, preview impacted users, and ship with confidence. Benefit: clean, reliable access hygiene with fewer IT tickets and audit surprises.

Requirements

Real-time Provisioning Health Monitor
"As an IT administrator, I want live visibility into SCIM/SSO health so that I can detect and resolve access sync issues before they impact ClaimKit users."
Description

Continuously measures the health of SCIM and SSO integrations by tracking API availability, latency, error codes, and end-to-end propagation lag from identity provider events to ClaimKit user/role updates. Surfaces a live status dashboard within SCIM Watch, with environment-aware thresholds, rate-limit awareness, and incident banners in-app. Enables early detection of outages or degraded performance that could block user access to claims, tickets, or admin tools.

Acceptance Criteria
Drift Detection & Role Mapping Validator
"As a security engineer, I want automated detection of drift between IdP groups and ClaimKit roles so that our access remains aligned with policy and least privilege."
Description

Continuously compares identity provider group memberships and attribute rules against ClaimKit RBAC roles and scopes to detect configuration drift. Computes per-user and per-group diffs, highlights missing/extra roles, and identifies scope misalignments across environments. Supports configurable source of truth (IdP or ClaimKit), scheduled and event-driven runs, and provides clear reason codes for each finding to reduce investigation time.
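At its core, drift detection is a per-user set difference between the roles the IdP implies and the roles ClaimKit actually holds. A minimal sketch, with illustrative reason codes and input shapes:

```python
def diff_access(idp_roles, claimkit_roles):
    """Per-user drift diff between IdP-derived roles and ClaimKit RBAC.

    Each argument maps user -> set of role names; returns findings
    with reason codes for triage.
    """
    findings = []
    for user in sorted(set(idp_roles) | set(claimkit_roles)):
        expected = idp_roles.get(user, set())
        actual = claimkit_roles.get(user, set())
        for role in sorted(expected - actual):
            findings.append({"user": user, "role": role, "code": "MISSING_ROLE"})
        for role in sorted(actual - expected):
            findings.append({"user": user, "role": role, "code": "EXTRA_ROLE"})
    return findings
```

When ClaimKit rather than the IdP is configured as the source of truth, the same diff runs with the arguments swapped and the remediation direction reversed.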

Acceptance Criteria
Auto-Remediation with Safe Rollback
"As an IT admin, I want SCIM Watch to fix straightforward mismatches automatically with guardrails so that we keep access clean without creating new risks."
Description

Automatically reconciles detected mismatches by updating ClaimKit roles/scopes or proposing SCIM PATCH operations back to the identity provider, governed by policy. Includes dry-run mode, blast-radius limits, batch sizing, change windows, and anomaly halts. Provides one-click rollback to last known good state and a detailed change log for every action, reducing manual tickets while maintaining safety and control.

Acceptance Criteria
Sandbox Change Simulator & Impact Preview
"As an IdP owner, I want to simulate rule changes and preview impacted users in a sandbox so that I can deploy with confidence and avoid access regressions."
Description

Offers an isolated sandbox mirroring production RBAC and representative users to test provisioning rule changes before rollout. Imports proposed IdP group rules and attribute mappings, simulates the resulting ClaimKit roles/scopes, and previews impacted users and permissions (e.g., ability to view/modify claims, queues, and reports). Supports approval workflows and scheduled promotion to production, minimizing access regressions.

Acceptance Criteria
Failure Alerts & On-Call Integrations
"As an on-call engineer, I want actionable alerts with context when provisioning fails so that I can triage and resolve issues quickly."
Description

Delivers configurable, actionable alerts for drift findings, remediation failures, and IdP/API outages via email, Slack, PagerDuty, and webhooks. Includes deduplication, suppression windows, severity routing, and enriched context (affected users, diffs, remediation attempts, runbook links) to accelerate triage. Integrates with ClaimKit’s notification center for unified operations visibility.

Acceptance Criteria
Compliance Audit Trails & Evidence Exports
"As a compliance lead, I want verifiable logs and evidence exports of access changes so that I can satisfy audits without manual data collection."
Description

Captures immutable, time-stamped records for detected drift, approvals, remediations, and resultant access changes. Provides search and filtering by user, group, role, and timeframe, with export to CSV/JSON and auditor-ready PDF evidence packs. Maps events to common controls (e.g., SOX, ISO 27001) and supports periodic access review attestations to reduce audit burden and surprises.

Acceptance Criteria
Multi-IdP Support & SCIM Schema Mapping
"As an enterprise admin, I want support for multiple IdPs and flexible attribute mappings so that we can standardize ClaimKit access across different business units."
Description

Supports multiple identity providers (e.g., Okta, Azure AD, Google Workspace) per tenant with provider-specific auth, rate limits, and endpoints. Provides flexible, versioned mapping between IdP attributes and ClaimKit fields (roles, scopes, department, location), including custom attributes. Enables per-tenant templates and validation to ensure consistent, predictable provisioning across subsidiaries and environments.

Acceptance Criteria

Access Explain

Instant answers to “why can/can’t this user do X?” with a clear lineage of grants, scopes, and overrides. Simulate role or scope changes to see blast radius before shipping, with recommendations for the smallest effective permission set. Export explanations with timestamps for auditors and incident reviews. Result: faster troubleshooting, safer change management, and higher trust in controls.

Requirements

Effective Permission Lineage View
"As an operations lead, I want to instantly see why an agent can or cannot approve a claim so that I can remediate access issues quickly and safely."
Description

Compute and present end-to-end lineage of a user’s effective permissions for any ClaimKit action and resource, including roles, group memberships, brand/tenant scopes, object-level ACLs (claim, ticket, RMA), temporary overrides, explicit denies, and environment conditions (regions, feature flags). Provide a single, human-readable explanation chain with timestamps, policy/version IDs, and evaluation context. Expose via UI on user and resource pages and via API endpoint. Support drill-down to each grant source, link to policy definition, and highlight conflicting rules. Handle multi-tenant isolation across brands and shops and mask sensitive fields based on the viewer’s permissions.
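The core of the lineage computation is an ordered walk over grant sources that records every match and lets an explicit deny win over any allow. A minimal sketch; the grant shape and deny-overrides rule are illustrative assumptions:

```python
def explain_access(user, action, resource, grants):
    """Build a human-readable lineage chain for one access question.

    Each grant carries its source (role, group, override...) and an
    effect; an explicit deny short-circuits and always wins.
    """
    chain, allowed = [], False
    for g in grants:
        if user in g["subjects"] and action in g["actions"] and resource in g["resources"]:
            chain.append(f'{g["effect"]} via {g["source"]}')
            if g["effect"] == "deny":
                return "deny", chain       # explicit deny always wins
            allowed = True
    return ("allow" if allowed else "deny"), chain
```

The chain, rather than just the final verdict, is what makes the explanation actionable: it names the exact grant or override to change.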

Acceptance Criteria
Access Explain Console & Inline Answers
"As a support engineer, I want a one-click "Why can’t Alice issue an RMA on Claim #1234?" so that I can resolve blocked workflows without escalating to security."
Description

Provide an interactive console and inline 403 tooltips that answer "Can user U perform action A on resource R?" with pass/fail, short summary, and detailed rule evaluation trace. Surface top contributing grants and denies, missing scopes, and actionable next steps such as assigning a scoped role or requesting a time-bound override. Allow selecting target resources by ID (claim, ticket, customer) and actions from the policy catalog. Include copyable deep links to share query state and results with stakeholders.

Acceptance Criteria
Change Simulation Sandbox
"As a system admin, I want to simulate granting "Refund:Issue" to the Support role so that I understand which users and brands would gain access before shipping the change."
Description

Enable safe, pre-deployment simulations of role, scope, group, and policy edits with a preview of the blast radius. Allow staging proposed changes (for example, granting Support:Refund for Brand X), compute before/after access diffs across users, teams, brands, and resource classes, and flag risky expansions such as PII read or export. Provide policy lint warnings, guardrails, and export of proposed changes as a signed change set or pull request to the policy repository. Simulations must respect tenant boundaries and support targeting current or selected historical snapshots.

Acceptance Criteria
Least-Privilege Recommendations
"As a team lead, I want a recommended minimal permission change to let Bob view PII for 24 hours so that we maintain least-privilege while unblocking him."
Description

Automatically generate the smallest effective permission change to enable one or more target actions while honoring brand, region, and resource constraints. Prefer scoped role assignments, narrow resource filters, and time-bound overrides over broad role grants. Present rationale, predicted blast radius, and expiry suggestions, and support one-click request/approval with a reversible application and full audit trail. Integrate with ticketing so recommendations attach to access requests for review.

Acceptance Criteria
Auditor-Grade Explanation Exports
"As a compliance auditor, I want to export a timestamped access explanation with the policy version used so that I can evidence access decisions for SOC 2 and incident reviews."
Description

Produce exportable, auditor-ready access explanations that include timestamp, policy version, evaluated rules, user attributes and group memberships at evaluation time, relevant resource metadata, and the final decision. Provide a cryptographically signed JSON artifact and a human-readable PDF with stable identifiers for policies and resources. Support bulk export for incident windows, scheduled delivery to secure storage, and maintain an immutable, append-only ledger of exports to meet retention requirements.
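The signed-artifact idea can be sketched with canonical JSON plus an HMAC; this is a simple stand-in for the actual signing scheme, which in production would more likely use asymmetric signatures so auditors can verify without the secret:

```python
import hashlib
import hmac
import json

def sign_explanation(explanation, key):
    """Wrap an access explanation in a signed, auditor-ready artifact."""
    # Canonical serialization: stable key order, no whitespace variance.
    body = json.dumps(explanation, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "alg": "HMAC-SHA256", "signature": sig}

def verify_explanation(artifact, key):
    expected = hmac.new(key, artifact["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, artifact["signature"])
```

Canonicalizing before signing matters: two semantically identical JSON documents with different key order would otherwise produce different signatures.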

Acceptance Criteria
Time-Travel Evaluation & Snapshots
"As a security analyst, I want to evaluate why Dave could export data last Tuesday so that I can investigate incidents accurately."
Description

Support evaluating and explaining access decisions "as of" any timestamp by snapshotting policies, role mappings, group memberships, feature flags, and resource ACLs. Allow explanations and simulations to target historical states to enable precise incident investigation and audit response. Provide efficient snapshot storage and diffing with configurable retention and expose an as_of parameter in both UI and API.
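The `as_of` lookup reduces to finding the latest snapshot at or before the requested timestamp, which a sorted index handles directly. A minimal sketch with an illustrative storage shape:

```python
import bisect

class SnapshotStore:
    """Time-travel lookup: return the snapshot in effect "as of" a
    given timestamp (policies, role mappings, ACLs, feature flags)."""

    def __init__(self):
        self.times, self.snapshots = [], []

    def record(self, ts, snapshot):
        i = bisect.bisect(self.times, ts)
        self.times.insert(i, ts)
        self.snapshots.insert(i, snapshot)

    def as_of(self, ts):
        # Index of the latest snapshot with time <= ts.
        i = bisect.bisect_right(self.times, ts) - 1
        if i < 0:
            raise LookupError("no snapshot at or before this timestamp")
        return self.snapshots[i]
```

Production storage would diff snapshots rather than copy them wholesale, but the lookup semantics stay the same.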

Acceptance Criteria

Rule Traceback

Visual, step-by-step map of the exact rules that fired, in what order, with the condition checks and stop reasons. Highlights branches not taken and cumulative scoring so Agents and Ops can see the why in seconds. Outcome: faster dispute handling, fewer escalations, and quicker training for new staff.

Requirements

Deterministic Rule Execution Log Capture
"As an ops leader, I want an authoritative execution log of every rule evaluation so that I can audit why a claim was routed or accepted without guessing."
Description

Implement backend instrumentation to capture a complete, ordered log of rule engine activity for each claim and repair ticket at processing time. The log must include rule identifiers and names, group/stage, evaluation results (fired/skipped/failed), per-condition outcomes, variable snapshots at evaluation time, cumulative score at each step, triggered actions (e.g., SLA timer starts, routing, escalations), stop reasons (stop-on-first, fail-fast), timestamps, and correlation IDs. Persist the log in an append-only, queryable audit store tied to the case ID and the ingestion source (email, PDF, API). Ensure seamless integration with ClaimKit’s magic inbox, receipt/serial parsers, and live queue so the traceback is available immediately after auto-creation of cases. Enforce RBAC-aware redaction for sensitive values and comply with retention policies. Maintain <5% performance overhead and graceful degradation if partial telemetry is unavailable, with clear indicators of missing segments.
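The ordered capture described above can be sketched as an in-memory trace object the engine writes to as it evaluates; the field names are illustrative, not ClaimKit's actual telemetry schema:

```python
import time
import uuid

class RuleTrace:
    """Ordered capture of rule-engine activity for one case: per-rule
    result, per-condition outcomes, cumulative score, and stop reason."""

    def __init__(self, case_id):
        self.case_id = case_id
        self.correlation_id = str(uuid.uuid4())
        self.entries = []
        self.cumulative_score = 0

    def record(self, rule_id, result, conditions, score_delta=0, stop_reason=None):
        self.cumulative_score += score_delta
        self.entries.append({
            "rule_id": rule_id,
            "result": result,                 # fired / skipped / failed
            "conditions": conditions,         # per-condition outcomes
            "score_delta": score_delta,
            "cumulative_score": self.cumulative_score,
            "stop_reason": stop_reason,       # e.g. stop-on-first, fail-fast
            "ts": time.time(),
        })
        return stop_reason is None            # False => engine stops here
```

Flushing the finished trace to the append-only audit store after evaluation (rather than per entry) is one way to keep the runtime overhead within the stated budget.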

Acceptance Criteria
Interactive Trace Graph Visualization
"As a support agent, I want a visual map of the rules that ran and in what order so that I can understand a case outcome at a glance."
Description

Provide a visual, step-by-step map within the case view that renders rules as nodes and execution flow as edges, ordered by actual run sequence. Use color and iconography to indicate statuses (fired, not fired, skipped, failed) and stop reasons. Support pan/zoom, expand/collapse by rule group or stage, and breadcrumbs for quick navigation. Tooltips display concise summaries; clicking a node opens detailed information in a side panel. Optimize for large rule sets via progressive loading and virtualized rendering. Ensure keyboard navigation, screen reader support, and high-contrast themes for accessibility. The component must embed smoothly into ClaimKit’s case details page and respect existing theming and layout patterns.

Acceptance Criteria
Condition Value Inspector
"As a senior agent, I want to inspect the exact values and checks used in a rule so that I can validate correctness and resolve disputes quickly."
Description

Add a detail panel that lists each condition evaluated within a selected rule, showing the boolean result, operator, operands, and the actual values used at evaluation time. Display provenance for each value (e.g., extracted from email receipt, parsed from PDF, fetched via API) and any normalization steps applied (trimming, date parsing, unit conversion). Indicate short-circuit behavior, null-safety checks, and parsing errors where applicable. Provide copy-to-clipboard for condition snippets and deep links to the original artifacts in ClaimKit. Enforce role-based masking/redaction of sensitive fields such as customer PII and payment details. Ensure the inspector loads quickly and does not block the main visualization.

Acceptance Criteria
Branches Not Taken Highlighting
"As a team lead, I want to see which branches were skipped and why so that I can spot misconfigured thresholds and improve our rules."
Description

Enhance the trace visualization to display alternative branches considered but not taken, using dimmed or dashed nodes/edges. For each unchosen branch, show the specific reason it was not taken (failed condition and its evaluated value, threshold miss, priority conflict, or stop-on-first). Provide a toggle to show/hide not-taken paths and a summary panel listing top alternative branches that would have changed the outcome. Implement lazy loading for non-taken paths to maintain performance on complex trees. Ensure the presentation clearly distinguishes speculative alternatives from the actual execution path to avoid user confusion.

Acceptance Criteria
Cumulative Score and Outcome Calculator
"As an agent, I want to see how the score built up to the final decision so that I can explain the outcome to a customer."
Description

Display per-rule score contributions and a running total across the execution timeline, including the scoring dimension (e.g., fraud risk, eligibility) and weight. Show the final outcome decision with the threshold that was applied, and annotate which downstream actions were triggered (SLA timers, assignments, escalations). Support multiple score tracks if the engine evaluates several dimensions in parallel. Clearly label the scoring algorithm and configuration version used. All displays are read-only for v1 to avoid accidental changes; future simulation capabilities will be handled separately. Integrate with ClaimKit outcomes so agents can reconcile the visual score with the case’s current state.

Acceptance Criteria
Rule Versioning and Reproducibility
"As a QA analyst, I want tracebacks pinned to the rule version that executed so that I can reproduce behavior even after rules change."
Description

Pin each traceback to the exact rule-set version and environment used at execution time. Store an immutable hash, semantic version, and timestamp for the rule-set, along with engine configuration flags. Provide a diff view against the current rule-set to highlight changes since execution, with clear warnings if behavior may now differ. Enable a replay mode that re-runs the case against the pinned version to verify consistency, falling back with a notice if dependencies (e.g., external lookups) are no longer available. Visibly mark retired rules and maintain compatibility shims so historical tracebacks remain interpretable. Ensure all version metadata is included in exports and share links.

Acceptance Criteria
Exportable and Shareable Audit Artifact
"As a compliance officer, I want to export or securely share a traceback so that I can document decisions for audits and partner disputes."
Description

Enable export of the full traceback to PDF and JSON, embedding essential metadata (case ID, timestamps, rule-set version hash, user, environment) and a clear visual of the taken path, not-taken branches, and cumulative scoring. Provide secure, expiring share links governed by RBAC and scoped to the specific case, with optional partner access profiles that redact sensitive fields by policy. Log all access and downloads for compliance. Ensure deterministic, printer-friendly formatting and watermarking options. Validate exports against large tracebacks to preserve readability and performance across browsers.

Acceptance Criteria

ClauseLink

Every decision line item links to the precise policy clause, effective date, and jurisdiction it enforces. Shows which policy version applied at the time of decision and opens the source policy/wiki in one click. Benefit: consistent enforcement, easier audits, and reduced policy ambiguity for managers and reviewers.

Requirements

ClauseLink Mapping Engine
"As a claims reviewer, I want each decision to reference the exact clause it enforces so that I can justify my decision and ensure consistent policy application."
Description

Implement a core service that attaches a canonical policy clause to each decision line item, including clause ID, title, effective date range, jurisdiction, and policy version metadata. Support ingestion from multiple source types (PDF, DOCX, Markdown, Confluence/SharePoint/Google Docs) with anchor extraction, manual mapping, and supersession handling. Provide deterministic linking APIs used by the decisioning workflow, render clause badges in the queue UI, and maintain referential integrity when clauses are moved, renamed, or deprecated.

Acceptance Criteria
Policy Version Resolver
"As a compliance manager, I want the system to automatically resolve which policy version applies so that decisions reflect the rules in effect at the relevant time."
Description

Build a resolver that determines the applicable policy version at decision time using claim attributes (purchase date, incident date, claim open date), policy effective periods, and jurisdiction. Persist a frozen snapshot reference of the resolved version for each decision to protect against retroactive changes. Expose the resolved version in the UI alongside the decision and in APIs, and handle backdated changes, grace periods, and product-specific addenda.
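The resolution logic can be sketched as a filter over effective periods plus a tie-break rule. The version shape and the "latest effective_from wins on overlap" tie-break are illustrative assumptions:

```python
from datetime import date

def resolve_policy_version(versions, jurisdiction, at):
    """Pick the policy version whose effective period covers `at` for
    the given jurisdiction; an open-ended period has effective_to=None."""
    candidates = [
        v for v in versions
        if v["jurisdiction"] == jurisdiction
        and v["effective_from"] <= at
        and (v["effective_to"] is None or at <= v["effective_to"])
    ]
    if not candidates:
        return None
    # Tie-break on overlap: the most recently effective version wins.
    return max(candidates, key=lambda v: v["effective_from"])
```

The resolved version is what gets frozen onto the decision record, so later backdated policy edits cannot silently change what a past decision "meant".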

Acceptance Criteria
Jurisdiction Detection and Overrides
"As a reviewer, I want the correct jurisdiction to be applied with the ability to override when necessary so that regional regulations are respected without blocking edge cases."
Description

Automatically detect governing jurisdiction from claim data (customer address, point of sale, service location, product registration) and apply jurisdictional variants or modifiers to clause lookups. Provide reviewer override with reason codes and audit logging, and maintain a ruleset mapping jurisdictions to policy variants and exceptions. Ensure downstream calculations and eligibility checks respect the selected jurisdiction.

Acceptance Criteria
One-click Source Access with Permissions
"As an auditor, I want to open the exact policy clause from a decision in one click so that I can verify its wording quickly."
Description

Enable deep linking from each decision line item to the exact clause location in the source policy or wiki, including page/anchor position for PDFs and headings for docs. Integrate with SSO and document permissions (Confluence, Google Workspace, SharePoint, Git-based repos) to enforce access controls; display a permission request prompt when needed. Provide a cached, read-only snapshot fallback to ensure auditability when the source is unavailable or access is revoked.

Acceptance Criteria
Clause Synopsis Hover Cards
"As a reviewer, I want to preview clause details inline so that I can stay in flow and make faster, informed decisions."
Description

Display inline hover cards in the queue and case detail views that show clause title, short summary, effective dates, jurisdiction, and the resolved policy version without leaving the workflow. Include quick actions to copy the clause link, view change history, and flag ambiguous language for policy review. Ensure responsive performance and accessibility, with localization for dates and jurisdiction labels.

Acceptance Criteria
Audit Trail and Evidence Export
"As a QA lead, I want a comprehensive audit export of decisions and their clauses so that I can satisfy internal reviews and external audits."
Description

Capture immutable evidence for each decision: clause ID, version snapshot hash, jurisdiction resolution, overrides with reasons, user and timestamp metadata, and the exact text excerpt applied. Integrate with the existing case timeline and SLA metrics, and provide export options (PDF/CSV/JSON) bundling linked source snapshots. Support retention policies and tamper-evident signatures to satisfy internal QA and external audit requirements.

Acceptance Criteria
Admin Policy Source Configuration
"As a policy admin, I want to configure sources and clause mappings so that ClauseLink stays accurate as policies evolve."
Description

Provide an admin console to connect and manage policy sources, define clause anchors/IDs, map versions with effective ranges and jurisdictions, and configure default resolution rules. Include OCR and auto-anchoring for PDFs, regex and heading-based detection for docs, manual curation tools, validation, and a staging area with approval workflows. Surface health checks and monitoring for link rot, permissions drift, and unmapped clauses.

Acceptance Criteria

Evidence Pins

Pins each data point used in the decision to its original source—receipt field, serial lookup, email header, photo EXIF—with inline highlights and safe redactions. Hover to see the exact value at decision time. Benefit: instant verification without hunting, stronger audit trails, and faster exception approvals.

Requirements

Source Attribution Engine
"As a claims agent, I want each decision input pinned to its original source so that I can instantly verify correctness without searching through documents."
Description

Implements a robust pinning mechanism that attaches every decision input to its original source (e.g., receipt PDF field, email header, serial lookup API response, photo EXIF tag). For each pin, store the captured value at decision time, source type, precise locator (page/coordinates/XPath/header key/EXIF tag), timestamp, parser/extractor version, checksum of the source artifact, and the associated decision ID in an append-only store. Integrates with ClaimKit’s magic inbox, OCR, email ingestion, and device/serial lookup services to create pins automatically during case creation and eligibility checks. Supports multiple channels and formats, de-duplicates overlapping pins, and gracefully handles ambiguous or missing fields with confidence scores and fallbacks. Pins are attached to the claim record and made queryable for UI rendering, audits, and downstream exports.
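A single pin record can be sketched as below: the value captured at decision time, a precise locator, and a checksum binding the pin to the exact source artifact. Field names are illustrative, not ClaimKit's schema:

```python
import hashlib
import time

def make_pin(decision_id, value, source_type, locator, artifact_bytes, extractor_version):
    """Create an evidence pin tying one decision input to its source."""
    return {
        "decision_id": decision_id,
        "value": value,                       # captured at decision time
        "source_type": source_type,           # e.g. "receipt_pdf", "email_header"
        "locator": locator,                   # page/coords, XPath, header key, EXIF tag
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "extractor_version": extractor_version,
        "pinned_at": time.time(),
    }

def artifact_unchanged(pin, artifact_bytes):
    """Verify the source artifact still matches the pinned checksum."""
    return pin["artifact_sha256"] == hashlib.sha256(artifact_bytes).hexdigest()
```

The checksum is what turns a pin into audit evidence: if the receipt PDF is later replaced or re-uploaded, every pin against the old bytes fails verification and surfaces as drift rather than silently pointing at different content.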

Acceptance Criteria
Inline Document Highlighting
"As an operations lead, I want visual highlights on the original documents so that I can validate decisions at a glance during reviews."
Description

Provides an embedded viewer for PDFs, emails, and images that renders precise highlight overlays for pinned fields. Maps OCR text spans and layout coordinates to bounding boxes, supports rotated/scanned documents, and enables multi-field highlighting, zoom, and page navigation. Clicking a decision field scrolls the viewer to the corresponding highlight; clicking a highlight focuses the related decision field. Handles low-quality scans with tolerant matching and shows confidence indicators when exact bounding boxes are approximate. Integrates directly into the Claim detail view and works consistently across desktop and mobile browsers.

Acceptance Criteria
Safe Redaction & Role-Based Reveal
"As a compliance officer, I want sensitive fields redacted by default with controlled reveals so that we protect customer privacy without blocking verification."
Description

Applies non-destructive, layered redactions to sensitive data (e.g., card numbers, SSNs, emails, phone numbers, addresses) across UI, storage snapshots, and exports while preserving evidence utility. Redaction rules are configurable by tenant and support pattern-based and metadata-driven detection. Authorized users can temporarily reveal masked values via explicit action; all reveals are time-bound, fully audited, and permission-checked. Redactions never alter the underlying source files; overlays and stored snapshots maintain verifiability and chain-of-custody. Ensures compliance with privacy regulations and reduces risk during exception handling and audits.
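The pattern-based masking with a permissioned reveal path can be sketched as below. The regexes are deliberately simplified illustrations, not production PII detectors, and the reveal branch stands in for a fully audited, time-bound unmask flow:

```python
import re

# Simplified detectors for illustration only.
PATTERNS = {
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text, viewer_can_reveal=False):
    """Non-destructive masking: the stored text is untouched; only the
    rendered copy is masked unless the viewer is authorized to reveal."""
    if viewer_can_reveal:
        # A real reveal path would also emit a time-bound audit event.
        return text
    for pattern in PATTERNS.values():
        text = pattern.sub("[REDACTED]", text)
    return text
```

Because masking happens at render time over an unmodified source, the same evidence can be shown fully redacted to one role and revealed (with an audit trail) to another, without ever forking the underlying artifact.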

Acceptance Criteria
Hover-to-Inspect Tooltips
"As a support agent, I want to hover over a field and see its exact value and source details so that I can answer customer questions quickly and confidently."
Description

Enables instant, contextual inspection of pinned evidence by hovering or focusing on any decision field or highlight. Displays the exact value used at decision time along with source metadata (file name, page, coordinates or header key, EXIF tag, timestamp, parser version, checksum). Supports keyboard navigation for accessibility, respects redaction rules and user permissions, and offers copy-to-clipboard with masked/unmasked behavior based on role. Optimized to render within 100 ms for snappy triage and includes localization for date/time and numeric formats.

Acceptance Criteria
Decision Snapshot & Versioning
"As a QA reviewer, I want to see the exact evidence used at the time of a decision and compare it to current data so that I can understand discrepancies and approve exceptions appropriately."
Description

Captures an immutable snapshot of all pins at the moment of decision, including values, locators, and extractor versions, to form a verifiable audit artifact. When source documents are replaced, emails re-ingested, or extractors are upgraded, preserves prior snapshots and presents clear drift indicators between historical and current evidence. Supports controlled re-evaluation using the latest extractors, with side-by-side comparison and explicit approval flows to update decisions. Ensures legal defensibility and transparency in exception approvals and post-mortems.

Acceptance Criteria
Evidence Report Export & API
"As an enterprise customer, I want to export a tamper-evident evidence report via UI or API so that I can attach it to external systems and audits."
Description

Generates downloadable PDF and JSON evidence reports that consolidate pins, highlights, redaction states, timestamps, extractor versions, user actions, and cryptographic checksums. Provides secure, permissioned API endpoints to fetch evidence by claim ID and a short-lived, signed share link for external auditors. Includes webhooks for report readiness, pagination for large claims, and rate limiting. Reports are tamper-evident via signatures and include a machine-readable schema to facilitate ingestion by external systems (e.g., ERP, legal, insurance).
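One way to make a JSON report tamper-evident, as the description calls for, is a signature over a canonical serialization; this sketch uses HMAC-SHA256 with a hard-coded demo key, whereas a real deployment would sign with keys held in a KMS:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-kept-in-kms-in-practice"  # illustrative only

def sign_report(report: dict) -> dict:
    """Attach a signature computed over a canonical JSON serialization."""
    payload = json.dumps(report, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {**report, "signature": sig}

def verify_report(signed: dict) -> bool:
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = sign_report(body)["signature"]
    return hmac.compare_digest(expected, signed["signature"])

report = {"claim_id": "c-123", "pins": 4, "extractor_version": "2.3.1"}
signed = sign_report(report)
assert verify_report(signed)
tampered = {**signed, "pins": 5}
assert not verify_report(tampered)  # any field change breaks the signature
```

Canonical serialization (sorted keys, fixed separators) matters: without it, two semantically identical reports could hash differently.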

Acceptance Criteria

Decision Replay

Sandbox any claim to replay outcomes under different rulesets, thresholds, or dates. Produces a before/after diff that explains what changed and why, plus projected approval/denial rates at scale. Benefit: safe policy tuning for Strategists and Execs without risking live claims.

Requirements

Claim Snapshot Sandbox
"As a Claims Strategist, I want to sandbox a claim snapshot without altering the live case so that I can safely test policy changes and compare outcomes."
Description

Adds the ability to create immutable, time-stamped snapshots of selected claims (single or batch) that capture all inputs used by the decision engine—claim fields, product catalog references, warranty policy version, customer communications, receipts, serials, attachments, and SLA context—so replays run against a stable state without touching the live queue. Snapshots are addressable, deduplicated, and stored with configurable retention, enabling consistent, repeatable experiments and cross-team collaboration. Integrates with ClaimKit’s claim schema, file store, and the magic inbox pipeline to ensure derived fields and parsing outputs are included.

Acceptance Criteria
Ruleset Versioning & Selector
"As a Policy Strategist, I want to choose a ruleset version and effective date to replay a claim so that I can assess how historical or proposed policies change outcomes."
Description

Provides a catalog of decision rulesets and parameter packs with semantic versioning and effective-date metadata. Users can select historical, current, or draft rulesets to apply in a replay, with validation for compatibility against the snapshot’s schema. Includes side-by-side ruleset diffs, change notes, and dependency checks (e.g., feature flags, model versions), and resolves the correct policy layer for a given effective date. Integrates with the existing rules engine configuration store and optional Git-backed sources.
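Resolving the correct policy layer for an effective date reduces to picking the latest ruleset whose effective date is not after the replay date; a minimal sketch, with a hypothetical in-memory catalog:

```python
from datetime import date

# Hypothetical catalog entries: (semantic version, effective date)
CATALOG = [
    ("1.0.0", date(2023, 1, 1)),
    ("1.1.0", date(2023, 6, 15)),
    ("2.0.0", date(2024, 3, 1)),
]

def resolve_ruleset(as_of: date) -> str:
    """Pick the latest ruleset effective on or before `as_of`."""
    eligible = [(v, d) for v, d in CATALOG if d <= as_of]
    if not eligible:
        raise LookupError(f"no ruleset effective on {as_of}")
    return max(eligible, key=lambda e: e[1])[0]

print(resolve_ruleset(date(2023, 9, 1)))   # 1.1.0
print(resolve_ruleset(date(2024, 3, 1)))   # 2.0.0
```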

Acceptance Criteria
Deterministic Replay Engine
"As an Operations Analyst, I want replays to run deterministically without side effects so that results are trustworthy and reproducible across runs."
Description

Executes sandboxed claims through the same production decision engine in a side-effect-free simulation mode, ensuring deterministic outcomes. Replays use point-in-time dependencies (pricing, catalogs, policy text, model artifacts) and stub external calls (payments, notifications) while preserving execution traces, rule activations, inputs, and timing. Supports parallelization for batch jobs, resource quotas, and retryable, idempotent runs with stable run IDs for reproducibility.
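Two of the building blocks above, stable run IDs and side-effect stubbing, can be sketched as follows; the names and ID format are illustrative, not ClaimKit's actual implementation:

```python
import hashlib
import json

def run_id(snapshot: dict, ruleset_version: str, overrides: dict) -> str:
    """Derive a stable run ID from the replay's full input set; identical
    inputs always yield the same ID, making reruns idempotent."""
    material = json.dumps(
        {"snapshot": snapshot, "ruleset": ruleset_version, "overrides": overrides},
        sort_keys=True, separators=(",", ":"),
    ).encode()
    return "run-" + hashlib.sha256(material).hexdigest()[:16]

class StubNotifier:
    """Side-effect-free stand-in for the notification service during replay."""
    def __init__(self):
        self.calls: list[str] = []

    def send(self, message: str) -> None:
        self.calls.append(message)  # recorded in the trace, never delivered

snap = {"claim_id": "c-9", "amount": 120.0}
a = run_id(snap, "2.0.0", {})
b = run_id(snap, "2.0.0", {})
assert a == b                        # deterministic across runs
assert a != run_id(snap, "2.1.0", {})  # any input change changes the ID
```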

Acceptance Criteria
Before/After Decision Diff & Explanation
"As an Executive stakeholder, I want a clear before/after diff with explanations so that I can understand exactly what changed and why in a proposed policy adjustment."
Description

Generates a structured comparison between the original decision path and the replayed outcome, including rule-level attribution, threshold deltas, feature/field changes, model score shifts, SLA timer impacts, and monetary exposure differences. Presents a human-readable narrative explaining what changed and why, with deep links to triggered rules, inputs, and evidence. Supports highlighting risk/benefit trade-offs and exporting the diff as a shareable artifact.
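At its core, the structured comparison is a field-level delta between the two decision records; a minimal sketch with hypothetical field names:

```python
def decision_diff(before: dict, after: dict) -> dict:
    """Field-level deltas between the original and replayed decision."""
    keys = set(before) | set(after)
    return {
        k: {"before": before.get(k), "after": after.get(k)}
        for k in sorted(keys)
        if before.get(k) != after.get(k)
    }

original = {"outcome": "denied",   "rule": "R-12", "payout": 0.0}
replayed = {"outcome": "approved", "rule": "R-14", "payout": 89.5}
diff = decision_diff(original, replayed)
print(diff["outcome"])  # {'before': 'denied', 'after': 'approved'}
```

The rule-level attribution and narrative layers described above would be built on top of a delta structure like this one.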

Acceptance Criteria
Cohort Replay & Impact Metrics
"As a Policy Strategist, I want to replay cohorts of claims and see projected KPI changes so that I can quantify operational and financial impact before rolling out policy changes."
Description

Enables batch selection of claims by filters (date range, channel, product, warranty tier, region, status) and runs replays across the cohort to compute aggregate KPIs: approval/denial rates, overturn rates, average payout, SLA adherence, escalations, and backlog effects. Includes sampling controls, job progress tracking, and visualization of projected impacts with confidence intervals. Results can be saved as scenarios for later comparison and governance review.

Acceptance Criteria
Access Control & Audit Trail
"As a Compliance Manager, I want permissions and complete audit trails on replays so that experiments are controlled, traceable, and compliant with policy and regulation."
Description

Implements role-based permissions for creating snapshots, running replays, editing parameters, and exporting reports, with guardrails on batch sizes and data scopes. Captures immutable audit logs of who ran what replay, when, using which ruleset and parameters, and which claims were included. Supports retention policies, export for compliance reviews, and optional PII minimization in shared artifacts to meet governance requirements.

Acceptance Criteria
What-If Parameter Editor
"As a Strategist, I want to adjust rule thresholds on the fly in a sandbox so that I can quickly explore what-if scenarios without engineering effort or risk to production."
Description

Provides a guided UI to temporarily adjust rule thresholds, weights, and feature flags for a sandbox replay without creating a formal ruleset release. Includes input validation, guardrails (min/max and allowed values), preset templates, and the ability to save and reuse parameter sets. All overrides are versioned within the replay run and embedded in the diff/report for transparency. No changes propagate to the live configuration.

Acceptance Criteria

PlainSpeak

Auto-generates customer-ready explanations from the rule path—clear, localized, and channel‑ready (email/SMS/portal). Redacts sensitive signals and includes next-step guidance or appeal options. Benefit: fewer “why denied?” contacts, better CSAT, and less agent scripting.

Requirements

Rule Path to Plain Language
"As a claims agent, I want the system to auto-generate a clear explanation from the decision rules so that I can respond quickly and consistently without writing from scratch."
Description

Transform the claim’s evaluated rule path into a concise, customer-ready explanation that outlines the decision outcome, key eligibility checks, and referenced policy clauses. Deterministically maps rule nodes and evidence to plain-language sentences at a target reading level, ensuring consistency across brands. Integrates with ClaimKit’s decision engine to receive rule metadata, with fallbacks for incomplete data. Produces structured outputs (reason, summary, policy reference) for downstream formatting and analytics. Reduces agent scripting time and variance while improving customer clarity.
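The deterministic mapping from rule nodes to sentences could work as a template lookup over the evaluated path; the node names and templates below are illustrative, and the "skip unmapped nodes" branch stands in for the fallback behavior:

```python
# Illustrative rule-node → sentence templates; a real catalog would be
# brand-configurable and reviewed for reading level.
TEMPLATES = {
    "serial_verified": "The serial number {serial} matches our records.",
    "within_warranty": "Your product was purchased on {purchase_date}, "
                       "which is within the {term_months}-month warranty.",
    "approved":        "Your claim has been approved.",
}

def explain(rule_path: list[str], evidence: dict) -> str:
    sentences = []
    for node in rule_path:
        template = TEMPLATES.get(node)
        if template is None:
            continue  # fallback for incomplete data: skip unmapped nodes
        sentences.append(template.format(**evidence))
    return " ".join(sentences)

text = explain(
    ["serial_verified", "within_warranty", "approved"],
    {"serial": "SN-48271", "purchase_date": "2024-03-02", "term_months": 24},
)
print(text)
```

Because the mapping is a pure lookup, the same rule path always produces the same explanation, which is what keeps wording consistent across agents and brands.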

Acceptance Criteria
Multilingual & Localization
"As a global support lead, I want explanations localized to the customer's language and region so that communications are clear and compliant worldwide."
Description

Provide end-to-end localization of explanations, including language translation, regional terminology, date/number formats, and locale-specific compliance phrasing. Supports brand voice presets and glossaries per tenant to preserve product names and policy terms. Includes quality gates (automatic back-translation checks and banned-phrase filters) and configurable fallbacks when a locale is unsupported. Integrates with ClaimKit’s tenant settings and i18n services to auto-select locale from customer profile or channel metadata.

Acceptance Criteria
Channel-Specific Formatting
"As a CX manager, I want output tailored to email, SMS, and portal constraints so that messages fit each channel and deliverability is high."
Description

Generate channel-ready messages optimized for email, SMS, and the customer portal. Applies channel constraints (e.g., SMS length limits, link shorteners, plain text vs. HTML) and inserts approved templates, branding, and dynamic variables (case ID, deadlines, links). Provides previews per channel with deliverability checks (e.g., SMS segmentation count) and ensures consistent structure across channels. Integrates with ClaimKit’s messaging adapters and portal components.
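The SMS segmentation check mentioned above follows well-known limits: GSM-7 messages fit 160 characters (153 per part when concatenated), while messages needing UCS-2 fit 70 (67 per part). A simplified sketch, using an abridged GSM-7 alphabet and ignoring extended-table characters that count double:

```python
# Simplified GSM-7 basic alphabet; a production check would use the full
# GSM 03.38 tables, including extension characters that cost two septets.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def sms_segments(text: str) -> int:
    """Segment count: GSM-7 at 160/153 chars, otherwise UCS-2 at 70/67."""
    if set(text) <= GSM7_BASIC:
        single, multi = 160, 153
    else:
        single, multi = 70, 67
    if len(text) <= single:
        return 1
    return -(-len(text) // multi)  # ceiling division

assert sms_segments("Your claim C-123 was approved.") == 1
assert sms_segments("x" * 161) == 2   # crosses the single-message limit
assert sms_segments("Déjà vu ☂") == 1  # non-GSM char forces UCS-2, still 1 part
```

A single emoji or curly quote can silently triple a message's segment count (and cost), which is why a preview-time check like this is worth surfacing to agents.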

Acceptance Criteria
Sensitive Signal Redaction Engine
"As a compliance officer, I want sensitive signals automatically redacted based on policy and channel so that we avoid exposing restricted information."
Description

Automatically redact prohibited or sensitive inputs (PII, internal risk scores, fraud indicators, vendor-specific signals) from generated explanations using a policy-driven ruleset and role/channel-based visibility controls. Supports pattern-based and semantic redaction with placeholders, plus configurable whitelists. Provides test harnesses and audit logs of redaction decisions. Integrates with tenant data classification and complies with legal/partner contracts to prevent leakage of restricted information.

Acceptance Criteria
Next-Step Guidance & Appeals
"As a customer, I want explicit next steps and appeal options so that I know how to resolve my issue or contest a decision."
Description

Append actionable, context-aware next steps to explanations, including required documents, upload links, deadlines, appointment scheduling, and appeal procedures. Dynamically selects guidance based on claim disposition, warranty terms, geography, and partner network availability. Ties deadlines to ClaimKit SLA timers and includes conditional escalation options. Ensures clarity with numbered steps and verifies all links and contact points before send.

Acceptance Criteria
Agent Review & Safe-Edit Guardrails
"As a team lead, I want agents to review and safely edit explanations with guardrails so that quality stays high without risking policy violations."
Description

Provide an agent-facing preview and edit workflow with inline suggestions, reading-level scoring, and prohibited-phrase detection. Enforce guardrails that block removal of mandatory disclosures and prevent reintroduction of redacted content. Support quick-apply templates and one-click approve/send. Record editor, changes, and policy checks for accountability. Integrates with ClaimKit case view and existing approval permissions.

Acceptance Criteria
Audit Trail & Compliance Archiving
"As an auditor, I want a complete, immutable record of what was communicated and why so that we can demonstrate compliance and reproduce decisions."
Description

Persist every generated explanation with its inputs and context: claim ID, rule path hash and version, locale, redaction policy version, templates used, editor changes, and final channel payloads. Store immutable snapshots and expose exports and search for audits and disputes. Enable deterministic regeneration using the archived rule path and configuration for regulatory inquiries. Apply tenant-specific retention policies and encryption-at-rest.

Acceptance Criteria

Lineage Map

End-to-end provenance for all decision inputs: source system, timestamp, transformation steps, and trust score. Flags stale or conflicting data and suggests refresh or override. Benefit: faster root-cause resolution, safer overrides, and higher regulator confidence.

Requirements

Source Provenance Capture
"As an operations lead, I want every decision input to include verifiable source details so that I can defend decisions to customers and regulators."
Description

Capture and persist verifiable provenance for every decision input at the field level, including source system (e.g., Magic Inbox, email alias, eCommerce, POS, CRM), original document/message identifiers, transport metadata, ingestion timestamps, parser/connector version, and authentication state. Attach this metadata to claims, repair tickets, and eligibility checks so each value can be traced back to its origin. Store events immutably to support audits and replay. Integrate with existing connectors and the Magic Inbox to automatically populate provenance without agent intervention, improving defensibility and reducing investigation time.

Acceptance Criteria
Transformation Pipeline Audit Trail
"As a support engineer, I want a complete step-by-step history of how a value was derived so that I can quickly diagnose errors and prevent recurrence."
Description

Record a complete, ordered log of all transformations applied to inputs from ingestion to decision, including rule ID/version, algorithm/model, actor (system or user), timestamp, and before/after values. Cover OCR extraction, normalization (e.g., serial formats), deduplication, enrichment, and eligibility rule evaluations. Persist the trail with strong referential integrity to the originating claim and fields, enabling step-by-step replay and differential comparison. Optimize storage and retrieval for high-volume queues (50–5,000 claims/month) and expose a consistent schema for UI and API consumption.

Acceptance Criteria
Trust Score Computation Engine
"As a compliance manager, I want a transparent trust score for each claim so that I can set policies for when manual review is required."
Description

Compute a transparent trust score per field and per claim based on configurable weights for source reliability, data recency, completeness, OCR confidence, connector health, conflict frequency, and presence of human overrides/approvals. Produce both a numeric score and categorical level (e.g., Low/Medium/High) with rationale breadcrumbs referencing lineage elements. Allow tenant-level configuration of weights and thresholds that drive UI indicators, queue prioritization, and policy gates for manual review. Persist snapshots of scores over time to support audits and trend analysis.
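A weighted score with categorical levels, as described, might be computed like this; the weights, signal names, and thresholds are placeholders for the tenant-configurable values:

```python
# Illustrative weights; per the description these would be tenant-configurable.
WEIGHTS = {
    "source_reliability": 0.35,
    "recency":            0.25,
    "completeness":       0.20,
    "ocr_confidence":     0.20,
}

def trust_score(signals: dict) -> tuple[float, str]:
    """Weighted score in [0, 1] plus a categorical level for UI indicators.
    Missing signals default to 0, dragging the score down conservatively."""
    score = sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)
    if score >= 0.8:
        level = "High"
    elif score >= 0.5:
        level = "Medium"
    else:
        level = "Low"
    return round(score, 3), level

score, level = trust_score({
    "source_reliability": 0.9,  # e.g. healthy POS connector
    "recency": 1.0,
    "completeness": 0.8,
    "ocr_confidence": 0.7,
})
print(score, level)  # 0.865 High
```

Keeping the per-signal contributions inspectable is what enables the rationale breadcrumbs: each weighted term can be reported alongside the total.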

Acceptance Criteria
Staleness and Conflict Detection
"As a queue manager, I want stale or conflicting inputs automatically flagged so that agents focus on the riskiest cases first."
Description

Continuously detect stale or conflicting inputs by evaluating lineage timestamps against field-specific freshness windows and by comparing values across multiple sources. Generate actionable flags on the claim and affected fields, with clear conflict sets and recommended next steps (refresh target, override candidate). Integrate with SLA timers and queue routing so high-risk items bubble to the top. Provide tenant-configurable staleness rules and conflict resolution precedence (e.g., POS > email receipt) to reduce noise and accelerate resolution.
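The staleness half of this check is a comparison of lineage timestamps against per-field freshness windows; a minimal sketch with hypothetical field names and windows:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field-specific freshness windows (tenant-configurable per the spec)
FRESHNESS = {
    "serial_lookup": timedelta(days=30),
    "eligibility":   timedelta(days=7),
}

def stale_fields(lineage: dict, now: datetime) -> list[str]:
    """Flag each field whose last capture falls outside its freshness window."""
    flags = []
    for fld, captured_at in lineage.items():
        window = FRESHNESS.get(fld)
        if window and now - captured_at > window:
            flags.append(fld)
    return flags

now = datetime(2025, 1, 31, tzinfo=timezone.utc)
lineage = {
    "serial_lookup": datetime(2025, 1, 20, tzinfo=timezone.utc),  # 11 days old: fresh
    "eligibility":   datetime(2025, 1, 10, tzinfo=timezone.utc),  # 21 days old: stale
}
print(stale_fields(lineage, now))  # ['eligibility']
```

The flags produced here are what would feed the queue routing and recommended next steps (refresh vs. override) described above.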

Acceptance Criteria
Refresh and Override Workflow
"As a senior agent, I want to refresh data or override a field with justification so that I can resolve blocked claims without losing traceability."
Description

Enable agents to trigger selective refreshes from connected sources or submit field-level overrides with required justification, attachments, and optional approver workflows. Enforce role-based permissions and policy checks (e.g., trust score below threshold requires approval). Log all actions into the lineage trail with rollback capability and automatic recalculation of trust scores and eligibility outcomes. Provide async job status, retries, and alerting for failed refreshes to maintain operational reliability.

Acceptance Criteria
Interactive Lineage Visualization UI
"As a support lead, I want an interactive lineage map for each claim so that I can explain outcomes and identify the exact step to correct."
Description

Provide an interactive, field-focused lineage map within the claim detail view that renders sources, transformation steps, and outputs as a time-ordered graph. Support zoom, pan, filter by field/time/rule, color-coding by trust level, and hover/click details that reveal provenance and audit entries. Include quick actions to refresh or request override from the graph. Optimize for fast rendering on typical claim sizes and offer export options (PNG/PDF/JSON) for sharing with customers or regulators. Ensure accessibility and responsive layouts for common agent screen sizes.

Acceptance Criteria
Lineage API and Evidence Export
"As an auditor, I want to programmatically retrieve lineage data with PII controls so that I can run independent checks without accessing the full system."
Description

Expose REST endpoints to retrieve lineage data by claim and field, with pagination, filtering, and time slicing. Implement OAuth2-based auth, tenant scoping, and PII redaction controls to safely share evidence with external auditors and partners. Provide one-click evidence packs that bundle lineage graphs, audit trails, and trust score rationales as signed JSON/CSV with checksums and version metadata. Document SLAs, versioning, and deprecation to ensure stable integrations.

Acceptance Criteria

Decision Delta

A time-stamped timeline of decision changes as new evidence arrives—who changed what, before/after rationale, and required approvals. Exports a clean compare view for auditors and dispute teams. Benefit: zero finger‑pointing, airtight accountability, and clearer coaching moments.

Requirements

Immutable Decision Ledger
"As an operations leader, I want an immutable log of every decision change so that I can reconstruct what happened with audit-grade certainty."
Description

Record every decision change on a claim as an immutable, time-stamped entry capturing before/after values, actor identity, rationale note, linked evidence pointers, and system context. Store entries with tamper-evident hashing and WORM-style retention to ensure audit-grade integrity and non-repudiation within ClaimKit’s case model. Support millisecond timestamps, timezone normalization, and cross-references to claim, ticket, and SLA snapshot at the time of change to enable accurate reconstruction of state.
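The tamper-evident property can be illustrated with a hash chain, where each entry's hash covers its predecessor; this is a sketch of the idea, not ClaimKit's storage layer:

```python
import hashlib
import json

def entry_hash(entry: dict, prev_hash: str) -> str:
    payload = json.dumps(entry, sort_keys=True).encode() + prev_hash.encode()
    return hashlib.sha256(payload).hexdigest()

class DecisionLedger:
    """Append-only ledger; editing any historical entry in place breaks
    verification from that point forward."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries: list[tuple[dict, str]] = []

    def append(self, entry: dict) -> None:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        self.entries.append((entry, entry_hash(entry, prev)))

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry, stored in self.entries:
            if entry_hash(entry, prev) != stored:
                return False
            prev = stored
        return True

ledger = DecisionLedger()
ledger.append({"field": "status", "before": "open", "after": "approved", "actor": "ana"})
ledger.append({"field": "payout", "before": 0, "after": 95, "actor": "ana"})
assert ledger.verify()
ledger.entries[0][0]["after"] = "denied"  # tamper with history
assert not ledger.verify()
```

Combined with WORM-style retention of the stored hashes, this makes silent rewrites of the decision history detectable.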

Acceptance Criteria
Evidence-to-Decision Linking
"As a dispute analyst, I want each decision change tied to the exact evidence version so that I can defend outcomes with precise context."
Description

Attach concrete evidence artifacts (emails, PDFs, receipt parses, images) to each decision delta and snapshot their contents at time of change. Maintain evidence versioning with checksums, source channel metadata, and preview thumbnails, ensuring that future updates to artifacts do not alter the historical decision context. Integrate with ClaimKit’s ingestion pipeline to auto-link parsed serials and receipts and provide deep links back to the original messages.

Acceptance Criteria
Rules-Based Approvals for Changes
"As a support manager, I want high-impact decision changes to require approvals so that risk is controlled and policy is enforced."
Description

Provide a configurable approval matrix that requires one or more approvers for specific decision changes based on rules such as claim value, warranty tier, fraud risk score, or policy exceptions. Enforce blocking states until approvals are granted, capture approver rationale, and apply SLA timers, reminders, and escalation paths. Log full approval chains into the decision timeline and ensure compatibility with existing ClaimKit roles and permissions.

Acceptance Criteria
Compare View Export
"As an auditor, I want a clean compare export of decision changes so that I can review cases without system access."
Description

Generate a clean, side-by-side compare view of decision changes with before/after fields, timestamps, actors, rationale, approvals chain, and linked evidence references. Support export to secured PDF and machine-readable CSV, with optional brand watermarking, case identifiers, and page-level redactions for sensitive data. Provide shareable, expiring access links with download audit logs to supply auditors and dispute partners without granting full system access.

Acceptance Criteria
Interactive Decision Timeline UI
"As an agent, I want a filterable timeline of decision changes so that I can quickly understand why a case stands where it is."
Description

Present a chronological timeline within each claim that visualizes all decision deltas with clear diff highlighting and rationale notes. Offer filters by user, field changed, decision type, and date range; full-text search; and keyboard navigation for rapid review. Include hover previews for evidence snapshots and compact/expanded modes, while ensuring performant loading with virtualization and pagination for high-volume claims.

Acceptance Criteria
Delta Events API and Webhooks
"As a platform engineer, I want APIs and webhooks for decision deltas so that downstream systems can synchronize and alert in real time."
Description

Expose REST endpoints and webhook events for creation, retrieval, and subscription to decision deltas, including pagination, filtering, and change diffs. Provide secure, tenant-scoped access with RBAC, HMAC-signed webhooks, retries with backoff, idempotency keys, and event ordering guarantees. Enable downstream systems to trigger alerts, sync data warehouses, and enrich BI pipelines without polling.
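HMAC-signed webhooks typically work as sketched below; the header name is hypothetical, and the shared secret would be provisioned per tenant rather than hard-coded:

```python
import hashlib
import hmac

WEBHOOK_SECRET = b"tenant-shared-secret"  # illustrative; provisioned per tenant

def sign_payload(body: bytes) -> str:
    """Sender side: signature transmitted alongside the request body."""
    return hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()

def verify_webhook(body: bytes, signature_header: str) -> bool:
    """Receiver side: compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign_payload(body), signature_header)

body = b'{"event":"decision.delta.created","claim_id":"c-9"}'
sig = sign_payload(body)  # sent as e.g. an X-Signature header (name hypothetical)
assert verify_webhook(body, sig)
assert not verify_webhook(b'{"event":"tampered"}', sig)
```

Signing the raw body (rather than a parsed form) is the usual design choice, since any re-serialization on the receiver could change byte order and break verification.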

Acceptance Criteria

ChainStamp

A cryptographic chain-of-custody that anchors each artifact’s hash into a case-level Merkle tree with trusted timestamps and optional public blockchain anchoring. One-click ‘Verify Integrity’ lets anyone confirm the artifact existed unaltered at a specific time, with a downloadable notary certificate. Benefits: indisputable provenance in disputes, higher regulator confidence, and faster approvals for exceptions without back-and-forth.

Requirements

Artifact Hashing Pipeline
"As an operations manager, I want every artifact hashed automatically on ingestion so that I can prove its contents have not changed since it entered our system."
Description

Implement a deterministic, content-addressable hashing pipeline that computes cryptographic hashes (e.g., SHA-256/BLAKE3) for every artifact ingested into a ClaimKit case (emails, PDFs, images, notes, device logs). Normalize inputs (e.g., PDF byte-canonicalization, image EXIF stripping, consistent newline handling) to ensure stable hashes across environments. Persist artifact hash, algorithm, size, and immutable metadata, with versioning for edits and deduplication for identical content. Integrate with the Magic Inbox and manual uploads to hash on arrival, with safe streaming of large files, and provide backfill tooling to hash historical artifacts. Emit events for downstream Merkle tree updates and auditing. Outcome: each artifact has a tamper-evident identity that anchors ChainStamp provenance from the moment of ingestion.
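Streaming-safe hashing means feeding the digest in fixed-size chunks rather than reading the whole file into memory; a minimal sketch using SHA-256 from the standard library (BLAKE3 would need a third-party package):

```python
import hashlib
import io

def hash_stream(stream, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash an artifact in 1 MiB chunks so large files never load fully into
    memory; the result equals hashing all the bytes at once."""
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

data = b"receipt bytes " * 100_000  # ~1.4 MB stand-in for a scanned PDF
streamed = hash_stream(io.BytesIO(data))
assert streamed == hashlib.sha256(data).hexdigest()  # chunking changes nothing
```

The same incremental API works over a file handle or a network stream, which is what makes the pipeline safe for arbitrarily large uploads.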

Acceptance Criteria
Case-Level Merkle Tree Construction
"As a compliance officer, I want a case-level Merkle root with membership proofs so that I can produce concise evidence that a specific artifact belongs to the case history."
Description

Maintain a per-case Merkle tree that includes the ordered set of artifact hashes and relevant case state markers (e.g., artifact added, artifact superseded). On each change, compute a new Merkle root, persist the tree version, and store each artifact’s Merkle path to enable lightweight proofs. Ensure append-only semantics with immutable audit records, efficient updates (O(log n)), and deterministic ordering (e.g., timestamp + stable tie-breaker). Provide APIs to fetch tree roots, versions, and membership proofs, and to export/import trees for audits. Outcome: a compact, tamper-evident chain-of-custody at the case level.
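A Merkle root with membership proofs can be sketched as follows; this uses one common convention (odd nodes paired with themselves) and is an illustration of the data structure, not ClaimKit's implementation:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves: list[bytes], index: int):
    """Root over all leaf hashes, plus the sibling path for leaves[index].
    Levels with an odd node count duplicate their last node."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], index % 2))  # (sibling hash, 1 if sibling is left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify_membership(leaf: bytes, proof, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

artifacts = [b"receipt.pdf#v1", b"email-001", b"photo.jpg", b"note-17"]
root, proof = merkle_root_and_proof(artifacts, 2)
assert verify_membership(b"photo.jpg", proof, root)
assert not verify_membership(b"photo.jpg (edited)", proof, root)
```

The proof is O(log n) in the number of artifacts, which is why a verifier can confirm membership from the root alone without seeing the rest of the case.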

Acceptance Criteria
Trusted Timestamping Integration
"As a legal counsel, I want third-party trusted timestamps on case roots so that I can demonstrate when evidence existed without relying solely on our system’s clock."
Description

Integrate RFC 3161-compliant Time Stamping Authority (TSA) to attach trusted timestamps to Merkle roots at creation/update. Ensure system time integrity (NTP sync, monotonic clock use), verify TSA tokens on receipt, and persist tokens alongside roots. Provide verification routines that revalidate tokens and link them to the corresponding Merkle root/version. Include retry/backoff and multiple TSA providers for resilience. Outcome: independently verifiable proof that the case state existed at or before a specific time.

Acceptance Criteria
Public Blockchain Anchoring (Batched, Optional)
"As a brand owner, I want our case proofs anchored on a public chain so that external parties can verify integrity without trusting ClaimKit alone."
Description

Periodically batch Merkle roots from multiple cases into an epoch-level Merkle tree and anchor its root to supported public blockchains (e.g., Bitcoin via OP_RETURN, Ethereum via calldata) based on configurable cadence and cost thresholds. Manage fee estimation, transaction broadcasting, confirmations, and provider failover. Persist on-chain transaction IDs and provide proofs that chain case-level roots to the anchored epoch root. Offer tenant-level controls (on/off, chain selection) and cost reporting. Outcome: optional, low-cost, high-integrity public anchoring that enhances dispute resilience and regulator confidence.

Acceptance Criteria
One-Click Verify Integrity (UI and Public API)
"As a claims adjuster, I want a simple verify button or link so that I can confirm an artifact’s integrity and timestamp without specialized tools."
Description

Provide a one-click Verify Integrity action within ClaimKit and a rate-limited public verification endpoint. Inputs: artifact file or hash, and optional case identifier or share link. Outputs: pass/fail result with details (hash match, Merkle membership proof, TSA timestamp validation, blockchain anchor confirmation if enabled). Include human-readable explanations, deep links to block explorers, and clear error states. Ensure zero leakage of artifact contents (hash-only verification) and support QR codes for easy share/scan. Outcome: anyone can quickly confirm an artifact existed unaltered at a stated time.

Acceptance Criteria
Notary Certificate Generation & Digital Signature
"As a support lead, I want a signed certificate I can send to customers or regulators so that they can independently verify provenance without accessing our systems."
Description

Generate downloadable notary certificates in PDF and JSON that include artifact hash and algorithm, Merkle path, case root and version, TSA token, blockchain anchor data (if any), and verification instructions. Digitally sign certificates using ClaimKit’s X.509 keys (PAdES for PDF with LTV/OCSP where available) and embed a verification QR code/URL. Ensure data minimization (no PII or content, hashes only) and localization for time formats. Provide API and UI to regenerate certificates deterministically. Outcome: portable, court- and regulator-ready evidence packages.

Acceptance Criteria
Cryptographic Key Management & Rotation
"As a security engineer, I want keys stored and rotated in a managed HSM/KMS so that our signatures remain trustworthy and operationally safe over time."
Description

Manage cryptographic keys used for signing certificates and internal attestations via a secure KMS/HSM with strict access controls, audit logging, and automated rotation policies. Support per-environment and per-tenant key isolation where required, secure backup and recovery, and hardware-backed signing. Expose operational health metrics and alerts for key expiry and rotation failures. Outcome: strong, maintainable cryptographic hygiene that underpins trust in ChainStamp outputs.

Acceptance Criteria

Legal Hold

Policy-driven preservation controls that let managers place cases or artifacts on legal hold, pause retention clocks, and enforce preservation-in-place. Includes reason codes, custodian notifications, and defensible deletion once holds lift, with a complete hold timeline for auditors. Benefits: reduces legal risk, standardizes compliance, and cuts storage sprawl with automated retention after resolution.

Requirements

Hold Creation & Scope Selection
"As a compliance manager, I want to create a legal hold with a precise scope across cases and artifacts so that all relevant data is preserved consistently and immediately for potential litigation."
Description

Provide UI and API to create policy-driven legal holds that can target granular scopes, including entire claims, repair tickets, specific artifacts (emails, PDFs, images, device serial records), and related entities created by the magic inbox. Support selection by identifiers, saved searches, or rule-based criteria (e.g., brand, SKU, jurisdiction, date range). Apply preservation-in-place immediately upon hold creation, with idempotent operations and validation to prevent duplicate or conflicting holds. Integrate with existing ClaimKit data models so holds are first-class objects linked to cases/artifacts, and ensure multi-tenant isolation. Enforce role permissions, capture hold owner, and attach policy templates at creation time to standardize behavior across the organization.

Acceptance Criteria
Retention Pause & Preservation-in-Place
"As an operations lead, I want retention and purge processes to automatically pause for held items so that nothing relevant is lost while compliance obligations are active."
Description

When a hold is active, pause all retention clocks, purge jobs, and auto-deletion workflows for the impacted cases and artifacts while preserving the data in place without copying. Block destructive actions (delete, overwrite) and restrict mutable fields to approved metadata updates. Display hold indicators in the case UI, queues, and APIs, and provide conflict resolution when multiple policies intersect. Ensure preservation coverage across all storage backends (email/PDF ingestion, attachments, object storage) and guarantee holds supersede post-resolution automated retention rules until release.

Acceptance Criteria
Reason Codes & Matter Metadata Capture
"As a legal counsel, I want every hold to include standardized reason codes and matter details so that reporting and audits reflect clear, defensible justification for preservation."
Description

Require capture of standardized reason codes, legal matter identifiers, jurisdiction, initiating requester, start date, and notes at hold creation, using a configurable taxonomy managed by admins. Validate required fields, restrict post-creation edits to authorized roles, and maintain version history of any metadata changes. Expose these fields in the UI, API, and exports to support consistent reporting and alignment with legal policies.

Acceptance Criteria
Custodian Notification & Acknowledgment Tracking
"As a compliance coordinator, I want to notify all custodians of their legal hold obligations and track acknowledgments so that the company can demonstrate proper notice and follow-up."
Description

Identify custodians related to the hold (e.g., internal agents, external repair partners, and designated contacts) and send configurable, multilingual notifications with acknowledgment links. Track delivery status, bounces, read receipts, and acknowledgments, send automatic reminders before deadlines, and escalate non-responses. Store immutable proof-of-notice artifacts and timestamps in the hold record. Provide a secure portal view for custodians to review obligations and FAQs without exposing case contents.

Acceptance Criteria
Hold Timeline & Immutable Audit Trail
"As an auditor, I want a complete, immutable timeline of hold events so that I can verify compliance actions and controls without gaps."
Description

Maintain a cryptographically verifiable, append-only audit trail and timeline for each hold, recording events such as creation, scope changes, notifications sent, acknowledgments received, preservation blocks enforced, attempted deletions prevented, and hold releases. Support time zone normalization, filtering, and export to CSV/JSON for auditors. Ensure audit logs persist beyond the life of the underlying cases and are excluded from standard retention purges.
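The append-only, verifiable property described above is typically achieved by chaining entry hashes. A minimal Python sketch (the event fields and function names are illustrative, not ClaimKit's actual schema):

```python
import hashlib
import json

def append_event(chain, event):
    """Append an event to a hash-chained log; each entry commits to its predecessor."""
    prev_hash = chain[-1]["entry_hash"] if chain else "0" * 64
    body = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    chain.append({"event": event, "prev_hash": prev_hash, "entry_hash": entry_hash})
    return chain

def verify_chain(chain):
    """Recompute every link; any edited, reordered, or removed entry breaks verification."""
    prev = "0" * 64
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev_hash"] != prev:
            return False
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

Because each entry commits to its predecessor's hash, tampering with any event invalidates every later link, which is what lets an auditor confirm the timeline has no gaps.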

Acceptance Criteria
Hold Release & Defensible Deletion Automation
"As a records manager, I want an automated, auditable process to release holds and perform defensible deletion so that storage is reclaimed and compliance risk is minimized once obligations end."
Description

Provide a controlled workflow to release holds with documented authority and rationale. On release, recalculate retention eligibility for all impacted items, re-enable timers, and schedule deletion according to policy. Generate a deletion manifest, support dry-run impact analysis, and produce certificates of deletion once executed. Implement safe concurrency handling when items are subject to multiple holds, ensuring data is only deleted after all relevant holds are lifted.
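The multi-hold safety rule (delete only after every relevant hold is lifted) plus the dry-run impact analysis can be sketched as a simple guard. The hold and item shapes below are hypothetical:

```python
def eligible_for_deletion(item_id, holds):
    """An item may be scheduled for deletion only when every hold covering it is released."""
    covering = [h for h in holds if item_id in h["scope"]]
    return all(h["status"] == "released" for h in covering)

def deletion_manifest(item_ids, holds):
    """Dry-run impact analysis: partition items into deletable vs still-held."""
    deletable = [i for i in item_ids if eligible_for_deletion(i, holds)]
    blocked = [i for i in item_ids if i not in deletable]
    return {"deletable": deletable, "blocked": blocked}
```

Items under no hold fall through to normal retention policy; items under any active hold stay blocked regardless of how many other holds have been released.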

Acceptance Criteria
RBAC & Two-Step Approval for Holds
"As a security admin, I want role-based controls and approvals around legal holds so that only authorized personnel can enact or release holds in a controlled, auditable manner."
Description

Enforce role-based access control for creating, modifying, and releasing holds, with least-privilege defaults and tenant scoping. Require configurable two-step approval for high-risk actions (e.g., broad-scope holds or hold releases affecting more than N cases), including approver selection, rationale capture, and time-stamped decisions. Provide activity visibility to compliance and legal roles while restricting operational users to read-only indicators.

Acceptance Criteria

Bundle Builder

Smart packaging of a zipped, court-ready audit bundle: evidence index, Bates numbering, hash manifest, cover letter, and signed attestation. Apply redaction presets and jurisdiction-specific templates, then share via expiring, watermark-protected links with open/download tracking. Benefits: prepares dispute and regulator responses in minutes, ensures consistent disclosures, and lowers admin burden on agents.

Requirements

Evidence Index Generator
"As a compliance analyst, I want an auto-generated evidence index so that I can produce an organized, court-ready bundle without manual cataloging."
Description

Automatically compiles a comprehensive, ordered index of all evidentiary artifacts linked to a ClaimKit case, including source channel, received timestamps, file types, page counts, and case metadata. Supports drag-and-drop additions, selection from existing case attachments, and de-duplication by checksum. Maintains canonical ordering rules (e.g., communications, receipts, product docs, photos, transcripts) and aligns each entry to its assigned Bates range. Produces a searchable index (PDF and CSV) embedded in the bundle and saved back to the case. Handles incremental updates when evidence changes, preserving stable references and emitting a change log for auditability.
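The de-duplication-by-checksum step can be sketched in a few lines; the artifact structure here is an assumption for illustration:

```python
import hashlib

def dedupe_by_checksum(artifacts):
    """Keep the first artifact seen for each content checksum; later duplicates are dropped."""
    seen = {}
    for art in artifacts:
        digest = hashlib.sha256(art["content"]).hexdigest()
        seen.setdefault(digest, art)
    return list(seen.values())
```

Hashing the bytes rather than comparing filenames means the same receipt forwarded twice (under different names or from different channels) collapses to a single index entry.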

Acceptance Criteria
Configurable Bates Numbering Engine
"As a paralegal, I want configurable Bates numbering so that reviewers can reference pages unambiguously across the entire evidence set."
Description

Applies sequential Bates numbering across all bundle artifacts, including PDFs, images, and generated pages (index, cover, attestation). Supports prefixes/suffixes, zero-padding, continuous or per-document sequences, and placement rules (header/footer corners, font/size/opacity). Ensures non-overlap within a case by reserving ranges and persisting a file-to-page map. Creates immutable, stamped renditions for export while retaining originals. Handles re-runs when items are added/removed by reassigning only affected ranges and updating the index mapping, with full audit trail.
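The range-reservation core of continuous Bates numbering can be sketched as follows (prefix, padding width, and the document shape are illustrative defaults, not fixed product behavior):

```python
def assign_bates_ranges(documents, prefix="CK", width=6, start=1):
    """Assign a continuous, zero-padded Bates range to each document by page count."""
    page_map, current = {}, start
    for doc in documents:
        first, last = current, current + doc["pages"] - 1
        page_map[doc["id"]] = (f"{prefix}{first:0{width}d}", f"{prefix}{last:0{width}d}")
        current = last + 1
    # Return the next free number so a re-run can append without overlapping ranges.
    return page_map, current
```

Persisting the returned "next free number" is what lets a re-run after adding documents reassign only the affected ranges instead of renumbering the whole set.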

Acceptance Criteria
Redaction Presets with PII Detection
"As an operations lead, I want one-click redaction presets so that disclosures are consistent and compliant without manual effort."
Description

Provides reusable redaction presets by jurisdiction and program that automatically locate and mask PII and sensitive fields (e.g., SSNs, emails, phone numbers, credit cards, addresses, serial numbers) using pattern rules and ML-assisted detection. Offers side-by-side preview, bulk apply to PDFs/images, and an approvals step with reason codes for exceptions. Produces an auditable redaction layer and burns irreversible redactions into exported files. Enforces role-based permissions and logs who applied each redaction, when, and why.
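The pattern-rule half of detection can be sketched with regular expressions; the patterns below are simplified examples of a preset, and the ML-assisted detection mentioned above is out of scope for this sketch:

```python
import re

# Hypothetical preset: simplified pattern rules for a few common PII types.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def apply_redaction_preset(text, preset=PII_PATTERNS):
    """Mask matches pattern-by-pattern; return redacted text plus an audit list."""
    findings = []
    for label, pattern in preset.items():
        # Spans are recorded against the text as it stands at this pass.
        for match in pattern.finditer(text):
            findings.append({"type": label, "span": match.span()})
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text, findings
```

The findings list is what feeds the audit trail (who applied which redaction, where); the burned-in, irreversible redaction of exported PDFs would happen downstream of this text-level pass.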

Acceptance Criteria
Jurisdiction Template Library & Merge
"As a support manager, I want jurisdiction-aware templates so that bundles match regulator expectations with minimal editing."
Description

Maintains a versioned library of cover letter and attestation templates keyed by jurisdiction, dispute type, and language. Templates support dynamic merge fields from ClaimKit (claim ID, claimant details, product info, dates, warranty terms, SLA timers), conditional sections, and reusable clauses. Includes template validation for missing placeholders, approval workflow, and change history. Renders to PDF during bundle assembly and saves the final documents back to the case for reuse and audit.

Acceptance Criteria
Attestation E-Signature and Time-Stamping
"As a legal approver, I want to e-sign the attestation with a verifiable time-stamp so that the bundle is admissible and trustworthy."
Description

Generates a signer-ready attestation from the selected template and routes it to authorized users based on role and delegation rules. Applies a cryptographic e-signature with embedded time-stamp and signer certificate, producing a tamper-evident PDF suitable for court submission. Captures signer intent, IP, and device metadata; stores the certificate chain and validation status; and links the signed attestation to the bundle and case. Prevents bundle finalization until a valid signature is present or an approved exception is recorded.

Acceptance Criteria
Cryptographic Hash Manifest
"As in-house counsel, I want a signed hash manifest so that I can prove the bundle’s contents have not been altered from generation to delivery."
Description

Creates a manifest containing SHA-256 hashes for every included file and for the final ZIP, along with file sizes, Bates ranges, and generation timestamps. The manifest is signed and bundled as JSON and human-readable PDF. Verifies integrity on download and flags any mismatch. Exposes a one-click verification utility for recipients and stores verification results in the case audit log to ensure end-to-end integrity and non-repudiation.
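The hash-and-verify core can be sketched as two small functions; the manifest layout here is an assumption (the real manifest also carries Bates ranges, timestamps, and a signature):

```python
import hashlib
import json

def build_manifest(files):
    """files: mapping of name -> bytes. Record SHA-256 and size per file."""
    entries = {
        name: {"sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
        for name, data in files.items()
    }
    return json.dumps(entries, sort_keys=True)

def verify_manifest(manifest_json, files):
    """Return the names of files whose current hash no longer matches the manifest."""
    entries = json.loads(manifest_json)
    return [
        name for name, meta in entries.items()
        if hashlib.sha256(files.get(name, b"")).hexdigest() != meta["sha256"]
    ]
```

A recipient's one-click verification is essentially `verify_manifest` returning an empty list; any non-empty result names exactly which files were altered or are missing.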

Acceptance Criteria
Secure Link Delivery with Watermarking and Tracking
"As a claims director, I want expiring, watermarked share links with access tracking so that I can distribute bundles securely and know when recipients engage."
Description

Publishes the bundle via expiring, access-controlled links with optional passcode, download limits, and recipient allowlists. Applies dynamic watermarks (recipient, timestamp, case ID) to viewable documents. Provides an in-browser viewer, disables indexing by crawlers, and supports revoke-at-any-time. Tracks opens, downloads, IPs, and user agents; sends notifications; and records events in an immutable audit log. Integrates with ClaimKit permissions to restrict who can share and for how long.

Acceptance Criteria

Evidence Diff

Versioned evidence with visual diffs that highlight changes between uploads (text, images, PDFs) and show hash deltas, editor, timestamp, and approval notes. Alerts if evidence changes after a decision, prompting review or automatic rule re-evaluation. Benefits: transparent change history, fewer escalation loops, and stronger justification for approvals or denials.

Requirements

Versioned Evidence Store
"As a claims operations lead, I want every evidence change versioned with provenance so that my team can trace decisions and recover prior states without ambiguity."
Description

Implement an immutable, versioned evidence store that captures every upload and edit of text, images, and PDFs per claim, assigning a content-addressed ID (SHA-256), timestamp, editor, source channel, and optional approval notes. Each version links to its predecessor and the owning claim/ticket, enabling full history, rollback, and cross-referencing from decisions. Ingested assets from Magic Inbox automatically become new versions with parsed fields attached. Backfill existing evidence into the new model and index versions for fast retrieval in the live queue. Outcome: transparent change history with zero lost context and reliable provenance across all channels.
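The content-addressed version chain can be sketched as follows (class and field names are illustrative; a production store would persist to object storage rather than memory):

```python
import hashlib

class EvidenceStore:
    """Minimal content-addressed version chain: each version links to its predecessor."""

    def __init__(self):
        self.versions = {}   # content id (SHA-256) -> version record
        self.heads = {}      # (claim_id, artifact) -> latest content id

    def put(self, claim_id, artifact, content, editor):
        """Store a new version; its SHA-256 is both its identity and its integrity proof."""
        cid = hashlib.sha256(content).hexdigest()
        prev = self.heads.get((claim_id, artifact))
        self.versions[cid] = {"content": content, "editor": editor, "prev": prev}
        self.heads[(claim_id, artifact)] = cid
        return cid

    def history(self, claim_id, artifact):
        """Walk predecessor links from the newest version back to the first."""
        cid, out = self.heads.get((claim_id, artifact)), []
        while cid:
            out.append(cid)
            cid = self.versions[cid]["prev"]
        return out
```

Using the SHA-256 of the bytes as the version ID gives provenance for free: two identical uploads resolve to the same ID, and any byte-level change yields a new, distinguishable version.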

Acceptance Criteria
Text/PDF Diff Viewer
"As a support agent, I want clear visual diffs for text and PDFs so that I can quickly identify what changed and decide whether it affects the claim outcome."
Description

Deliver a unified text/PDF diff viewer that highlights insertions, deletions, and moved content at word and character granularity. For PDFs, extract text (native or OCR) and display page-level diffs with thumbnails, anchors to extracted fields (e.g., serials, model numbers), and side-by-side or inline modes. Support accept/reject of updated parsed fields, copy-to-clipboard, and export of a diff snapshot. Integrate into the claim detail panel and decision workflow so agents can compare versions without leaving the queue.
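The word-granularity core of such a viewer can be sketched with the standard library's `difflib`; OCR, page thumbnails, and field anchoring are omitted here:

```python
import difflib

def word_diff(old_text, new_text):
    """Word-granularity diff: returns (op, words) segments for rendering as highlights."""
    old_words, new_words = old_text.split(), new_text.split()
    segments = []
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            segments.append(("equal", old_words[i1:i2]))
        else:
            # A "replace" opcode yields both a deletion and an insertion segment.
            if i1 != i2:
                segments.append(("delete", old_words[i1:i2]))
            if j1 != j2:
                segments.append(("insert", new_words[j1:j2]))
    return segments
```

The segment list maps directly onto side-by-side or inline rendering: equal runs are plain, deletions are struck through on the left, insertions highlighted on the right.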

Acceptance Criteria
Image Change Heatmap
"As a fraud analyst, I want image change heatmaps so that I can detect subtle alterations or damage progression without manual pixel-level inspection."
Description

Provide an image comparison module that visualizes changes between evidence versions via pixel diffs, SSIM heatmaps, and blink/slider overlays. Include auto-alignment, EXIF-aware orientation handling, region zoom, and annotation for changed areas. Allow masking of non-relevant regions to reduce false positives and persist those masks across versions. Surface concise change summaries (e.g., 'new scratch detected in region A') in the claim sidebar.

Acceptance Criteria
Integrity Hashing & Deltas
"As a compliance officer, I want cryptographic hashes displayed for each version so that I can demonstrate evidence integrity during audits or disputes."
Description

Compute and store cryptographic integrity hashes for every evidence version and display hash deltas across versions. Validate hash continuity on upload and when ingesting from email/PDF to detect tampering or accidental replacements. Require a reason code for overwriting evidence, log the actor and context, and block updates that break policy. Expose hash metadata in the UI, exports, and API to strengthen chain-of-custody for approvals or denials.

Acceptance Criteria
Post-Decision Change Alerts & Auto Re-eval
"As a claims manager, I want automatic alerts and rule re-evaluation when evidence changes post-decision so that we correct outcomes quickly and protect SLAs."
Description

Introduce post-decision change detection that triggers alerts when evidence is modified after a claim is approved or denied. Provide configurable actions: notify assignee/team via in-app, email, Slack, and/or webhook; automatically reopen the claim; and re-run eligibility rules to update the decision and SLA timers. Include throttling, deduplication, and policy scoping (per brand, claim type) to avoid alert fatigue. Record all re-evaluations in the audit log.

Acceptance Criteria
Audit Timeline & Export
"As a legal reviewer, I want a complete audit timeline with exportable diffs so that I can provide defensible documentation to customers and regulators."
Description

Add an audit timeline that aggregates every evidence version with editor, timestamp, hash, and approval notes, rendering the associated diff inline. Support export to PDF/JSON packages with embedded diffs and signatures for compliance submissions, including time-stamps and verifier info. Provide filters by date, actor, and artifact type, and enable one-click attachment of the export to outbound communications.

Acceptance Criteria
Role-Based Diff Access & Redaction Safety
"As a security administrator, I want role-based diff access with safe redaction so that sensitive data is never exposed while agents still see relevant changes."
Description

Enforce role-based access controls over diff visibility and ensure redactions persist and are respected across versions. Generate masked diffs that never reveal redacted PII while still showing structural changes, and log redaction policy checks for each render. Apply the same controls to API responses and exports. Provide admin controls to configure which roles can view unredacted content and which fields are always masked.

Acceptance Criteria

Access Ledger

An immutable access log for every artifact—views, downloads, exports, and bundle adds—capturing who, when, where (IP/device), and why (reason codes). Supports step-up MFA for sensitive access and anomaly alerts for unusual activity. Benefits: deters snooping, speeds incident investigations, and strengthens privacy compliance.

Requirements

Unified Access Event Capture
"As a privacy officer, I want all artifact accesses logged consistently across every surface so that investigations and audits have a complete and reliable record."
Description

Capture every access to ClaimKit artifacts—views, downloads, exports, and bundle adds—across all channels (web app, API, magic inbox automations, bulk jobs) with standardized metadata: user ID and role, authentication method, timestamp (UTC), IP and geo lookup, device/browser fingerprint, session ID, source application, artifact type and ID, action outcome (success/denied), latency, and optional reason code. Ensure capture points are embedded in UI components, service endpoints, and background processors so no access path bypasses logging. Guarantee idempotent event writes, deduplicate retries, and persist events even when downstream systems are degraded via durable queues. Provide backward-compatible hooks to onboard new artifact types (claims, repair tickets, attachments, receipts, serial checks) without schema breaks.

Acceptance Criteria
Immutable Ledger with Cryptographic Integrity
"As a compliance auditor, I want a tamper-evident record of accesses so that I can prove data integrity for regulatory and customer audits."
Description

Persist access events in an append-only, tamper-evident ledger that chains per-event hashes and produces periodic Merkle roots signed with a tenant-scoped key. Store raw events and integrity proofs on WORM-capable storage with configurable legal retention and optional regulation-specific retention classes. Enforce monotonically increasing sequence numbers per tenant, verify clock synchronization, and surface integrity check results (gap detection, hash mismatch) via health telemetry. Encrypt data at rest and in transit, segregate tenants, rotate keys with auditable provenance, and expose read paths that validate integrity on demand or during export.
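The periodic Merkle root over a batch of event hashes can be sketched as follows; signing, WORM storage, and per-tenant keys are out of scope for this sketch:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute a Merkle root over raw event bytes, duplicating the last node on odd levels."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by duplicating the last node
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Signing only the periodic root keeps per-event overhead low while still letting a verifier prove any single event's inclusion with a logarithmic-size Merkle path.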

Acceptance Criteria
Reason Code Enforcement and Catalog
"As a support manager, I want agents to supply a standardized reason before viewing sensitive records so that we can deter casual snooping and explain access during reviews."
Description

Provide an admin-configurable catalog of reason codes with descriptions and mappings to actions and artifact types. Enable policies that require reason selection (and optional free-text justification) before sensitive accesses such as PII views and data exports, enforce via pre-access interceptors in the UI and API, and store the selected reason code with the access event. Support localization, analytics on reason usage, and versioning so historical events reference the correct reason definition. Offer guardrails like defaults, recently used reasons, and keyboard-first selection to minimize friction for support workflows.

Acceptance Criteria
Policy-Based Step-Up MFA for Sensitive Access
"As a security administrator, I want high-risk access attempts to require additional verification so that sensitive data is protected without overburdening routine work."
Description

Implement a policy engine that triggers step-up MFA when risk or sensitivity thresholds are met, including first-time access to PII artifacts, large exports, new device or location, or elevated-role sessions. Integrate with IdP (OIDC/SAML) and support WebAuthn, TOTP, and SMS (configurable) with remember-device windows and per-tenant settings. Embed challenge flows inline within the access attempt (UI and API), record MFA outcomes in the ledger, support break-glass roles with justification, and fail closed on challenge failure or timeout. Provide admin insights into challenge rates and user friction to tune policies.

Acceptance Criteria
Anomaly Detection and Real-Time Alerts
"As a security analyst, I want to be alerted to unusual access behavior in near real time so that I can investigate and mitigate potential abuse quickly."
Description

Detect unusual access patterns using rule-based and statistical signals: rapid access velocity, off-hours spikes, mass exports, impossible travel, repeated denials, and unfamiliar devices or networks. Allow per-tenant thresholds, severity levels, suppression windows, and maintenance calendars. Generate actionable alerts with enriched context (user profile, recent actions, artifact summary, integrity status) to Slack, email, PagerDuty, and webhook destinations, with signed payloads and retry. Store alerts and their dispositions (acknowledged, closed, escalated) in the ledger to complete the investigative trail.
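A single rule-based signal, access velocity over a sliding window, can be sketched as follows (thresholds and the class shape are illustrative):

```python
from collections import deque

class VelocityRule:
    """Flag a user whose access count within a sliding window exceeds a threshold."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window = window_seconds
        self.events = {}  # user -> deque of timestamps inside the window

    def observe(self, user, ts):
        """Record one access; return True when the window count exceeds the threshold."""
        q = self.events.setdefault(user, deque())
        q.append(ts)
        while q and ts - q[0] > self.window:
            q.popleft()  # evict timestamps that fell out of the window
        return len(q) > self.max_events
```

Per-tenant thresholds, severity levels, and suppression windows would layer on top of signals like this one; statistical detectors (impossible travel, off-hours baselines) need richer state than a single deque.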

Acceptance Criteria
Audit Explorer and Evidence Export
"As an operations lead, I want to quickly find and export the exact access events for a case so that I can answer customer and legal inquiries with confidence."
Description

Provide a role-gated UI to search, filter, and visualize access events by artifact, user, action, time range, IP, device, channel, and reason code. Offer timelines and diffs to correlate related actions (e.g., view followed by export), with deep links back to the underlying case or ticket. Support fast pagination and indexing to handle large tenants and produce exports in CSV/JSON along with signed integrity proofs (Merkle paths and signatures) and a human-readable summary suitable for incident reports. Enable saved searches, scheduled exports, and chain-of-custody bundles for investigations.

Acceptance Criteria
Privacy Controls and Data Minimization
"As a data protection officer, I want the access ledger to store only necessary personal data with proper retention and redaction so that we meet privacy obligations without losing audit utility."
Description

Apply least-privilege and data minimization to the ledger: mask or truncate IPs where required, hash device fingerprints, and separate sensitive attributes from queryable indexes with controlled access via RBAC. Implement configurable retention and deletion policies per tenant and jurisdiction, legal holds, and DSAR-ready exports with selective redaction for UI display while preserving underlying integrity proofs. Log consent basis where applicable and provide privacy-by-design documentation to support audits and customer assurances.
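IP masking by prefix truncation can be sketched with the standard library's `ipaddress` module; the /24 and /48 defaults below are common minimization choices, not mandated values:

```python
import ipaddress

def mask_ip(ip_string, v4_prefix=24, v6_prefix=48):
    """Truncate an IP to a network prefix: geo-level signal survives, the host is masked."""
    ip = ipaddress.ip_address(ip_string)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

Storing only the truncated form in queryable indexes (while keeping any full value, if retained at all, behind separate RBAC) is one way to preserve audit utility under minimization rules.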

Acceptance Criteria

Integrity Monitor

Continuous background verification that re-hashes stored artifacts, detects drift or bit rot, and self-heals from redundant copies where available. Surfaces a real-time integrity health dashboard and issues tamper alerts with automated quarantine options. Benefits: always-on assurance that your audit trail is intact, with rapid detection and containment when it isn’t.

Requirements

Scheduled Re-Hashing Engine
"As an operations leader, I want ClaimKit to continuously re-verify stored claim artifacts in the background so that I can trust our audit trail without manual checks."
Description

Continuous background service that re-hashes all stored artifacts (emails, PDFs, receipts, serial numbers, attachments, claim metadata snapshots) using strong cryptographic algorithms (e.g., SHA-256). Maintains baselines and compares subsequent scans to detect drift, bit rot, or tampering. Supports incremental and prioritized scanning based on SLA, multi-tenant isolation, adaptive backoff, and load shedding to avoid impacting ClaimKit’s live queue performance. Integrates with existing storage backends (S3/GCS/object stores and Postgres), publishes integrity events to an internal bus, and exposes observability metrics (coverage, scan rate, error rate) for monitoring.
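The baseline-comparison core of a scan pass can be sketched as follows; scheduling, prioritization, and backoff are omitted, and the callback shape is an assumption:

```python
import hashlib

def scan_for_drift(baseline, read_artifact):
    """Compare fresh hashes against the stored baseline.

    baseline: mapping of artifact id -> expected SHA-256 hex digest.
    read_artifact: callable returning the artifact's bytes, or None if unreadable/missing.
    """
    findings = []
    for artifact_id, expected in baseline.items():
        data = read_artifact(artifact_id)
        if data is None:
            findings.append((artifact_id, "missing"))
        elif hashlib.sha256(data).hexdigest() != expected:
            findings.append((artifact_id, "hash_mismatch"))
    return findings
```

Each finding would be published as an integrity event for the alerting and self-heal pipelines described below; an empty findings list is what feeds the coverage and health metrics.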

Acceptance Criteria
Tamper Alerts & Notifications
"As a support lead, I want immediate, actionable alerts when an artifact’s integrity fails so that we can respond before it affects customers or audits."
Description

Automated detection-to-alert pipeline that classifies integrity failures (hash mismatch, missing object, read errors) by severity, enriches them with affected case IDs and artifact metadata, and routes notifications via email, Slack, PagerDuty, and webhooks. Includes deduplication, suppression windows, escalation rules, runbook links, and integration with ClaimKit’s notification center. Provides actionable payloads for rapid triage and maintains alert history for reporting.

Acceptance Criteria
Redundant Self-Heal Recovery
"As a compliance manager, I want the system to automatically restore corrupted artifacts from known-good copies so that evidence remains intact without waiting on engineering."
Description

Automatic recovery mechanism that validates redundant copies across regions/buckets or secondary stores, selects a verified-good source, and restores corrupted or missing artifacts to the primary location. Records provenance and cryptographic proofs for each restore operation, supports dry-run mode, idempotent retries, and rate limiting. When no good replica exists, marks the artifact as suspect and triggers quarantine. Integrates with existing backup/replication policies and emits detailed recovery events.

Acceptance Criteria
Integrity Health Dashboard
"As an operations executive, I want a live view of data integrity across all claims so that I can spot risks and prove control effectiveness."
Description

Real-time tenant-level dashboard showing overall integrity score, scan coverage, incident counts and trends, queue depth, MTTD/MTTR, and SLA conformance. Provides filters by channel, artifact type, severity, and time window; drill-down to specific artifacts and linked claim views; CSV/PDF export for audits; and streaming updates for live operations. Enforces role-based access and integrates with existing analytics and reporting modules.

Acceptance Criteria
Quarantine & Access Control Workflow
"As a security admin, I want compromised artifacts quarantined with approvals so that we contain risk while preserving evidence."
Description

Policy-driven quarantine that isolates suspect artifacts and associated cases: removes them from agent search and downloads, clearly banners affected records, pauses SLA timers, and preserves chain-of-custody. Supports manual and automatic quarantine, role-based approvals to release/restore, investigation notes, and eDiscovery exports. Provides API endpoints and webhook signals for downstream systems while ensuring evidence immutability during containment.

Acceptance Criteria
Immutable Integrity Audit Log
"As an auditor, I want an immutable record of integrity checks and actions so that I can verify compliance without relying on verbal attestations."
Description

Append-only, tamper-evident log of all integrity activities including baselines, scan results, mismatches, recoveries, quarantines, and approvals. Each record is time-synchronized and cryptographically signed with key rotation; optional external anchoring (e.g., hash anchoring to a public ledger or QLDB) provides third-party verifiability. Offers search, fine-grained export to SIEM or CSV, retention policies, and evidence packages for audits and regulatory reviews.

Acceptance Criteria

Custody Handoff

Digitally documented handoffs for physical devices or media tied to a case: generate a transfer manifest with QR, capture sender/receiver signatures, photos, and timestamps, and append to the chain-of-custody. Works offline for field teams and syncs on reconnect. Benefits: airtight custody across partners and sites, fewer disputes over responsibility, and faster approvals for reimbursements.

Requirements

QR Transfer Manifest Generation
"As an operations lead, I want to generate a QR-coded manifest tied to a case so that partners can quickly access correct handoff details without manual data entry."
Description

Generate a unique, case-linked transfer manifest that includes asset identifiers (serial/IMEI, model), case ID, origin/destination locations, parties (sender/receiver/courier), item count, and special instructions. Embed a scannable QR code encoding the manifest ID and a secure token that deep-links to the ClaimKit handoff screen. Support multi-item manifests, revision tracking, and expiration windows for links. Provide on-device printing and PDF export for labels and paperwork. Auto-log creator, timestamp, and SLA context; attach the manifest to the case and make it discoverable in the live queue and chain-of-custody view.
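The secure, expiring token embedded in the QR deep link can be sketched with an HMAC over the manifest ID and expiry; the signing key here is a placeholder for real per-tenant key management:

```python
import hashlib
import hmac
import time

SECRET = b"tenant-signing-key"  # hypothetical; a real deployment uses managed, rotated keys

def make_manifest_token(manifest_id, ttl_seconds, now=None):
    """Produce a signed, expiring token suitable for embedding in a QR deep link."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    payload = f"{manifest_id}.{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def check_manifest_token(token, now=None):
    """Validate signature and expiry; return the manifest id, or None on failure."""
    try:
        manifest_id, expires, sig = token.rsplit(".", 2)
    except ValueError:
        return None
    payload = f"{manifest_id}.{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    if (now if now is not None else time.time()) > int(expires):
        return None
    return manifest_id
```

Because the expiry is inside the signed payload, a recipient cannot extend a link's window by editing the URL, and revocation can be layered on with a server-side denylist of token IDs.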

Acceptance Criteria
Dual-Party eSignature Capture
"As a field technician, I want to collect sender and receiver signatures at the moment of transfer so that accountability is clear and disputes are minimized."
Description

Capture legally compliant electronic signatures from both sender and receiver during handoff, including printed name, role, and optional photo ID verification. Timestamp each signature, bind it to the specific manifest version, and record device metadata (device ID, OS) for auditability. Enforce signature order and completion checks; support sign-on-glass and typed signatures, with fallback to photo-of-paper if needed. Store signed artifacts as non-editable PDF/image attachments on the case and expose a verification view for auditors and partners.

Acceptance Criteria
Photo and Evidence Capture
"As a claims analyst, I want visual and barcode-verified evidence attached to each handoff so that I can confirm the right item changed custody and approve reimbursements faster."
Description

Allow capture of pre- and post-handoff evidence: multiple photos of the device, packaging, labels, and condition notes. Auto-attach timestamps, user, and optional geolocation; preserve EXIF where available. Enable barcode/QR scanning to validate serials against the case and flag mismatches before completion. Provide lightweight annotations (arrows/notes), size limits, and compression for fast upload. Make evidence visible in the case timeline and link it to the specific handoff event for fast review and approvals.

Acceptance Criteria
Offline-first Handoff Capture and Sync
"As a field agent working with poor connectivity, I want to complete handoffs offline so that operations continue without delays and data syncs reliably later."
Description

Enable full handoff flow offline: manifest view, evidence capture, and signatures stored locally with encryption-at-rest. Queue operations with temporary IDs, then background-sync on reconnect using idempotent APIs and conflict resolution (e.g., last-valid version, merge attachments). Provide clear UI states (pending sync, synced, failed) and retry/backoff. Prevent duplicate handoffs via server-side de-duplication tokens. Respect device storage quotas and allow admins to set offline data retention policies.
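The server-side de-duplication of replayed offline submissions can be sketched with idempotency tokens; the class and ID format below are illustrative:

```python
class HandoffSyncServer:
    """Server-side de-duplication: each offline handoff carries a client-generated token."""

    def __init__(self):
        self.processed = {}  # dedupe token -> previously assigned server id
        self.next_id = 1

    def submit(self, dedupe_token, handoff):
        """Idempotent submit: retries and reconnect replays return the original result."""
        if dedupe_token in self.processed:
            return self.processed[dedupe_token]
        server_id = f"HF-{self.next_id:05d}"
        self.next_id += 1
        self.processed[dedupe_token] = server_id
        return server_id
```

The client generates the token once per handoff while offline and reuses it on every retry, so flaky reconnects can never create a duplicate custody event on the server.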

Acceptance Criteria
Tamper-evident Chain-of-Custody Ledger
"As a compliance officer, I want a tamper-evident custody ledger per case so that I can prove uninterrupted custody during audits and partner disputes."
Description

Append each handoff as an immutable event on the case’s chain-of-custody ledger, including event type, parties, timestamps, geodata (optional), signatures, evidence references, and a cryptographic hash linking to the prior event for tamper evidence. Expose a read-only timeline with diffable versions of manifests and a downloadable audit packet (PDF/JSON). Generate attestations for external reviewers and surface integrity warnings if any record is altered or missing.

Acceptance Criteria
Handoff Workflow and Notifications
"As a partner coordinator, I want structured handoff steps with confirmations and alerts so that nothing stalls and SLAs are met across teams."
Description

Provide a guided workflow to schedule, initiate, accept, or decline handoffs with required reasons, notes, and reschedule options. Trigger real-time notifications (email/SMS/push) to counterparties with secure links to confirm receipt. Start/stop SLA timers based on handoff state changes and escalate when confirmations exceed thresholds. Emit webhooks for state transitions so external systems (e.g., 3PLs, ERPs) remain in sync.

Acceptance Criteria
Role-based Access and Privacy Controls
"As an admin, I want fine-grained access and retention controls for custody records so that sensitive data is protected while partners can still complete handoffs."
Description

Enforce least-privilege access to manifests, signatures, photos, and location data based on role (field tech, operations, partner, auditor). Support expiring, view-only share links with watermarking/redaction of PII where required. Log all views and exports for audit. Allow admins to configure data retention (e.g., auto-delete geolocation after N days) and consent prompts for photo/location capture to meet policy and regulatory requirements.

Acceptance Criteria

Product Ideas

Innovative concepts that could enhance this product's value proposition.

Serial Sentry

Validate serials and receipts in real time, flag duplicates and tampering, and auto-deny ineligible claims. Cross-check OEM databases and purchase dates to stop fraud at intake.

Breach Beacon

Predict which cases will miss SLA and auto-reprioritize, reassign, or escalate before breach. Live heatmap and nudges cut late tickets without manual watchlists.

Parts Promise

Show live parts availability, ETA, and pricing across suppliers, auto-reserve best option, and suggest alternates. Field techs get reliable timelines; customers get honest expectations.

ClaimPay

Issue instant refunds, credits, or virtual repair cards from ClaimKit with rules-based approvals and limits. Embed payouts and ledgering into the case, creating a clean audit trail.

ScopeLock Roles

Ship least-privilege role templates for each user type with brand, region, and queue scoping. SSO and SCIM provisioning, plus approvals for risky actions like payouts or denials.

Ruleglass Decisions

Show exactly why a claim was eligible or denied: rules fired, data used, and policy citations. One-click exception workflows capture rationale for audit.

Evidence Locker

Hash and timestamp every document, message, and decision, creating a tamper-evident audit trail. Export a zipped 'audit bundle' for disputes and regulator requests.

