
Data Integrity System

The Data Integrity System is an automated framework for detecting and fixing data inconsistencies in the Convex database. It validates denormalized fields, relational consistency, and calculated values across 12+ tables.

Architecture Overview

```text
┌─────────────────────────────────────────────────────────────────────┐
│                    INTEGRITY AUDIT SYSTEM                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │
│  │  Data Context    │  │   Validators     │  │  Fix Mutations   │  │
│  │  (dataContext)   │  │   (lib/)         │  │    (fixes/)      │  │
│  │                  │  │                  │  │                  │  │
│  │ - fetchAuditData │  │ - Pure functions │  │ - milestoneXyz   │  │
│  │ - 8 tables in    │  │ - Accept context │  │ - obsCascade     │  │
│  │   parallel       │  │ - Run in parallel│  │ - commitmentSum  │  │
│  │                  │  │                  │  │ - nameConsistency│  │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘  │
│           │                    │                                    │
│           └────────────────────┼────────────────────────────────┐  │
│                                ▼                                │  │
│                        ┌──────────────────┐                     │  │
│                        │     Runner       │                     │  │
│                        │   (runner.ts)    │                     │  │
│                        │                  │                     │  │
│                        │ - Fetch once     │◄────────────────────┘  │
│                        │ - Validate all   │                        │
│                        │ - runFix         │                        │
│                        │- runFromDashboard│                        │
│                        └──────────────────┘                        │
│                                │                                    │
│         ┌──────────────────────┼──────────────────────┐            │
│         ▼                      ▼                      ▼            │
│  ┌─────────────┐       ┌──────────────┐      ┌────────────┐        │
│  │   Store     │       │   Discord    │      │  Cron Job  │        │
│  │   Reports   │       │   Alerts     │      │  (daily)   │        │
│  └─────────────┘       └──────────────┘      └────────────┘        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

Performance

The system uses a fetch-once, validate-parallel pattern: each table is read once up front, and every validator then runs against the same in-memory data context.

| Metric | Value |
|---|---|
| Data fetch | ~8 parallel queries |
| Validation | ~70ms (8 validators in parallel) |
| Full audit | ~3s total |

Previously, an audit issued ~24 sequential queries; it now issues ~8 parallel queries followed by parallel validation.
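A minimal sketch of the fetch-once, validate-parallel pattern, assuming a trimmed-down context with only two tables. The `Fetchers` interface and the row shapes are illustrative, not the real `dataContext.ts` types. Because the validators are pure and synchronous, `Promise.all` gives concurrency on one thread rather than true parallelism; in this sketch the main saving comes from reading each table once.

```typescript
// Simplified, hypothetical shapes; the real types live in dataContext.ts and
// lib/integrityValidators/types.ts and cover more tables.
interface AuditDataContext {
  issues: { _id: string; milestoneNumber: number }[];
  dods: { _id: string; issueId: string; milestoneNumber: number }[];
}

interface ValidationResult {
  validator: string;
  isValid: boolean;
  checkedRecords: number;
  errors: { type: string; message: string; recordId: string }[];
}

// A validator is a pure function of the prefetched context: no database access.
type Validator = (ctx: AuditDataContext) => ValidationResult;

interface Fetchers {
  issues: () => Promise<AuditDataContext["issues"]>;
  dods: () => Promise<AuditDataContext["dods"]>;
}

// Fetch once: start every table query at the same time and wait for all of them.
async function fetchAuditData(fetchers: Fetchers): Promise<AuditDataContext> {
  const [issues, dods] = await Promise.all([fetchers.issues(), fetchers.dods()]);
  return { issues, dods };
}

// Validate: every validator reads the same context, so none of them triggers
// another round trip to the database.
async function runAllValidators(
  ctx: AuditDataContext,
  validators: Validator[],
): Promise<ValidationResult[]> {
  return Promise.all(validators.map(async (validate) => validate(ctx)));
}
```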

File Structure

```text
packages/convex/integrityAudit/
├── index.ts                    # Main exports
├── runner.ts                   # Orchestration (runAll, runFix, runFromDashboard)
├── dataContext.ts              # AuditDataContext type & fetchAuditData()
├── discord.ts                  # Discord bot API integration
├── queries.ts                  # Internal queries for data fetching
├── mutations.ts                # Mutation handlers
├── checks/
│   └── utils.ts                # Helper: fetchAllPages
└── fixes/
    ├── types.ts                # FixResult types
    ├── milestoneConsistencyFix.ts
    ├── obsCascadeFix.ts
    ├── commitmentTimeSumFix.ts
    ├── designElementNameFix.ts
    ├── coordinatorNameFix.ts
    ├── assigneeNameFix.ts
    └── issueNameFix.ts

packages/convex/lib/integrityValidators/
├── types.ts                    # ValidationResult, AuditDataContext
├── milestoneConsistency.ts     # Pure validator function
├── obsCascade.ts
├── commitmentTimeSum.ts
├── totalTimeConsistency.ts
├── designElementNameConsistency.ts
├── coordinatorNameConsistency.ts
├── assigneeNameConsistency.ts
└── issueNameConsistency.ts
```

Validators

Core Validators

| Validator | What It Validates | Error Type |
|---|---|---|
| Milestone Consistency | `dods.milestoneNumber` matches `issues.milestoneNumber` | `milestone_mismatch` |
| OBS Cascade | When an issue is OBS, related `dodNotes.isObsolete` must be true | `obs_cascade_missing`, `obs_cascade_orphan` |
| Commitment Time Sum | Sum of `dodNotes.estimateTime` equals `assigneeReservations.currentCommitmentTime` | `commitment_sum_mismatch` |
| Total Time Consistency | Calculated total times match expected values | `total_time_mismatch` |
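The Commitment Time Sum check is a straightforward aggregation. The sketch below assumes each `dodNotes` row links to its reservation through a hypothetical `assigneeReservationId` field and that times are whole numbers (summed fractional values would need a tolerance). Error objects follow the `errors` shape in the storage schema, with `expected` and `actual` stored as strings.

```typescript
// Hypothetical row shapes: `assigneeReservationId` is an assumed link field,
// not necessarily the real column name.
interface DodNote { _id: string; assigneeReservationId: string; estimateTime: number }
interface AssigneeReservation { _id: string; currentCommitmentTime: number }

interface ValidationError {
  type: string;
  message: string;
  recordId: string;
  expected?: string;
  actual?: string;
}

function validateCommitmentTimeSum(
  reservations: AssigneeReservation[],
  dodNotes: DodNote[],
): { isValid: boolean; checkedRecords: number; errors: ValidationError[] } {
  // Sum estimateTime per reservation in a single pass over the notes.
  const sums = new Map<string, number>();
  for (const note of dodNotes) {
    const current = sums.get(note.assigneeReservationId) ?? 0;
    sums.set(note.assigneeReservationId, current + note.estimateTime);
  }

  // Compare each stored commitment against the recomputed sum.
  const errors: ValidationError[] = [];
  for (const r of reservations) {
    const expected = sums.get(r._id) ?? 0;
    if (expected !== r.currentCommitmentTime) {
      errors.push({
        type: "commitment_sum_mismatch",
        message: `currentCommitmentTime is ${r.currentCommitmentTime}, but dodNotes sum to ${expected}`,
        recordId: r._id,
        expected: String(expected),
        actual: String(r.currentCommitmentTime),
      });
    }
  }
  return { isValid: errors.length === 0, checkedRecords: reservations.length, errors };
}
```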

Name Consistency Validators

| Validator | Tables Affected | Field Synced |
|---|---|---|
| Design Element Name | issues, solutions, dods, assigneeReservations, milestones, dodNotifications, dodNotes | `designElementName` |
| Coordinator Name | issues, solutions, milestones, assigneeReservations | `coordinatorName` |
| Assignee Name | dods, dodNotes | `assigneeUserName`, `assigneeName` |
| Issue Name | dods, solutions, dodNotes | `issueName` |
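Each name-consistency validator compares a denormalized copy against its source of truth. This sketch assumes a hypothetical `designElements` table whose `name` is authoritative and a `designElementId` reference on every denormalized row; the real field names may differ. Rows whose source record is missing are skipped, since that is a relational error rather than a stale name.

```typescript
// Hypothetical shape: each denormalized row points at its source through
// `designElementId` and carries a copied `designElementName`.
interface DenormalizedRow {
  _id: string;
  table: string;
  designElementId: string;
  designElementName: string;
}

interface NameMismatch { table: string; recordId: string; expected: string; actual: string }

function findDesignElementNameMismatches(
  designElements: { _id: string; name: string }[],
  rows: DenormalizedRow[],
): NameMismatch[] {
  // Index the source of truth once so each row lookup is O(1).
  const nameById = new Map(designElements.map((d) => [d._id, d.name] as const));
  const mismatches: NameMismatch[] = [];
  for (const row of rows) {
    const expected = nameById.get(row.designElementId);
    // Missing source record: a different class of error, not handled here.
    if (expected === undefined) continue;
    if (row.designElementName !== expected) {
      mismatches.push({ table: row.table, recordId: row._id, expected, actual: row.designElementName });
    }
  }
  return mismatches;
}
```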

Running Checks

Run Modes

The integrity audit system supports three run modes:

| Mode | Description |
|---|---|
| `checkOnly` | Run validation checks only (default) |
| `dryRunAutoFix` | Run checks, then simulate fixes without applying |
| `executeAutoFix` | Run checks, then apply fixes to failed validators |
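One way to express the three run modes is a lookup from mode to fix-step flags. `planForRunMode`, `validatorsToFix`, and the `hasFix` predicate are hypothetical helpers, not the runner's actual code. The predicate reflects that `fixes/` contains no fix for `totalTimeConsistency`, so a failed total-time check has nothing to auto-fix.

```typescript
type RunMode = "checkOnly" | "dryRunAutoFix" | "executeAutoFix";

interface FixPlan {
  runFixes: boolean; // whether any fix step runs at all
  dryRun: boolean;   // whether fixes only simulate their writes
}

// Hypothetical mapping from the documented run modes to fix-step flags.
const PLANS: Record<RunMode, FixPlan> = {
  checkOnly: { runFixes: false, dryRun: true },
  dryRunAutoFix: { runFixes: true, dryRun: true },
  executeAutoFix: { runFixes: true, dryRun: false },
};

// Defaults to checkOnly, matching the documented default mode.
function planForRunMode(runMode: RunMode = "checkOnly"): FixPlan {
  return PLANS[runMode];
}

// Only failed validators that actually have a fix mutation are handed on.
function validatorsToFix(
  results: { validator: string; isValid: boolean }[],
  plan: FixPlan,
  hasFix: (validator: string) => boolean,
): string[] {
  if (!plan.runFixes) return [];
  return results.filter((r) => !r.isValid && hasFix(r.validator)).map((r) => r.validator);
}
```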

Run All Checks

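```typescript
import { internal } from "./_generated/api"; // path as seen from packages/convex/

// Called from inside a Convex action. runAll is referenced through `internal`
// (as in crons.ts), so it is not exposed on the public `api` object.
await ctx.runAction(internal.integrityAudit.runner.runAll, {
  milestoneNumber: 74,        // Optional: scope to milestone
  sendDiscordAlert: true,     // Optional: notify Discord (defaults to true)
  runMode: "checkOnly",       // Optional: checkOnly | dryRunAutoFix | executeAutoFix
});
```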

Run from Dashboard

A public action is available for running audits directly from the Convex dashboard:

```typescript
await ctx.runAction(api.integrityAudit.runner.runFromDashboard, {
  runMode: "checkOnly",       // Optional: defaults to "checkOnly"
  sendDiscordAlert: true,     // Optional: defaults to true
});
```

Because both arguments are optional, the action can be invoked with no arguments to run a full check-only audit and post the result to Discord.

Run Single Check

```typescript
await ctx.runAction(api.integrityAudit.runner.runCheck, {
  checkName: "milestoneConsistency",
  milestoneNumber: 74,
});
```

Run Check with Auto-Fix

```typescript
// Step 1: Dry run (simulate)
const { checkResult, fixResult } = await ctx.runAction(
  api.integrityAudit.runner.runCheckAndFix,
  {
    validator: "designElementNameConsistency",
    autoFix: true,
    dryRun: true,  // Simulate only
  }
);

// Step 2: Review results, then apply
const { fixResult: actual } = await ctx.runAction(
  api.integrityAudit.runner.runCheckAndFix,
  {
    validator: "designElementNameConsistency",
    autoFix: true,
    dryRun: false,  // Apply fixes
  }
);
```

Fix Mutations

Dry-Run Pattern

All fixes follow a dry-run-first pattern:

  1. Run fix with dryRun: true to simulate
  2. Review results (fixed, failed, skipped counts)
  3. Run with dryRun: false to apply
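The dry-run-first pattern can be sketched as a fix that only writes when `dryRun` is false. The database write sits behind a hypothetical `applyPatch` callback (inside a Convex mutation it would wrap `ctx.db.patch`), and the counts mirror the `FixResult` fields in the storage schema. Treating rows that already match, or whose source name cannot be found, as `skipped` is an assumption about how that counter is used.

```typescript
interface FixResult { processed: number; fixed: number; failed: number; skipped: number; dryRun: boolean }

interface Row { _id: string; designElementName: string }

// Hypothetical fix: `expectedName` looks up the source-of-truth name and
// `applyPatch` stands in for the real database write.
function fixDesignElementNames(
  rows: Row[],
  expectedName: (row: Row) => string | undefined,
  applyPatch: (id: string, patch: Record<string, unknown>) => void,
  dryRun: boolean,
  now: number = Date.now(),
): FixResult {
  const result: FixResult = { processed: 0, fixed: 0, failed: 0, skipped: 0, dryRun };
  for (const row of rows) {
    result.processed++;
    const expected = expectedName(row);
    if (expected === undefined || expected === row.designElementName) {
      result.skipped++; // already correct, or no source of truth to copy from
      continue;
    }
    if (dryRun) {
      result.fixed++; // count what would be fixed, but write nothing
      continue;
    }
    try {
      // System attribution: every real write records who made it.
      applyPatch(row._id, {
        designElementName: expected,
        lastChangedBy: "system:integrity-fix:designElementName",
        updatedAt: now,
      });
      result.fixed++;
    } catch {
      result.failed++;
    }
  }
  return result;
}
```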

System Attribution

Every patch written by a fix records a system identity in `lastChangedBy`, which provides an audit trail:

```typescript
// Each fix patches the corrected field plus attribution metadata:
await ctx.db.patch(record._id, {
  designElementName: expected,
  lastChangedBy: "system:integrity-fix:designElementName",
  updatedAt: Date.now(),
});
```

Batched Operations

Fixes handle large datasets via pagination:

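```typescript
// Pagination state: a null cursor starts from the beginning of the table.
let cursor: string | null = null;
let isDone = false;

while (!isDone) {
  const page = await ctx.db
    .query("issues")
    .paginate({ cursor, numItems: 100 });

  for (const record of page.page) {
    // Process each record
  }

  cursor = page.continueCursor;
  isDone = page.isDone;
}
```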
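The `fetchAllPages` helper in `checks/utils.ts` generalizes this loop. Its real signature is not documented here, so the version below is hypothetical: it takes a page-fetching callback that returns the same `{ page, isDone, continueCursor }` shape as Convex's `PaginationResult`. When called from an action, the callback would typically wrap `ctx.runQuery` on a paginated internal query.

```typescript
// Mirrors the fields of Convex's PaginationResult used by the loop.
interface PageResult<T> {
  page: T[];
  isDone: boolean;
  continueCursor: string;
}

// Hypothetical helper: keep requesting pages until the source reports isDone.
async function fetchAllPages<T>(
  fetchPage: (cursor: string | null, numItems: number) => Promise<PageResult<T>>,
  numItems = 100,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null; // null requests the first page
  let isDone = false;
  while (!isDone) {
    const result: PageResult<T> = await fetchPage(cursor, numItems);
    all.push(...result.page);
    cursor = result.continueCursor;
    isDone = result.isDone;
  }
  return all;
}
```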

Discord Integration

Setup

Required environment variables:

```bash
DISCORD_BOT_TOKEN=your_bot_token
DISCORD_INTEGRITY_CHANNEL_ID=123456789012345678
DISCORD_INTEGRITY_MENTION_USER_ID=987654321098765432
CONVEX_DASHBOARD_URL=https://dashboard.convex.dev/t/team/project/deployment
```
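Graceful degradation for missing Discord settings can be as simple as returning `null` when configuration is incomplete, so the audit still runs and only the alert is skipped. Which variables are strictly required is an assumption here: the bot token and channel ID are treated as mandatory, the mention user and dashboard URL as optional. `readDiscordConfig` is a hypothetical name; in a Convex function its input would be `process.env`.

```typescript
interface DiscordConfig {
  botToken: string;
  channelId: string;
  mentionUserId?: string;
  dashboardUrl?: string;
}

// Returns null instead of throwing when required settings are missing.
function readDiscordConfig(env: Record<string, string | undefined>): DiscordConfig | null {
  const botToken = env.DISCORD_BOT_TOKEN;
  const channelId = env.DISCORD_INTEGRITY_CHANNEL_ID;
  // Assumed required: without a token and channel there is nowhere to post.
  if (!botToken || !channelId) return null;
  return {
    botToken,
    channelId,
    mentionUserId: env.DISCORD_INTEGRITY_MENTION_USER_ID, // assumed optional
    dashboardUrl: env.CONVEX_DASHBOARD_URL,               // assumed optional
  };
}
```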

Alert Format

Discord receives a compact summary format optimized for large datasets:

  • Pass/fail counts per validator
  • Total records checked
  • Duration and scope information
  • Button linking to Convex dashboard

```text
@YourName Data Integrity Audit Complete

Scope: Full | Duration: 3.2s

✅ 6 passed | ❌ 2 failed

Failed:
• totalTimeConsistency (3 errors)
• assigneeNameConsistency (12 errors)

[🔍 View in Convex Dashboard]
```

Because it lists only per-validator error counts rather than every error detail, the compact format keeps the message small enough to avoid Discord API 400 errors on large datasets.
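A sketch of building the compact summary text in the format above. The function name and input shape are hypothetical, the `<@USER_ID>` form is Discord's mention syntax (rendered as `@YourName`), and the dashboard button is omitted because Discord buttons are message components rather than text. The 2,000-character cap is Discord's limit on message content; truncation is only a last resort for an unusually long failure list.

```typescript
interface ValidatorSummary { validator: string; isValid: boolean; errorCount: number }

// Discord rejects message content longer than 2,000 characters.
const DISCORD_CONTENT_LIMIT = 2000;

function buildCompactSummary(
  results: ValidatorSummary[],
  scope: string,
  durationMs: number,
  mentionUserId?: string,
): string {
  const failed = results.filter((r) => !r.isValid);
  const passed = results.length - failed.length;
  const lines = [
    `${mentionUserId ? `<@${mentionUserId}> ` : ""}Data Integrity Audit Complete`,
    "",
    `Scope: ${scope} | Duration: ${(durationMs / 1000).toFixed(1)}s`,
    "",
    `✅ ${passed} passed | ❌ ${failed.length} failed`,
  ];
  if (failed.length > 0) {
    lines.push("", "Failed:");
    // One line per failed validator with only its error count, never the details.
    for (const f of failed) lines.push(`• ${f.validator} (${f.errorCount} errors)`);
  }
  const text = lines.join("\n");
  return text.length <= DISCORD_CONTENT_LIMIT
    ? text
    : text.slice(0, DISCORD_CONTENT_LIMIT - 1) + "…";
}
```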

Automatic Scheduling

A daily cron job runs all checks at 6 AM UTC:

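```typescript
// packages/convex/crons.ts
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

crons.daily(
  "integrity-audit-daily",
  { hourUTC: 6, minuteUTC: 0 },
  internal.integrityAudit.runner.runAll,
  { sendDiscordAlert: true }
);

export default crons;
```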

Storage Schema

Check Results

```typescript
// Table entry inside defineSchema({ ... }); defineTable comes from
// "convex/server" and v from "convex/values".
integrityCheckResults: defineTable({
  runId: v.string(),
  timestamp: v.number(),
  validator: v.string(),
  isValid: v.boolean(),
  checkedRecords: v.number(),
  duration: v.number(),
  errors: v.array(v.object({
    type: v.string(),
    message: v.string(),
    recordId: v.string(),
    expected: v.optional(v.string()),
    actual: v.optional(v.string()),
  })),
  expiresAt: v.number(),  // 30-day retention
})
```

Fix Results

```typescript
// Table entry inside defineSchema({ ... })
integrityFixResults: defineTable({
  runId: v.string(),
  timestamp: v.number(),
  validator: v.string(),
  processed: v.number(),
  fixed: v.number(),
  failed: v.number(),
  skipped: v.number(),
  dryRun: v.boolean(),
  expiresAt: v.number(),  // 30-day retention
})
```
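The 30-day retention reduces to arithmetic on `timestamp`. This sketch stamps `expiresAt` at write time and selects rows that have passed it; how expired rows are actually removed is not described here, and `withExpiry` and `expiredRows` are hypothetical names.

```typescript
const RETENTION_DAYS = 30;
// 30 days in milliseconds, matching the millisecond timestamps from Date.now().
const RETENTION_MS = RETENTION_DAYS * 24 * 60 * 60 * 1000;

// Stamp a result row with its expiry when it is written.
function withExpiry<T extends { timestamp: number }>(row: T): T & { expiresAt: number } {
  return { ...row, expiresAt: row.timestamp + RETENTION_MS };
}

// Select rows whose retention window has ended as of `now`.
function expiredRows<T extends { expiresAt: number }>(rows: T[], now: number): T[] {
  return rows.filter((r) => r.expiresAt <= now);
}
```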

Quick Reference

| Action | Function | Args |
|---|---|---|
| Run all checks | `runner.runAll` | `{ milestoneNumber?, sendDiscordAlert?, runMode? }` |
| Dashboard action | `runner.runFromDashboard` | `{ runMode?, sendDiscordAlert? }` |
| Run one check | `runner.runCheck` | `{ checkName, milestoneNumber? }` |
| Fix validator | `runner.runFix` | `{ validator, dryRun? }` |
| Check + fix | `runner.runCheckAndFix` | `{ validator, autoFix, dryRun? }` |

Design Principles

  1. Dry-Run First - Always test fixes before applying
  2. Batched Operations - Handle 10k+ records via pagination
  3. System Attribution - All fixes marked for audit trail
  4. Graceful Degradation - Missing env vars skip Discord, don't crash
  5. 30-Day Retention - Results auto-expire
  6. No Hard Deletes - Use soft-delete only
  7. Modular Validators - Each check is independent
  8. Type Safety - Full TypeScript with strict mode

Internal Documentation