Test Article
Record-Book Incremental Stats Engine
Subject: Transitioning to Event-Driven Architecture with RabbitMQ & Bulk Upserts
Scope: 30 Tenants | "Giant" Tenants (30M+ Rows) | Standard Tenants (~1M+ Rows)
Infrastructure Target: AWS RDS db.t3.large (8GB RAM) | RabbitMQ Cluster
1. Executive Summary
The current Batch Rebuild architecture operates at O(N) complexity, where system load increases linearly with data growth. For our “Giant" tenants (30M+ records), this model causes severe CPU starvation, table locking, and IOPS exhaustion during rebuild cycles.
The proposed solution transitions the system to an Event-Driven Delta model. By utilizing RabbitMQ as a high-throughput buffer and implementing Micro-Batching Consumers, we shift the computational cost from O(N) to O(1). This ensures that a 30M-row tenant updates with the same sub-second latency as a 500k-row tenant, while reducing Database I/O by approximately 90%.
2. Problem Statement
The current system relies on periodic aggregation queries (SUM, COUNT) against the match_stat_player table. This approach is no longer viable due to:
- CPU Spikes: Aggregating 40M rows consumes 100% of CPU credits on t3 instances.
- Database Locking: Long-running writes block read operations, degrading the API experience.
- IOPS Saturation: Frequent large-scale reads/writes deplete the GP3 storage burst balance.
3. Infrastructure Layer: Reliability & Scaling
3.1 Message Broker: RabbitMQ (The Buffer)
To decouple high-velocity stat updates from the database layer, we will utilize RabbitMQ.
- Role: Acts as a durable buffer that absorbs traffic spikes (e.g., simultaneous matches).
- Durability: Messages are persisted to disk to survive service restarts.
- Flow Control: We utilize prefetch_count (e.g., 500) to ensure workers process messages at a sustainable rate without exhausting RAM.
3.2 Hardware Upgrade: db.t3.large (8GB RAM)
Upgrading to 8GB of RAM is a critical requirement for this architecture.
- InnoDB Buffer Pool: The increase in RAM allows the RDS to keep the "Hot" index trees for the 5 Giant Tenants in memory.
- Why it matters: If the index does not fit in RAM, the database reverts to Disk I/O, which is 100x slower. 8GB provides the necessary headroom for our dataset.
4. Core Logic: The Streaming Pipeline
4.1 Phase 1: The Producer (Zero-Latency)
The application intercepts changes in the match_stat_player table. Instead of calculating totals, it calculates the Delta (Difference).
- Trigger: Event Listener on MatchStat model.
- Payload: Lightweight JSON:
JSON
{
"tenant_id": "client_A",
"league_id": 10,
"member_id": 505,
"stat_id": 3,
"delta": 1 // +1 Goal
}
- Performance: The Producer finishes execution in milliseconds, as it does not wait for the database write.
4.2 Phase 2: The Smart Consumer (Micro-Batching)
Directly writing every message to the DB would kill IOPS. The Worker implements a Buffering Strategy:
- Accumulation: The worker pulls messages from RabbitMQ and stores them in an internal memory array (grouped by Tenant ID).
- Execution Trigger: The buffer flushes to the database when:
- Buffer size reaches 1,000 records, OR
- Time elapsed reaches 60 seconds.
- Context Switching: The worker dynamically switches the Database Connection to the specific Tenant ID before executing the write.
4.3 Phase 3: Bulk Atomic Upsert
We execute a single SQL query to update hundreds of rows simultaneously.
SQL Logic:
SQL
INSERT INTO record_book_career (league_id, member_id, stat_id, value)
VALUES
(1, 101, 3, 1),
(1, 102, 3, 2),
... (up to 1,000 rows)
ON DUPLICATE KEY UPDATE
value = value + VALUES(value);
- Efficiency: Reduces 1,000 I/O operations into 1 I/O operation.
- Atomic Safety: The math happens inside the storage engine, guaranteeing data integrity even during concurrency.
5. Storage Strategy & Optimization
5.1 Specialized Record Tables
Data is distributed across three optimized tables:
- record_book_career: Cumulative all-time stats.
- record_book_season: High-water mark tracking (Updates only if New_Total > Best_Season).
- record_book_game: Peak performance per match.
5.2 Advanced Partitioning & Indexing
For the 5 Giant Tenants, standard indexing is insufficient.
- Partitioning: Tables are partitioned by league_id. This restricts MySQL search operations to small physical segments of the disk.
- Clustered Composite Key: (league_id, stat_id, member_id). This ensures the database finds the exact row via a direct "Seek" operation rather than a "Scan."
6. Legacy Data Migration (Backfilling)
To populate the system with legacy records without downtime:
- Chunking: A background crawler processes legacy data in chunks of 5,000 IDs.
- Throttling: The script monitors AWS CloudWatch for RDS CPU Credits. If credits fall below 30%, migration pauses automatically.
- Parallel Convergence: Since the logic is additive (A + B = B + A), live RabbitMQ updates and the Legacy Migration script can run simultaneously without data corruption.
7. Real-Time Leaderboard Caching (Top 10)
We avoid expensive ORDER BY queries on 30M rows using Threshold-Based Caching:
- The Threshold Rule: Redis stores the score of the 10th-place player.
- Conditional Trigger: A WarmCacheJob is only dispatched if a player's new score exceeds the threshold.
- Debouncing: Multiple warm requests are collapsed into a single query every 60 seconds to prevent "Job Storms."
8. Reliability & Fault Tolerance
- RabbitMQ Acknowledgements: Messages are only acked (removed from the queue) AFTER the Bulk Upsert is successfully committed to the database.
- Idempotency: A stat_updates_log table tracks processed batches to prevent double-counting if a worker crashes mid-process.
- Dead Letter Exchange (DLQ): Failed batches are routed to a separate queue (dlx_stats) for manual inspection instead of blocking the pipeline.
9. Strategic Impact & Business Value
- High Throughput: The architecture can handle spikes of thousands of events per minute without lag.
- Cost Efficiency: We maximize the utility of the t3.large instance by reducing Database I/O by ~90%.
- Scalability: The system becomes volume-agnostic; the performance for a 40M-row tenant is identical to that of a small tenant.