
🔍 Data Quality Audits

Audit
🆔

Player ID Mismatch Audit

player_id_audit_report.md

Comprehensive audit identifying 15+ player ID mismatches between Cricsheet data and IPL 2026 squad files. Resolves duplicate IDs from franchise trades (MI↔LSG) and ensures data integrity across all analytics pipelines.

Why this was done
During stat pack generation, we found that some players appeared twice or had incorrect stats. This audit created the canonical ID mappings used by all downstream outputs.
📄 View on GitHub →
Audit
🎯

Entry Point Audit

entry_point_audit_report.md

Analysis of where batters typically bat in the order (entry points). Validates our batting position classifications and ensures OPENER, MIDDLE_ORDER, and FINISHER tags align with actual match data.

Why this was done
Andy Flower flagged that some "opener" tags didn't match where players actually batted. This audit provided evidence-based entry point distributions.
📄 View on GitHub →

📈 EDA & Threshold Analysis

EDA
📊

Threshold EDA (2023+ Data)

threshold_eda_2023.md

Exploratory data analysis that determined optimal thresholds for player tags using only 2023-2025 IPL data. Establishes baselines for SPECIALIST vs VULNERABLE, PP_BEAST vs PP_LIABILITY, and other phase-specific tags.

Why this was done
Original thresholds were based on all-time T20 data, causing stat drift. This EDA recalibrated all thresholds to reflect the current T20 meta (2023+).
📄 View on GitHub →
Validation
⚖️

Baselines vs Tags Comparison

baselines_vs_tags.md

Systematic comparison of league-wide baselines against our player tag criteria. Validates that SPECIALIST and VULNERABLE thresholds are meaningful relative to overall IPL performance distributions.

Why this was done
Jose Mourinho challenged whether our tag thresholds were statistically defensible. This analysis proved that tags represent meaningful deviations from league averages.
📄 View on GitHub →

📚 Research & Methodology

Research
🏈

PFF Grading System Research

pff_grading_system_research.md

Deep-dive into Pro Football Focus's play-by-play grading methodology. Explores how context-aware evaluation and position-specific metrics can be adapted for cricket analytics.

Why this was done
PFF revolutionized NFL analytics by going beyond box scores. This research informed our approach to context-aware player evaluation in cricket.
📄 View on GitHub →
Research
🏀

KenPom Methodology Research

kenpom_methodology_research.md

Analysis of KenPom's college basketball analytics — adjusted efficiency, tempo-free stats, and the "Four Factors." Foundation for our planned CricPom team rating system.

Why this was done
KenPom's adjusted metrics revolutionized basketball analysis. This research explores how to create cricket equivalents for team-level efficiency ratings.
📄 View on GitHub →
Research

CricPom Prototype Specification

cricpom_prototype_spec_020926_v1.md

Opponent-adjusted, venue-normalized, phase-aware T20 efficiency rating system. 5-factor tournament weighting (PQI, CI, Recency, Conditions, Confidence) with sigmoid-based sample size scoring. 231 IPL 2026 players rated across 14 T20 tournaments.

Why this was done
KenPom transformed college basketball evaluation. This spec translates that methodology to T20 cricket — adjusting for opponent quality, venue conditions, and sample size to produce honest player efficiency ratings.
📄 View Spec on GitHub → 📊 View Player Feed (231 players) →
Research

Tournament Composite Weights

tkt187_final_weights.py

Geometric mean formula weighting 14 T20 tournaments across 5 factors: Player Quality Index (PQI), Effective CI (competition), Recency decay, Conditions Similarity, and Sample Confidence. Produces per-team weighted SR, average, economy, boundary%, and dot% composites.

Why this was done
A simple average of IPL + BBL + PSL stats would be misleading. Different tournaments have different quality levels. This engine ensures the IPL carries more weight than a T10 league, while still using cross-tournament data for players with limited IPL samples.
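To illustrate why a simple pooled average can mislead, here is a minimal sketch that blends a batter's strike rate across tournaments in proportion to each tournament's weight. The `weighted_strike_rate` helper, the weighting scheme (each ball counted in proportion to its tournament weight), and the sample figures are all hypothetical; none of them are taken from tkt187_final_weights.py.

```python
def weighted_strike_rate(rows):
    """Blend strike rates across tournaments.

    rows: list of (tournament_weight, runs, balls) tuples.
    Assumed scheme: every ball counts in proportion to the
    weight of the tournament it was faced in.
    """
    weighted_runs = sum(w * runs for w, runs, _ in rows)
    weighted_balls = sum(w * balls for w, _, balls in rows)
    if weighted_balls == 0:
        return None  # no usable data
    return 100.0 * weighted_runs / weighted_balls


# Hypothetical sample: (weight, runs, balls)
sample = [
    (0.90, 120, 100),  # high-weight tournament, SR 120
    (0.30, 400, 200),  # low-weight tournament, SR 200
]

# Unweighted pooled strike rate, for comparison.
simple = 100.0 * (120 + 400) / (100 + 200)
blended = weighted_strike_rate(sample)
print(f"simple={simple:.1f} blended={blended:.1f}")
```

In this made-up case the blended figure sits closer to the high-weight tournament's strike rate than the pooled figure does, which is the behaviour the paragraph describes.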
📄 View Weights Engine →
Research
📈

Insight Confidence Framework

sigmoid_confidence()

Sigmoid-based confidence scoring: 1 / (1 + exp(-0.02 × (matches - 100))). With these parameters, confidence is 50% at 100 matches, about 88% at 200 matches, and passes 95% at roughly 250 matches. Under 30 matches, confidence stays below 20%. Applied to all 231 CricPom player ratings and pressure performance metrics.
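The quoted formula can be written as a short function. This is a sketch only: the function name comes from the card, but the parameter names `steepness` and `midpoint` are assumptions, and the real implementation may differ.

```python
import math


def sigmoid_confidence(matches, steepness=0.02, midpoint=100):
    """Confidence in [0, 1] using 1 / (1 + exp(-k * (matches - m)))."""
    return 1.0 / (1.0 + math.exp(-steepness * (matches - midpoint)))


# Print the confidence at a few sample sizes.
for n in (5, 30, 100, 200, 250):
    print(n, round(sigmoid_confidence(n), 3))
```

Running the loop is a quick way to check which sample sizes land in the low, medium, and high confidence ranges for a given steepness and midpoint.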

Why this was done
Treating a player with 5 IPL innings the same as one with 150 is dishonest. This framework quantifies statistical reliability so readers know which insights are backed by deep data and which are small-sample estimates.
📄 View Implementation →
Research
🔮

Silhouette Score Validation

baseline_comparison.py

Three-metric cluster validation: silhouette score (cohesion vs separation), Davies-Bouldin index (cluster similarity), and Calinski-Harabasz index (between/within variance ratio). Tests K-means against a random baseline to confirm clustering is meaningful, not noise.
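As a rough illustration of the silhouette part of this check, the sketch below computes the mean silhouette coefficient from its definition and compares a labelling against the same labels shuffled at random. It is a pure-Python stand-in, not the code in baseline_comparison.py: the shuffled-label baseline, the function names, and the toy points are assumptions, and the Davies-Bouldin and Calinski-Harabasz indices are omitted.

```python
import math
import random


def silhouette_score(points, labels):
    """Mean silhouette coefficient, computed directly from its definition."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_distance(i, own)  # cohesion: distance within own cluster
        b = min(mean_distance(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def random_baseline(points, labels, trials=200, seed=0):
    """Average silhouette when the same labels are shuffled at random."""
    rng = random.Random(seed)
    shuffled = list(labels)
    scores = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        scores.append(silhouette_score(points, shuffled))
    return sum(scores) / trials


# Hypothetical 2-D feature vectors forming two tight groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]

structured = silhouette_score(points, labels)
baseline = random_baseline(points, labels)
print(f"structured={structured:.3f} baseline={baseline:.3f}")
```

A labelling that reflects real structure should score well above the shuffled baseline; if the two are close, the clusters are likely noise.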

Why this was done
Clustering algorithms will always produce clusters, even from random data. Without validation metrics, our batter/bowler archetypes could be statistical artifacts. Beating a random baseline on these metrics is evidence that the structure is real.
📄 View Validation Code →
Research
🎯

Pressure Sequence Analysis

generate_momentum_data.py

Consecutive dot ball and boundary sequence analysis for bowling pressure and batting resilience profiling. Rates teams Elite/Strong/Average/Weak based on sequence length thresholds. Identifies clutch performers and choke risks via SR delta under pressure.

Why this was done
Traditional bowling stats (economy, SR) miss the ability to build sustained pressure. A bowler who bowls 4 dots then leaks a boundary is different from one who never strings 3 dots together. Sequences reveal pressure capability that averages hide.
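The contrast between the two bowlers can be sketched with a run-length count over ball-by-ball outcomes. The `dot_sequences` helper and both spells are hypothetical, and the sketch assumes one integer per legal delivery (wides and no-balls are ignored).

```python
from itertools import groupby


def dot_sequences(runs_per_ball):
    """Lengths of each maximal run of consecutive dot balls (0 runs)."""
    return [
        sum(1 for _ in group)
        for is_dot, group in groupby(runs_per_ball, key=lambda r: r == 0)
        if is_dot
    ]


# Two hypothetical overs with identical totals: 5 runs, 4 dots each.
bowler_a = [0, 0, 0, 0, 4, 1]  # four dots in a row, then a boundary
bowler_b = [0, 0, 1, 0, 0, 4]  # never strings three dots together

print(dot_sequences(bowler_a), sum(bowler_a))
print(dot_sequences(bowler_b), sum(bowler_b))
```

Both overs cost the same number of runs and contain the same number of dots, so economy and dot% cannot separate them; only the sequence lengths differ.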
📄 View Generator →
Spec
🎨

Player Clustering PRD

player_clustering_prd.md

Product requirements document for our K-means clustering model. Defines the 6 batter archetypes (EXPLOSIVE_OPENER to FINISHER) and 7 bowler archetypes (PACER to PART_TIMER).

Why this was done
Before building the clustering model, we needed to define what archetypes should exist and how they map to cricket roles. This PRD aligned the team.
📄 View on GitHub →
Spec
🎭

Cluster Archetypes (Creative)

cluster_archetypes_creative.md

Creative descriptions and narrative framing for each player archetype. Makes technical clusters accessible to fans through cricket storytelling and real-world player examples.

Why this was done
Raw cluster labels (Cluster 0, 1, 2...) don't resonate with fans. This doc created the narrative layer that makes analytics accessible.
📄 View on GitHub →

🧩 Player Pattern Recognition

Analytics
📉

Batter Consistency Index

batter_consistency_index.csv

Rolling consistency analysis across IPL 2023-2025. Tracks coefficient of variation in runs scored, single-digit failure rate, and form trajectory by season. Separates "reliable anchors" from "streaky match-winners."

Why this matters
Raw averages hide volatility. A batter averaging 35 who scores 0, 70, 0, 70 is very different from one who scores 30, 40, 35, 35. This index reveals who you can depend on.
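A minimal sketch of two of the ingredients named in the card, coefficient of variation and single-digit failure rate, applied to two illustrative four-innings sequences that both average 35. The helper names are hypothetical, and the use of population standard deviation is an assumption; the actual index may be computed differently.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Standard deviation divided by the mean (population std, assumed)."""
    avg = mean(scores)
    if avg == 0:
        return None
    return pstdev(scores) / avg


def single_digit_failure_rate(scores):
    """Share of innings that ended on fewer than 10 runs."""
    return sum(1 for s in scores if s < 10) / len(scores)


streaky = [0, 70, 0, 70]   # mean 35, all-or-nothing
steady = [30, 40, 35, 35]  # mean 35, tightly grouped

for name, scores in (("streaky", streaky), ("steady", steady)):
    print(name,
          round(coefficient_of_variation(scores), 3),
          single_digit_failure_rate(scores))
```

The averages are identical, but the streaky sequence has a far higher coefficient of variation and fails in half its innings.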
📊 View Data →
Analytics
🤝

Partnership Synergy Scores

partnership_synergy.csv

Measures how batting pairs amplify each other's performance. Synergy index compares partnership run rates against individual averages. Year-wise trends show which combinations are improving or declining.
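One way such an index could be defined is the ratio of the pair's run rate to the mean of the two batters' individual run rates. This definition, the `synergy_index` name, and the numbers below are assumptions for illustration; the formula behind partnership_synergy.csv is not stated here.

```python
def synergy_index(partnership_runs, partnership_balls, rate_a, rate_b):
    """Pair run rate divided by the mean of two individual run rates.

    All rates are in runs per ball. Under this assumed definition,
    a value above 1.0 means the pair scores faster together.
    """
    expected = (rate_a + rate_b) / 2
    if partnership_balls == 0 or expected == 0:
        return None
    pair_rate = partnership_runs / partnership_balls
    return pair_rate / expected


# Hypothetical pair: 150 runs off 100 balls together,
# individual rates of 1.3 and 1.2 runs per ball.
index = synergy_index(150, 100, 1.3, 1.2)
print(round(index, 2))
```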

Why this matters
A predicted XI is not just 11 individuals; it is about combinations. Partnership data reveals which batting pairs create more than the sum of their parts.
📊 View Data →

⚔️ Matchup Intelligence

Matchups
🏏

Batter vs Bowling Type

batter_bowling_type_matchup.csv

How every IPL batter performs against pace, off-spin, leg-spin, and left-arm spin. Identifies PACE_SPECIALIST, SPIN_SPECIALIST, and VULNERABLE_VS_SPIN tags based on 2023-2025 data with statistically significant sample sizes.

Why this matters
A batter averaging 45 overall might average 55 vs pace but 25 vs spin. This matchup data drives bowling attack composition and batting order decisions.
📊 View Data →
Matchups
🎳

Bowler vs Batting Handedness

bowler_handedness_matchup.csv

How every IPL bowler performs against left-handers vs right-handers. Reveals asymmetric matchups — bowlers who dominate one handedness but struggle against the other. Critical for batting order optimization.

Why this matters
Teams often alternate LHB/RHB in the order to disrupt bowling lines. This data quantifies exactly how much advantage each switch provides.
📊 View Data →
Venue
🏟️

Team Venue Records

team_venue_records.csv

Win/loss records for every team at every IPL venue, with year-wise breakdown. Identifies home fortress effects, away vulnerabilities, and neutral venue performance patterns across 2023-2025.

Why this matters
Venue is cricket's most underrated variable. Some teams have 80%+ home win rates but drop to 30% away. Park factors directly influence team ratings.
📊 View Data →

🔥 Pressure & Phase Performance

Pressure
💪

Bowler Pressure Sequences

bowler_pressure_sequences.csv

Tracks bowler performance under pressure — economy and strike rate in death overs, when defending small totals, and in consecutive dot-ball sequences. The cricket equivalent of "clutch" performance.

Why this matters
PFF grades NFL quarterbacks on 4th-quarter performance. Similarly, a bowler's death overs economy under pressure is more predictive than their overall economy.
📊 View Data →
Phase
📋

Bowler Phase Distribution

bowler_phase_distribution_grouped.csv

How bowlers distribute their overs across powerplay, middle, and death phases. Grouped analysis reveals captaincy patterns — which bowlers are trusted at death, which are powerplay-only, and who bowls through all phases.

Why this matters
A bowler's role in a team is defined by when they bowl. Phase distribution directly feeds into our DEATH_SPECIALIST, NEW_BALL_SPECIALIST, and WORKHORSE tags.
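A phase distribution of this kind can be sketched by bucketing the overs a bowler delivered. The over ranges assume the standard T20 split (powerplay overs 1-6, middle 7-15, death 16-20), and the helper names and sample data are hypothetical. No tag thresholds are shown because none are stated here.

```python
from collections import Counter

PHASES = ("powerplay", "middle", "death")


def phase_of(over):
    """Map a 1-indexed over number to its T20 phase."""
    if 1 <= over <= 6:
        return "powerplay"
    if 7 <= over <= 15:
        return "middle"
    if 16 <= over <= 20:
        return "death"
    raise ValueError(f"over out of range: {over}")


def phase_distribution(overs_bowled):
    """Share of a bowler's overs delivered in each phase."""
    counts = Counter(phase_of(o) for o in overs_bowled)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phase: counts[phase] / total for phase in PHASES}


# Hypothetical bowler across two matches: new ball plus death overs.
overs = [1, 3, 17, 19, 18, 20, 2, 16]
print(phase_distribution(overs))
```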
📊 View Data →
Pressure
🎯

Batter Pressure Bands

batter_pressure_bands.csv

How every IPL batter performs across low, medium, and high pressure bands (2023-2025). Segments strike rate, boundary percentage, and dot ball frequency by match situation to reveal who thrives and who wilts under pressure.

Why this matters
Overall averages flatten out pressure context. A batter with an overall SR of 140 may drop to 110 in high-pressure situations. This data separates genuine clutch performers from flat-track bullies.
📊 View Data →
Pressure
📊

Pressure Performance Ratings

pressure_deltas.csv

Composite pressure performance ratings for batters and bowlers. Measures the delta between overall and pressure-situation strike rates, boundary percentages, and dot ball rates. Assigns CLUTCH, STEADY, or FADES ratings based on statistical thresholds.

Why this matters
The PFF approach applied to cricket: context-aware grading over raw stats. A bowler who improves by 15% under pressure is more valuable than one who merely maintains. These ratings directly inform Predicted XII selection weights.
📊 View Data →
Glossary
📖

Pressure Performance Glossary

Reference guide for all pressure metrics and ratings
Pressure Bands (RRR-Based)
COMFORTABLE | < 8 | Cruising: run rate is manageable, batters can play normally
BUILDING | 8 – 10 | Above par: scoring needs to accelerate, risk-taking begins
HIGH | 10 – 12 | Aggressive required: boundaries needed every 2-3 balls
EXTREME | 12 – 15 | Six-hitting territory: almost every ball must score
NEAR_IMPOSSIBLE | 15+ | Miracle needed: requires continuous boundaries to win
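The band lookup can be sketched as a simple threshold chain. Because adjacent bands share an endpoint in the table, the sketch assumes each band includes its lower bound and excludes its upper bound; the function names are hypothetical.

```python
def required_run_rate(runs_needed, balls_left):
    """Runs per over still required in a chase."""
    if balls_left <= 0:
        raise ValueError("balls_left must be positive")
    return 6.0 * runs_needed / balls_left


def pressure_band(rrr):
    """Map a required run rate to its pressure band (lower bound inclusive)."""
    if rrr < 8:
        return "COMFORTABLE"
    if rrr < 10:
        return "BUILDING"
    if rrr < 12:
        return "HIGH"
    if rrr < 15:
        return "EXTREME"
    return "NEAR_IMPOSSIBLE"


# Hypothetical chase: 40 needed off 24 balls is 10 an over.
rrr = required_run_rate(40, 24)
print(rrr, pressure_band(rrr))
```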
Pressure Ratings (Performance Tags)
CLUTCH | Batters: SR improves 10%+ AND dot% drops in 12+ RRR bands; Bowlers: economy improves AND dot% rises
PRESSURE_PROOF | Metrics within ±5% of overall across all bands (batters and bowlers alike)
MODERATE | Performance changes between 5-10% under pressure for both roles
PRESSURE_SENSITIVE | Batters: SR drops 10%+ OR dot% rises 10%+; Bowlers: economy rises 15%+ OR boundary% conceded rises 10%+
FINISHER | Batter only: SR in 15+ band exceeds 170 with adequate sample
CLOSER | Bowler only: economy < 8.5 in 15+ band with 5+ overs
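A rough sketch of how the batter-side rules might be applied to a single pair of deltas. It simplifies the glossary in several ways, all of them assumptions: it takes one SR change and one dot% change rather than per-band values, it checks the rules in a fixed order, it treats anything not covered by another rule as MODERATE, and it omits FINISHER and CLOSER. The function name is hypothetical.

```python
def batter_pressure_rating(sr_change_pct, dot_change_pct):
    """Classify a batter from percentage changes under pressure.

    sr_change_pct:  positive means strike rate improved under pressure.
    dot_change_pct: positive means dot-ball percentage rose under pressure.
    """
    if sr_change_pct >= 10 and dot_change_pct < 0:
        return "CLUTCH"
    if sr_change_pct <= -10 or dot_change_pct >= 10:
        return "PRESSURE_SENSITIVE"
    if abs(sr_change_pct) <= 5 and abs(dot_change_pct) <= 5:
        return "PRESSURE_PROOF"
    return "MODERATE"  # assumed fallback for everything in between


# Hypothetical inputs covering each rating.
examples = [(12, -3), (-15, 2), (3, -2), (7, 1)]
for sr_change, dot_change in examples:
    print(sr_change, dot_change, batter_pressure_rating(sr_change, dot_change))
```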
Entry Context (Batter Only)
FRESH | < 10 balls | Walked in during pressure phase, facing it cold
BUILDING | 10 – 25 balls | Getting set when pressure hit, partially established
SET | 25 – 40 balls | Well established before pressure phase
DEEP_SET | 40+ balls | Long innings before pressure, fully in rhythm
Other Terms
Weighted Score (W.Score) | Composite metric combining SR delta with sample size (log₂ scaling) and death overs bonus (30% weight for overs 16-20 execution)
SR Delta | Percentage change in strike rate between overall performance and pressure situations. Positive = better under pressure.
Death Pressure Ratio | Proportion of pressure balls faced in overs 16-20 vs all pressure balls
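SR Delta and Death Pressure Ratio follow directly from their definitions, as sketched below. The function names and sample numbers are hypothetical. Weighted Score is left out because its exact formula is not given here.

```python
def sr_delta(overall_sr, pressure_sr):
    """Percentage change in strike rate under pressure (positive = better)."""
    if overall_sr == 0:
        return None
    return 100.0 * (pressure_sr - overall_sr) / overall_sr


def death_pressure_ratio(pressure_balls_death, pressure_balls_total):
    """Share of pressure balls that were faced in overs 16-20."""
    if pressure_balls_total == 0:
        return None
    return pressure_balls_death / pressure_balls_total


# Hypothetical batter: SR 140 overall, 110 under pressure,
# with 30 of 120 pressure balls faced at the death.
delta = sr_delta(140, 110)
ratio = death_pressure_ratio(30, 120)
print(round(delta, 1), ratio)
```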

⚙️ Algorithm Documentation

Algorithm
🎯

SUPER SELECTOR Algorithm v2

predicted_xii_algorithm_v2.md

Complete specification of our Predicted XII algorithm. Covers constraint satisfaction (overseas limits, balance requirements), scoring weights, impact player selection, and tie-breaking rules.

Why this was done
The Predicted XII is our flagship output. This doc ensures the algorithm is reproducible, auditable, and can be improved systematically.
📄 View on GitHub →
Domain
🌸

Andy Flower Validation v2

andy_flower_v2_validation.md

Domain expert review of our clustering model outputs. Andy Flower's validation of player archetypes and recommendations for threshold adjustments based on cricket expertise.

Why this was done
Data-driven models need domain validation. Andy's cricket expertise ensures our clusters map to real playing styles, not just statistical patterns.
📄 View on GitHub →

🌍 Tournament Intelligence

Factors: PQI (Player Quality) · Eff. CI (Competition) · Recency · Conditions · Sample Size
# | Tournament | Weight | Tier | PQI | Eff. CI | Recency | Conditions | Sample | Matches
Methodology
Tournament weights determine how much non-IPL performance data is trusted when building player profiles. Each tournament is scored on 5 factors: PQI (25%), Effective CI (20%), Recency (20%), Conditions Similarity (15%), and Sample Confidence (20%). The composite weight is the weighted geometric mean of the five factor scores. Recency decays with a 4-year half-life. Conditions similarity is benchmarked against IPL 2023-2025 as the baseline. Tier assignments by composite weight: 1A (0.80+), 1B (0.60-0.79), 1C (0.45-0.59), 2 (<0.45).
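The methodology can be sketched in a few lines: a half-life recency score, a weighted geometric mean over the five factors, and a tier lookup. The factor percentages and tier cut-offs come from the text; the function names, dictionary keys, and example scores are hypothetical, factor scores are assumed to lie in (0, 1], and the tier boundaries are treated as continuous (a weight of 0.795 falls in 1B).

```python
import math

# Factor weights from the methodology text (they sum to 1.0).
FACTOR_WEIGHTS = {
    "pqi": 0.25,
    "eff_ci": 0.20,
    "recency": 0.20,
    "conditions": 0.15,
    "sample": 0.20,
}


def recency_score(years_ago, half_life=4.0):
    """Exponential decay: the score halves every `half_life` years."""
    return 0.5 ** (years_ago / half_life)


def composite_weight(factors):
    """Weighted geometric mean of factor scores, each in (0, 1]."""
    log_sum = sum(w * math.log(factors[name]) for name, w in FACTOR_WEIGHTS.items())
    return math.exp(log_sum / sum(FACTOR_WEIGHTS.values()))


def tier(weight):
    """Map a composite weight to its tier."""
    if weight >= 0.80:
        return "1A"
    if weight >= 0.60:
        return "1B"
    if weight >= 0.45:
        return "1C"
    return "2"


# Hypothetical tournament, last played two years ago.
example = {
    "pqi": 0.90,
    "eff_ci": 0.80,
    "recency": recency_score(2),
    "conditions": 0.70,
    "sample": 0.85,
}
weight = composite_weight(example)
print(round(weight, 3), tier(weight))
```

A geometric mean penalises a single weak factor more than an arithmetic mean would, so a tournament cannot reach a high tier on player quality alone if its conditions or sample size score poorly.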