🔍 Data Quality Audits
Player ID Mismatch Audit
player_id_audit_report.md
Comprehensive audit identifying 15+ player ID mismatches between Cricsheet data and IPL 2026 squad files. Resolves duplicate IDs from franchise trades (MI↔LSG) and ensures data integrity across all analytics pipelines.
Entry Point Audit
entry_point_audit_report.md
Analysis of where batters typically bat in the order (entry points). Validates our batting position classifications and ensures OPENER, MIDDLE_ORDER, and FINISHER tags align with actual match data.
📈 EDA & Threshold Analysis
Threshold EDA (2023+ Data)
threshold_eda_2023.md
Exploratory data analysis that determined optimal thresholds for player tags using only 2023-2025 IPL data. Establishes baselines for SPECIALIST vs VULNERABLE, PP_BEAST vs PP_LIABILITY, and other phase-specific tags.
Baselines vs Tags Comparison
baselines_vs_tags.md
Systematic comparison of league-wide baselines against our player tag criteria. Validates that SPECIALIST and VULNERABLE thresholds are meaningful relative to overall IPL performance distributions.
📚 Research & Methodology
PFF Grading System Research
pff_grading_system_research.md
Deep-dive into Pro Football Focus's play-by-play grading methodology. Explores how context-aware evaluation and position-specific metrics can be adapted for cricket analytics.
KenPom Methodology Research
kenpom_methodology_research.md
Analysis of KenPom's college basketball analytics: adjusted efficiency, tempo-free stats, and the "Four Factors." Foundation for our planned CricPom team rating system.
CricPom Prototype Specification
cricpom_prototype_spec_020926_v1.md
Opponent-adjusted, venue-normalized, phase-aware T20 efficiency rating system. 5-factor tournament weighting (PQI, CI, Recency, Conditions, Confidence) with sigmoid-based sample size scoring. 231 IPL 2026 players rated across 14 T20 tournaments.
Tournament Composite Weights
tkt187_final_weights.py
Geometric mean formula weighting 14 T20 tournaments across 5 factors: Player Quality Index, Effective Conditions Index, Recency decay, Conditions Similarity, and Sample Confidence. Produces per-team weighted SR, avg, economy, boundary%, and dot% composites.
Insight Confidence Framework
sigmoid_confidence()
Sigmoid-based confidence scoring: 1 / (1 + exp(-0.02 × (matches - 100))). By this formula, 100 matches gives 50% confidence, 200 matches gives about 88%, and 95% is reached at roughly 250 matches. Under 30 matches = low confidence (below 20%). Applied to all 231 CricPom player ratings and pressure performance metrics.
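The formula quoted above can be sketched directly in Python. This is a minimal illustration rather than the project's actual implementation: the function name comes from the card, but the parameter names `midpoint` and `steepness` are assumptions introduced here for readability.

```python
import math


def sigmoid_confidence(matches: int, midpoint: float = 100.0, steepness: float = 0.02) -> float:
    """Confidence in [0, 1] from sample size: 1 / (1 + exp(-0.02 * (matches - 100))).

    `midpoint` and `steepness` are illustrative parameter names (assumptions),
    set to the constants that appear in the quoted formula.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (matches - midpoint)))


# Evaluate the curve at a few sample sizes.
for n in (30, 100, 200, 250):
    print(f"{n:>3} matches -> confidence {sigmoid_confidence(n):.2f}")
```

Evaluated as written, the formula gives about 0.20 at 30 matches, 0.50 at 100, 0.88 at 200, and 0.95 at 250.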
Silhouette Score Validation
baseline_comparison.py
Three-metric cluster validation: silhouette score (cohesion vs separation), Davies-Bouldin index (cluster similarity), and Calinski-Harabasz index (between/within variance ratio). Tests K-means against random baseline to confirm clustering is meaningful, not noise.
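To illustrate the idea of validating against a random baseline, here is a small standard-library sketch that computes only the silhouette score, once for structured labels and once for the same labels shuffled. It is not the contents of baseline_comparison.py: the synthetic points, the fixed labels standing in for K-means output, and the helper name `silhouette_score` are all assumptions made for this example, and the Davies-Bouldin and Calinski-Harabasz indices are omitted. In practice, scikit-learn's `sklearn.metrics` module provides `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score`.

```python
import math
import random


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes every cluster has at least two points."""
    scores = []
    for i, p in enumerate(points):
        # Mean distance from point i to the members of each cluster (excluding itself).
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if i == j:
                continue
            sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        means = {c: sums[c] / counts[c] for c in sums}
        a = means[labels[i]]                                    # cohesion: own cluster
        b = min(m for c, m in means.items() if c != labels[i])  # separation: nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic, made-up data: two well-separated groups of 2-D points.
rng = random.Random(0)
points = (
    [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
    + [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(20)]
)
cluster_labels = [0] * 20 + [1] * 20  # stands in for K-means output (assumption)

# Random baseline: same cluster sizes, assignments shuffled.
random_labels = cluster_labels[:]
rng.shuffle(random_labels)

structured = silhouette_score(points, cluster_labels)
baseline = silhouette_score(points, random_labels)
print(f"clustered: {structured:.2f}  random baseline: {baseline:.2f}")
```

A clustering is only worth interpreting if its score clearly exceeds what shuffled labels achieve on the same data.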
Pressure Sequence Analysis
generate_momentum_data.py
Consecutive dot ball and boundary sequence analysis for bowling pressure and batting resilience profiling. Rates teams Elite/Strong/Average/Weak based on sequence length thresholds. Identifies clutch performers and choke risks via SR delta under pressure.
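The core of a sequence analysis like this is finding runs of consecutive balls with the same kind of outcome. The sketch below shows one way to do that. It is an assumption-laden illustration, not the logic of generate_momentum_data.py: the helper `run_lengths`, the ball-by-ball list, and the minimum run length of 3 are all hypothetical, and extras such as wides and no-balls are ignored.

```python
def run_lengths(outcomes, predicate):
    """Lengths of every maximal run of consecutive balls that satisfy `predicate`."""
    lengths, current = [], 0
    for outcome in outcomes:
        if predicate(outcome):
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    if current:  # close a run that reaches the end of the innings
        lengths.append(current)
    return lengths


# Made-up runs off the bat, ball by ball.
balls = [0, 0, 0, 1, 4, 0, 0, 6, 4, 4, 0]

dot_runs = run_lengths(balls, lambda r: r == 0)
boundary_runs = run_lengths(balls, lambda r: r in (4, 6))

longest_dot_run = max(dot_runs, default=0)
long_dot_sequences = sum(1 for n in dot_runs if n >= 3)  # hypothetical threshold of 3

print(dot_runs, boundary_runs, longest_dot_run, long_dot_sequences)
```

Run lengths per bowler or per team could then be compared against whatever sequence length thresholds the rating bands use.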
Player Clustering PRD
player_clustering_prd.md
Product requirements document for our K-means clustering model. Defines the 6 batter archetypes (EXPLOSIVE_OPENER to FINISHER) and 7 bowler archetypes (PACER to PART_TIMER).
Cluster Archetypes (Creative)
cluster_archetypes_creative.md
Creative descriptions and narrative framing for each player archetype. Makes technical clusters accessible to fans through cricket storytelling and real-world player examples.
🧩 Player Pattern Recognition
Batter Consistency Index
batter_consistency_index.csv
Rolling consistency analysis across IPL 2023-2025. Tracks coefficient of variation in runs scored, single-digit failure rate, and form trajectory by season. Separates "reliable anchors" from "streaky match-winners."
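Two of the metrics named above are simple to compute from a list of innings scores. The following sketch assumes that "single-digit failure" means a score below 10 and uses the population standard deviation for the coefficient of variation; both choices, the function name `consistency_metrics`, and the sample scores are assumptions for illustration, not details taken from the CSV.

```python
import statistics


def consistency_metrics(scores):
    """Return (coefficient of variation, single-digit failure rate) for innings scores."""
    mean = statistics.mean(scores)
    # CV = standard deviation / mean; lower means more consistent scoring.
    cv = statistics.pstdev(scores) / mean if mean else float("inf")
    failure_rate = sum(1 for s in scores if s < 10) / len(scores)
    return cv, failure_rate


# Made-up innings for two hypothetical batters.
anchor_scores = [32, 41, 28, 35, 39, 30]
streaky_scores = [2, 88, 5, 0, 71, 9]

anchor_cv, anchor_fail = consistency_metrics(anchor_scores)
streaky_cv, streaky_fail = consistency_metrics(streaky_scores)

print(f"anchor:  CV={anchor_cv:.2f}, failure rate={anchor_fail:.2f}")
print(f"streaky: CV={streaky_cv:.2f}, failure rate={streaky_fail:.2f}")
```

A low CV with few single-digit scores matches the "reliable anchor" profile, while a high CV with frequent failures matches the "streaky match-winner" profile.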
Partnership Synergy Scores
partnership_synergy.csv
Measures how batting pairs amplify each other's performance. Synergy index compares partnership run rates against individual averages. Year-wise trends show which combinations are improving or declining.
⚔️ Matchup Intelligence
Batter vs Bowling Type
batter_bowling_type_matchup.csv
How every IPL batter performs against pace, off-spin, leg-spin, and left-arm spin. Identifies PACE_SPECIALIST, SPIN_SPECIALIST, and VULNERABLE_VS_SPIN tags based on 2023-2025 data, limited to matchups with samples large enough to be statistically meaningful.
Bowler vs Batting Handedness
bowler_handedness_matchup.csv
How every IPL bowler performs against left-handers vs right-handers. Reveals asymmetric matchups: bowlers who dominate one handedness but struggle against the other. Critical for batting order optimization.
Team Venue Records
team_venue_records.csv
Win/loss records for every team at every IPL venue, with year-wise breakdown. Identifies home fortress effects, away vulnerabilities, and neutral venue performance patterns across 2023-2025.
🔥 Pressure & Phase Performance
Bowler Pressure Sequences
bowler_pressure_sequences.csv
Tracks bowler performance under pressure: economy and strike rate in death overs, when defending small totals, and in consecutive dot-ball sequences. The cricket equivalent of "clutch" performance.
Bowler Phase Distribution
bowler_phase_distribution_grouped.csv
How bowlers distribute their overs across powerplay, middle, and death phases. Grouped analysis reveals captaincy patterns: which bowlers are trusted at death, which are powerplay-only, and who bowls through all phases.
Batter Pressure Bands
batter_pressure_bands.csv
How every IPL batter performs across low, medium, and high pressure bands (2023-2025). Segments strike rate, boundary percentage, and dot ball frequency by match situation to reveal who thrives and who wilts under pressure.
Pressure Performance Ratings
pressure_deltas.csv
Composite pressure performance ratings for batters and bowlers. Measures the delta between overall and pressure-situation strike rates, boundary percentages, and dot ball rates. Assigns CLUTCH, STEADY, or FADES ratings based on statistical thresholds.
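A delta-based rating of this kind can be sketched for a batter's strike rate as follows. The actual thresholds are not stated here, so the ±10 strike-rate-point cutoff, the function name `pressure_rating`, and the sample numbers are purely hypothetical placeholders; only the three rating labels come from the description above.

```python
def pressure_rating(overall_sr: float, pressure_sr: float, threshold: float = 10.0) -> str:
    """Rate a batter from the strike-rate delta (pressure minus overall).

    `threshold` is a hypothetical cutoff in strike-rate points, not the
    project's real statistical threshold.
    """
    delta = pressure_sr - overall_sr
    if delta >= threshold:
        return "CLUTCH"   # scores noticeably faster under pressure
    if delta <= -threshold:
        return "FADES"    # scores noticeably slower under pressure
    return "STEADY"       # roughly the same in both situations


# Made-up strike rates for three hypothetical batters: (overall, under pressure).
samples = {"Batter A": (140.0, 155.0), "Batter B": (150.0, 146.0), "Batter C": (135.0, 118.0)}
ratings = {name: pressure_rating(overall, pressure) for name, (overall, pressure) in samples.items()}
print(ratings)
```

For bowlers the sign convention would presumably flip for economy-style metrics, since a lower figure under pressure is the better outcome.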
Pressure Performance Glossary
Reference guide for all pressure metrics and ratings
⚙️ Algorithm Documentation
SUPER SELECTOR Algorithm v2
predicted_xii_algorithm_v2.md
Complete specification of our Predicted XII algorithm. Covers constraint satisfaction (overseas limits, balance requirements), scoring weights, impact player selection, and tie-breaking rules.
Andy Flower Validation v2
andy_flower_v2_validation.md
Domain expert review of our clustering model outputs. Andy Flower's validation of player archetypes and recommendations for threshold adjustments based on cricket expertise.
🌍 Tournament Intelligence
| # | Tournament | Weight | Tier | PQI | Eff. CI | Recency | Conditions | Sample | Matches |
|---|---|---|---|---|---|---|---|---|---|
Factor weights: PQI (25%), Effective CI (20%), Recency (20%), Conditions Similarity (15%), and Sample Confidence (20%).
The composite weight is the weighted geometric mean of the five factors. Recency uses a 4-year half-life decay.
Conditions similarity is benchmarked against IPL 2023-2025 as the baseline.
Tier assignments: 1A (0.80+), 1B (0.60-0.79), 1C (0.45-0.59), 2 (<0.45).
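The weighting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in tkt187_final_weights.py: the function and dictionary key names are hypothetical, each factor is assumed to be a score in (0, 1], the half-life decay is assumed to be 0.5 raised to (age in years / 4), and a composite of 0.80 or more is treated as tier 1A with the lower boundaries applied the same way.

```python
import math

# Factor weights as listed above; the key names are illustrative assumptions.
FACTOR_WEIGHTS = {
    "pqi": 0.25,
    "effective_ci": 0.20,
    "recency": 0.20,
    "conditions": 0.15,
    "sample": 0.20,
}


def recency_factor(years_ago: float, half_life_years: float = 4.0) -> float:
    """Exponential decay that halves every `half_life_years` (assumed form)."""
    return 0.5 ** (years_ago / half_life_years)


def composite_weight(factors: dict) -> float:
    """Weighted geometric mean: exp(sum(w_i * ln(x_i)) / sum(w_i)); factors must be > 0."""
    total = sum(FACTOR_WEIGHTS.values())
    log_sum = sum(w * math.log(factors[name]) for name, w in FACTOR_WEIGHTS.items())
    return math.exp(log_sum / total)


def tier(weight: float) -> str:
    """Map a composite weight to a tier using the boundaries listed above."""
    if weight >= 0.80:
        return "1A"
    if weight >= 0.60:
        return "1B"
    if weight >= 0.45:
        return "1C"
    return "2"


# Made-up factor scores for a hypothetical tournament last played two years ago.
example = {
    "pqi": 0.90,
    "effective_ci": 0.80,
    "recency": recency_factor(2),
    "conditions": 0.60,
    "sample": 0.95,
}
weight = composite_weight(example)
print(f"composite weight {weight:.3f} -> tier {tier(weight)}")
```

A geometric mean penalizes a single weak factor more than an arithmetic mean would, so a tournament cannot reach a high tier on player quality alone if, for example, its conditions are very unlike the IPL's.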