The original prediction model used 40 features derived from Elo ratings, recent form, goal averages, head-to-head records, and match context. These features hit a ceiling around 57-58% accuracy on a time-based split.
We designed 5 new feature families (52 additional features) to break through that ceiling. Each family targets a specific blind spot in the original feature set. The combined 92-feature model reaches ~60-61% accuracy, a 3-4 percentage point improvement that translates to meaningfully better tournament predictions.
This document explains each feature family in depth: what it measures, how it’s computed, and why it helps.
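Two datasets feed the entire pipeline:

- results.csv: international match results (date, teams, score, tournament, and venue).
- goalscorers.csv: individual goals (scorer, team, minute, penalty flag, and own-goal flag), 44,568 goals in total.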
The goalscorers table is what makes the goalscorer-intelligence family and the comeback features of the momentum family possible. The original notebook ignored it entirely.
Every feature is extracted from the pre-match state of its tracker. After features are recorded, the trackers update with the match result. This prevents data leakage (the model can’t “see” future results during training).
```
for each match (chronologically):
    1. Extract all 92 features from current tracker states
    2. Record features + result as one training row
    3. Update all trackers with this match's result
```
This means early matches in the dataset have sparse features (few prior matches to compute form from). We handle this with sensible defaults: form defaults to 0.5 (neutral), goal averages default to 1.5, H2H defaults to 0.5 win rate.
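The loop above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline itself: `FormTracker` and `build_training_rows` are hypothetical names, only one feature (a rolling form score over the last 10 matches, scoring a win as 1.0, a draw as 0.5, and a loss as 0.0) is computed, and the 0.5 default mirrors the neutral form default described above.

```python
from collections import defaultdict, deque


class FormTracker:
    """Hypothetical tracker: average result over each team's last `window` matches."""

    def __init__(self, window=10, default=0.5):
        self.default = default  # neutral form for teams with no prior matches
        self.history = defaultdict(lambda: deque(maxlen=window))

    def features(self, team):
        results = self.history[team]
        return sum(results) / len(results) if results else self.default

    def update(self, team, score):
        self.history[team].append(score)  # 1.0 win, 0.5 draw, 0.0 loss


def build_training_rows(matches):
    """matches: (home, away, home_goals, away_goals) tuples, sorted by date."""
    tracker = FormTracker()
    rows = []
    for home, away, hg, ag in matches:
        # 1. Extract features from the PRE-match tracker state
        feats = {"home_form": tracker.features(home), "away_form": tracker.features(away)}
        # 2. Record features + result as one training row
        result = "H" if hg > ag else "A" if hg < ag else "D"
        rows.append((feats, result))
        # 3. Only now update the tracker with this match's result
        home_score = 1.0 if hg > ag else 0.5 if hg == ag else 0.0
        tracker.update(home, home_score)
        tracker.update(away, 1.0 - home_score)
    return rows
```

Because the tracker is updated only after the row is appended, a match's own result can never appear in its features.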
For context, here’s what the baseline already captured:
Elo ratings (8 features): Global and tournament-specific ratings, rating differences, expected score. See elo-system.md.
Form (11 features): Win rates over last 5, 10, and 20 matches. Exponentially-weighted form (decay=0.9) that values recent results more. Differentials between home and away team form.
Goal statistics (8 features): Average goals scored and conceded over last 10 matches. Goal difference averages. Attack-vs-defense matchup (home attack vs away defense).
Head-to-head (3 features): Historical win rate, total meetings, and average goal difference between the specific pair of teams.
Match context (10 features): Neutral venue flag, home advantage flag, tournament type (World Cup / continental / friendly), days of rest for each team, rest differential, and total international experience (match count) for both teams.
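The exponentially-weighted form feature can be sketched as a normalized weighted average in which each step back in time multiplies the weight by the decay. Scoring a win as 1.0, a draw as 0.5, and a loss as 0.0, and normalizing by the sum of the weights, are assumptions of this sketch; `ewm_form` is a hypothetical name.

```python
def ewm_form(results, decay=0.9, default=0.5):
    """Exponentially weighted form.

    results: match scores ordered oldest -> newest (1.0 win, 0.5 draw, 0.0 loss).
    The newest match gets weight 1, the one before it `decay`, then decay**2, ...
    """
    if not results:
        return default  # neutral form when there is no history
    weights = [decay ** age for age in range(len(results))]  # age 0 = newest
    newest_first = list(reversed(results))
    weighted = sum(w * r for w, r in zip(weights, newest_first))
    return weighted / sum(weights)
```

With two matches, a loss followed by a win scores 1/1.9 (about 0.53), while a win followed by a loss scores 0.9/1.9 (about 0.47): the same record, but the recent result dominates.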
These 40 features are solid. They capture the two biggest predictors of match outcomes: team quality (Elo) and recent trajectory (form). The 5 new families target what’s left.
What it captures: How a team scores, not just how much. Two teams averaging 1.5 goals per game can look identical in the baseline features but have completely different scoring profiles.
Tracker: GoalscorerTracker processes the goalscorers.csv data, maintaining a rolling list of the last 50 goals per team with minute, penalty flag, own-goal flag, and scorer name.
| Feature | Formula | What it reveals |
|---|---|---|
| home_scoring_depth | unique scorers / total goals | Teams with 8 different scorers in their last 50 goals are harder to defend against than teams relying on 2-3 strikers |
| away_scoring_depth | (same, for away team) | |
| scoring_depth_diff | home - away | Positive means the home team has more distributed scoring |
| home_star_dependency | top scorer’s goals / total goals | High dependency = vulnerability if the star is marked or injured |
| away_star_dependency | (same, for away team) | |
| home_penalty_ratio | penalty goals / total goals | Teams that score heavily from penalties may be flattering their goal stats |
| away_penalty_ratio | (same, for away team) | |
| home_late_goal_ratio | goals in minute 75+ / total goals | A high share of late goals can signal mental resilience and fitness |
| away_late_goal_ratio | (same, for away team) | |
| late_goal_diff | home - away | |
| home_first_half_ratio | goals in minutes 1-45 / total goals | Teams that score early can control the game through possession |
| away_first_half_ratio | (same, for away team) | |
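The goalscorers dataset stores minutes as strings like "45+2" or "90+3". The parser adds stoppage time to the base minute and returns None when the minute is missing:

```python
import pandas as pd

def parse_minute(self, minute_str):
    if pd.isna(minute_str) or not str(minute_str).strip():
        return None  # missing minute: callers skip this goal
    s = str(minute_str).strip()
    if '+' in s:
        base, added = s.split('+', 1)
        return int(base) + int(added)  # "45+2" -> 47, "90+3" -> 93
    return int(float(s))  # "67" or 67.0 -> 67
```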
Scoring depth turned out to be one of the most informative new features. Teams with distributed scoring (depth > 0.6) win at higher rates than star-dependent teams (depth < 0.3) in knockout situations. The model learns this interaction between scoring depth and tournament stage.
What it captures: Streaks, mental resilience, defensive solidity, and behavioral tendencies. Football is as much a mental game as a physical one, and momentum effects are real.
Tracker: MomentumTracker stores the last 15 results per team as (points, goals_for, goals_against, conceded_first) tuples.
| Feature | Formula | What it reveals |
|---|---|---|
| home_streak | Consecutive wins from most recent match backward | Confidence and form (but also regression risk) |
| away_streak | (same, for away team) | |
| streak_diff | home - away | |
| home_unbeaten | Consecutive non-losses from most recent match | More stable than win streak, captures draw-prone teams |
| away_unbeaten | (same, for away team) | |
| home_clean_sheet_pct | Matches with 0 goals conceded / last 15 | Defensive reliability |
| away_clean_sheet_pct | (same, for away team) | |
| home_comeback_rate | Wins after conceding first / matches where conceded first | Mental toughness. Teams that come back frequently are dangerous underdogs |
| away_comeback_rate | (same, for away team) | |
| home_draw_tendency | Draws / last 15 matches | Some teams structurally draw more (defensive style, closely-matched opponents) |
| away_draw_tendency | (same, for away team) | |
| draw_tendency_sum | home + away | When both teams are draw-prone, the draw probability spikes |
| home_blowout_win_pct | Wins by 3+ goals / last 15 | Ability to dominate weaker opponents |
| away_blowout_loss_pct | Losses by 3+ goals / last 15 | Vulnerability to collapse |
| home_shutout_loss_pct | Losses where team scored 0 / last 15 | Complete offensive failure rate |
| away_shutout_loss_pct | (same, for away team) | |
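A sketch of how these ratios could be derived from the tracker's (points, goals_for, goals_against, conceded_first) tuples. It assumes 3/1/0 points for a win/draw/loss, tuples ordered oldest to newest, and a comeback rate of 0.0 when the team never conceded first. `momentum_features` is a hypothetical name that returns an empty dict for a team with no history so the caller can apply its own defaults; the home/away, diff, and sum variants come from calling it once per team.

```python
def momentum_features(history):
    """history: up to 15 (points, goals_for, goals_against, conceded_first)
    tuples for one team, oldest first; points assumed 3/1/0 for W/D/L."""
    n = len(history)
    if n == 0:
        return {}

    def run_length(pred):
        # Count consecutive matches satisfying pred, starting from the newest
        count = 0
        for match in reversed(history):
            if not pred(match):
                break
            count += 1
        return count

    conceded_first = [m for m in history if m[3]]
    return {
        "streak": run_length(lambda m: m[0] == 3),
        "unbeaten": run_length(lambda m: m[0] > 0),
        "clean_sheet_pct": sum(m[2] == 0 for m in history) / n,
        "comeback_rate": (sum(m[0] == 3 for m in conceded_first) / len(conceded_first)
                          if conceded_first else 0.0),
        "draw_tendency": sum(m[0] == 1 for m in history) / n,
        "blowout_win_pct": sum(m[1] - m[2] >= 3 for m in history) / n,
        "blowout_loss_pct": sum(m[2] - m[1] >= 3 for m in history) / n,
        "shutout_loss_pct": sum(m[1] == 0 and m[2] > 0 for m in history) / n,
    }
```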
To compute comeback rate, we need to know which team conceded first. This requires parsing the goalscorers data to find the earliest goal in each match:
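```python
def determine_conceded_first(match_goals, team, gf, ga):
    """Return True if `team` conceded the opening goal of the match."""
    if ga == 0:
        return False  # never conceded
    if gf == 0:
        return True  # conceded but never scored, so conceded first
    if not match_goals:
        return False  # both sides scored but goal data is missing: order unknown
    parser = GoalscorerTracker()  # one instance, reused for every goal
    earliest_minute, earliest_team = float('inf'), None
    for g_team, _scorer, minute_str, _own_goal, _penalty in match_goals:
        parsed = parser.parse_minute(minute_str)
        if parsed is not None and parsed < earliest_minute:
            earliest_minute, earliest_team = parsed, g_team
    # Conceded first if the opponent scored first (False if no minute could be parsed)
    return earliest_team is not None and earliest_team != team
```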
The draw_tendency_sum feature is the standout. When two draw-prone teams meet, the model correctly shifts probability mass toward the draw outcome, which the baseline features couldn’t do. The comeback_rate feature also adds signal for knockout predictions, where trailing teams either crumble or fight back.
What it captures: Statistical goal expectation based on Poisson modeling. Instead of just averaging goals scored, this family models the actual probability distribution of scorelines.
Tracker: PoissonTracker maintains per-team lists of goals scored and conceded per match (last 20 matches).
Football goals per team per match approximately follow a Poisson distribution. Given an expected goals rate (lambda), the probability of scoring exactly k goals is:
P(k goals) = (lambda^k * e^(-lambda)) / k!
We estimate lambda for each team in each match by averaging their attack strength with the opponent’s defensive weakness:
```python
import numpy as np

home_lambda = np.clip((home_scored_avg + away_conceded_avg) / 2, 0.3, 5.0)
away_lambda = np.clip((away_scored_avg + home_conceded_avg) / 2, 0.3, 5.0)
```
Clipping to [0.3, 5.0] prevents extreme values from teams with very few matches.
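The lambda estimate and the Poisson formula above can also be written without any libraries. `estimate_lambdas` and `poisson_pmf` are hypothetical names; each input is a list of per-match goal counts (such as the last 20 matches), and falling back to the 1.5 goal-average default for an empty history is an assumption based on the defaults described earlier.

```python
import math


def estimate_lambdas(home_scored, home_conceded, away_scored, away_conceded,
                     default=1.5, lo=0.3, hi=5.0):
    """Blend each side's attack with the opponent's defence, then clip to [lo, hi]."""
    def avg(xs):
        return sum(xs) / len(xs) if xs else default  # empty history -> 1.5 goals

    def clip(x):
        return min(max(x, lo), hi)

    home_lambda = clip((avg(home_scored) + avg(away_conceded)) / 2)
    away_lambda = clip((avg(away_scored) + avg(home_conceded)) / 2)
    return home_lambda, away_lambda


def poisson_pmf(k, lam):
    """P(k goals) = lam**k * e**(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)
```

For example, a home side averaging 2 goals against an away side conceding 4 gets a lambda of 3.0, while an away side that has neither scored nor conceded is lifted from 0.0 to the 0.3 floor.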
From the two lambdas, we build a probability matrix over all plausible scorelines (0-0 through 7-7):
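```python
import numpy as np
from scipy.stats import poisson

h_pmf = poisson.pmf(range(8), home_lambda)  # P(home scores 0,1,...,7)
a_pmf = poisson.pmf(range(8), away_lambda)  # P(away scores 0,1,...,7)
prob_matrix = np.outer(h_pmf, a_pmf)  # independent goals: 8x8 joint matrix (rows = home, cols = away)
home_win_prob = np.tril(prob_matrix, -1).sum()  # below diagonal: home goals > away goals
draw_prob = np.trace(prob_matrix)  # diagonal: level scores
# Scorelines above 7 goals are dropped, so probabilities can sum to slightly less than 1
```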
This is the same approach used by betting markets and expected-goals models like FiveThirtyEight’s.
Computing Poisson PMFs is expensive when done 48,943 times. We cache PMF vectors by lambda (rounded to nearest 0.1):
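```python
import numpy as np
from scipy.stats import poisson

class PoissonTracker:
    _GOALS_RANGE = np.arange(8)  # goals 0..7
    _PMF_CACHE = {}              # lambda in tenths -> PMF vector
    @classmethod
    def _get_pmfs(cls, lam):
        key = round(lam * 10)  # bucket lambda to the nearest 0.1
        if key not in cls._PMF_CACHE:
            cls._PMF_CACHE[key] = poisson.pmf(cls._GOALS_RANGE, key / 10)  # evaluate at the bucket value
        return cls._PMF_CACHE[key]
```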
| Feature | Description |
|---|---|
| home_lambda | Expected goals for home team |
| away_lambda | Expected goals for away team |
| home_poisson_win | Poisson-derived home win probability |
| home_poisson_draw | Poisson-derived draw probability |
| home_scoring_variance | Variance in home team’s goals scored (consistency measure) |
| away_scoring_variance | Variance in away team’s goals scored |
| home_overperformance | Actual win rate minus Poisson-predicted win rate |
| away_overperformance | (same, for away team) |
The overperformance features are the key contribution. A team with high overperformance consistently wins more than their raw goal numbers suggest, indicating quality finishing, game management, or luck. The Poisson win/draw probabilities also provide a second “opinion” on match outcome that partially decorrelates from the Elo prediction, giving the ensemble more to work with.
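A sketch of the consistency and overperformance features. It assumes the scoring variance is the population variance of per-match goals, and that overperformance compares a team's actual win rate with the average of the pre-match Poisson win probabilities recorded for its past matches; `scoring_variance` and `overperformance` are hypothetical names.

```python
from statistics import pvariance


def scoring_variance(goals_scored):
    """Population variance of goals per match; 0.0 with fewer than 2 matches."""
    return pvariance(goals_scored) if len(goals_scored) >= 2 else 0.0


def overperformance(history):
    """history: (won, poisson_win_prob) pairs from the team's previous matches,
    where poisson_win_prob was computed before each match.
    Positive = the team wins more often than its goal numbers imply."""
    if not history:
        return 0.0
    actual = sum(1.0 for won, _ in history if won) / len(history)
    expected = sum(p for _, p in history) / len(history)
    return actual - expected
```

A team that won both of two matches for which the Poisson model gave it a 0.4 win probability scores 1.0 - 0.4 = +0.6.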
What it captures: Physical and structural factors of where the match is played.
| Feature | Source | Description |
|---|---|---|
| altitude | CITY_ALTITUDES lookup table | Elevation in meters. Mexico City (2240m) and Bogota (2640m) significantly affect team stamina |
| is_high_altitude | altitude > 1500m | Binary flag for high-altitude venues |
| same_confederation | TEAM_CONFEDERATIONS mapping | Whether both teams are from the same confederation |
| confed_strength_diff | CONFED_STRENGTH tier values | Gap in confederation historical strength (UEFA=1.0, CONMEBOL=0.95, CONCACAF=0.6, AFC/CAF=0.5, OFC=0.3) |
| is_intercontinental | confederations differ | Intercontinental matches tend to be more unpredictable |
We maintain a dictionary of 26 major football cities with known altitudes:
```python
CITY_ALTITUDES = {
    'mexico city': 2240, 'bogota': 2640, 'quito': 2850, 'la paz': 3640,
    'johannesburg': 1753, 'addis ababa': 2355, 'nairobi': 1795, 'denver': 1609,
    'madrid': 667, 'sao paulo': 760, 'guadalajara': 1566, 'monterrey': 540,
    'atlanta': 320, 'dallas': 131, 'houston': 15, 'kansas city': 247,
    'los angeles': 30, 'miami': 2, 'new york': 3, 'philadelphia': 12,
    'san francisco': 16, 'seattle': 54, 'toronto': 76, 'vancouver': 0,
}
```
Cities not in the table default to 100m. The 2026 World Cup venues (across the US, Mexico, and Canada) are all included.
Based on historical World Cup performance:
```python
CONFED_STRENGTH = {
    'UEFA': 1.0,       # 12 of 22 World Cup winners
    'CONMEBOL': 0.95,  # 10 of 22 World Cup winners
    'CONCACAF': 0.6,   # Semi-finals ceiling (USA 1930, Mexico multiple QFs)
    'AFC': 0.5,        # South Korea 2002 semi-final is the high-water mark
    'CAF': 0.5,        # Morocco 2022 semi-final; Cameroon 1990, Senegal 2002, Ghana 2010 QFs
    'OFC': 0.3,        # New Zealand has never won a WC match (three draws in 2010)
}
```
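The five venue features reduce to dictionary lookups. The sketch below uses small excerpts of the two tables above plus a hypothetical four-team `TEAM_CONFEDERATIONS` mapping; `venue_features` is a hypothetical name, and the home-minus-away sign for confed_strength_diff and the 0.5 fallback for teams with an unknown confederation are assumptions.

```python
# Small excerpts of the lookup tables above, for illustration only
CITY_ALTITUDES = {"mexico city": 2240, "la paz": 3640, "miami": 2}
CONFED_STRENGTH = {"UEFA": 1.0, "CONMEBOL": 0.95, "CONCACAF": 0.6,
                   "AFC": 0.5, "CAF": 0.5, "OFC": 0.3}
# Hypothetical team -> confederation mapping (the real one covers every team)
TEAM_CONFEDERATIONS = {"Mexico": "CONCACAF", "Bolivia": "CONMEBOL",
                       "Brazil": "CONMEBOL", "France": "UEFA"}


def venue_features(city, home_team, away_team, default_altitude=100, high_altitude_m=1500):
    altitude = CITY_ALTITUDES.get(city.strip().lower(), default_altitude)  # unknown city -> 100m
    home_conf = TEAM_CONFEDERATIONS.get(home_team)
    away_conf = TEAM_CONFEDERATIONS.get(away_team)
    same = home_conf is not None and home_conf == away_conf
    # Unknown confederations fall back to a mid-tier 0.5 (assumption of this sketch)
    strength_diff = CONFED_STRENGTH.get(home_conf, 0.5) - CONFED_STRENGTH.get(away_conf, 0.5)
    return {
        "altitude": altitude,
        "is_high_altitude": int(altitude > high_altitude_m),
        "same_confederation": int(same),
        "confed_strength_diff": strength_diff,
        "is_intercontinental": int(not same),
    }
```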
This family makes the smallest individual contribution. The Elo system already implicitly captures some geographic effects (teams that play at altitude accumulate rating points from altitude-assisted home wins). The is_intercontinental feature adds marginal value by flagging matches where teams from different football cultures meet, which historically produce more upsets.
What it captures: How teams perform in different competitive contexts. Some teams consistently overperform in World Cups. Others fold under pressure.
Tracker: TournamentTracker classifies each match into one of 5 contexts and maintains per-context result histories for every team.
```python
def classify_stage(self, tournament, date):
    t = tournament.lower()
    # Order matters: qualifier names also contain 'world cup', 'euro', etc.
    if 'friendly' in t:
        return 'friendly'
    if 'qualification' in t:
        return 'qualifying'
    if 'fifa world cup' in t:
        return 'wc_finals'
    if any(x in t for x in ['euro', 'copa', 'asian cup', 'gold cup', 'african cup']):
        return 'continental_finals'
    return 'other_competitive'
```
| Feature | Formula | Description |
|---|---|---|
| home_wc_form | Win rate in last 20 WC finals matches | Raw World Cup pedigree |
| away_wc_form | (same, for away team) | |
| wc_form_diff | home - away | |
| home_competitive_form | 0.4 * wc_form + 0.3 * continental_form + 0.3 * qualifying_form | Blended competitive performance |
| away_competitive_form | (same, for away team) | |
| home_big_game_factor | competitive_form - friendly_form | Positive = rises to the occasion. Negative = “friendly bully” |
| away_big_game_factor | (same, for away team) | |
| big_game_diff | home - away | |
| home_wc_experience | Total WC finals matches played | Germany (112 WC matches) vs Bahrain (0) |
| away_wc_experience | (same, for away team) | |
| wc_experience_diff | home - away | |
The big-game factor is the most novel feature in this family. It measures the gap between a team’s competitive and friendly performance:
big_game_factor = competitive_form - friendly_form
A team with competitive_form = 0.7 and friendly_form = 0.5 has a big_game_factor of +0.2, meaning they step up when it matters. A team with the reverse profile (0.5 competitive, 0.7 friendly) has a negative big_game_factor: they beat weaker teams in friendlies but underperform against real opposition.
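The blended competitive form and the big-game factor can be sketched with per-context histories. `TournamentFormSketch` is a hypothetical stand-in for TournamentTracker: it stores wins (True/False) per context in 20-match windows and uses 0.5 as the neutral default when a context has no history, both assumptions of this sketch.

```python
from collections import defaultdict, deque

CONTEXTS = ("friendly", "qualifying", "wc_finals", "continental_finals", "other_competitive")


class TournamentFormSketch:
    """Hypothetical per-context win history for each team."""

    def __init__(self, window=20, default=0.5):
        self.default = default
        self.history = defaultdict(lambda: {c: deque(maxlen=window) for c in CONTEXTS})

    def update(self, team, context, won):
        self.history[team][context].append(won)

    def form(self, team, context):
        results = self.history[team][context]
        return sum(results) / len(results) if results else self.default  # win rate

    def competitive_form(self, team):
        # 0.4 * WC finals + 0.3 * continental finals + 0.3 * qualifying
        return (0.4 * self.form(team, "wc_finals")
                + 0.3 * self.form(team, "continental_finals")
                + 0.3 * self.form(team, "qualifying"))

    def big_game_factor(self, team):
        # Positive: the team does better in competitive matches than in friendlies
        return self.competitive_form(team) - self.form(team, "friendly")
```

Feeding it seven wins in ten for each competitive context and five wins in ten friendlies gives a competitive_form of 0.7 and a big_game_factor of +0.2, matching the example above.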
World Cup experience and the big-game factor both carry genuine predictive power for tournament matches specifically. The model learns that a team with 50+ WC matches and a positive big-game factor is more likely to advance than their Elo alone would suggest. This is particularly valuable for the 2026 predictions, where every match is a “big game.”
Scanning all 44,568 goal records for every match during the chronological loop would be painfully slow. Instead, we pre-build an index keyed by (date, home_team, away_team):
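```python
from collections import defaultdict

def build_goalscorer_index(gs_df):
    """Map (date, home_team, away_team) -> list of goals in that match."""
    index = defaultdict(list)
    for row in gs_df.itertuples(index=False):
        # Dates must have the same type as in results.csv for lookups to hit
        key = (row.date, row.home_team, row.away_team)
        index[key].append((row.team, row.scorer, row.minute, row.own_goal, row.penalty))
    return index
```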
During the feature loop, looking up goals for a specific match is a single dict access: gs_index.get((date, ht, at), []). This reduces the total processing time from minutes to seconds.
| Family | Features | Source Data | Key Insight |
|---|---|---|---|
| Elo ratings | 8 | results.csv | Long-term team quality |
| Form / goals | 19 | results.csv | Recent trajectory |
| Head-to-head | 3 | results.csv | Pair-specific history |
| Match context | 10 | results.csv | Venue, tournament type, rest, experience |
| Goalscorer intelligence | 12 | goalscorers.csv | How teams score, not just how much |
| Momentum / psychology | 16 | results.csv + goalscorers.csv | Streaks, resilience, tendencies |
| Poisson expected goals | 8 | results.csv | Statistical goal modeling |
| Venue / geography | 5 | results.csv + lookup tables | Altitude, confederation dynamics |
| Tournament context | 11 | results.csv | Big-game performance patterns |
| Total | 92 | | |
The 52 new features (the goalscorer, momentum, Poisson, venue, and tournament families) collectively improve accuracy by 3-4 percentage points over the 40-feature baseline. More importantly, they provide richer probability estimates for the 2026 World Cup simulation, where small differences in predicted win probabilities cascade through the five knockout rounds (round of 32 through the final) to produce materially different bracket outcomes.