Data and simulator
Methodology
Last updated: 2026-06-20
This page exists so the site is auditable. Every piece of data shown on Sete a Zero comes from one of the sources listed below, and every number the simulator produces follows one of the rules we describe. We do not use language models (LLMs) to generate match commentary or to invent statistics.
1. Squad sources
Each World Cup squad (22 players with shirt number and position) comes from two public sources:
- Wikipedia – the FIFA World Cup squads pages for each edition from 1950. They let us cover the 2026 World Cup before formal databases freeze it. CC-BY-SA 4.0.
- Fjelstul World Cup Database – structured CSV by Joshua C. Fjelstul (CC-BY-SA 4.0, github.com/jfjelstul/worldcup). Covers 1930 to 2022. We use it to validate names, numbers, and positions against Wikipedia.
When the two sources disagree (a name spelling, a number changed after a last-minute injury) we prefer Fjelstul because it is structured. If the difference affects a household name, we keep the version most recognizable in Spanish (for example “Pelé” instead of “Edson Arantes”).
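The precedence rule above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the `SquadEntry` type, the `reconcile` function, and the `DISPLAY_NAME_OVERRIDES` table are hypothetical names introduced here, and the Wikipedia fallback assumes an edition that Fjelstul does not cover yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SquadEntry:
    name: str
    number: int
    position: str


# Hypothetical override table: the more recognizable form of a household name.
DISPLAY_NAME_OVERRIDES = {"Edson Arantes": "Pelé"}


def reconcile(wikipedia: Optional[SquadEntry],
              fjelstul: Optional[SquadEntry]) -> SquadEntry:
    """Prefer the structured Fjelstul record; fall back to Wikipedia if it is missing."""
    chosen = fjelstul if fjelstul is not None else wikipedia
    if chosen is None:
        raise ValueError("player missing from both sources")
    # Household names keep their most recognizable spelling.
    name = DISPLAY_NAME_OVERRIDES.get(chosen.name, chosen.name)
    return SquadEntry(name, chosen.number, chosen.position)
```

Calling `reconcile(SquadEntry("Pelé", 10, "FW"), SquadEntry("Edson Arantes", 10, "FW"))` keeps Fjelstul's number and position but shows the name as "Pelé".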
2. How the ratings are calibrated
The 65-to-96 ratings are a manual calibration, not an automatic aggregation. This is transparent on purpose: no database assigns an objective number to Pelé in 1958. What we do:
- Real achievements axis: Ballon d'Or and World Cup Golden Boot anchor the upper bound. A Ballon d'Or winner sits at 90-95; a multiple winner at 94-96.
- World Cup All-Star: the official tournament best-XI fixes the 88-92 range within that specific cup.
- Tournament performance: champion → +1-2 to the key player; losing finalist → +0-1; quarters → no bonus. This stops good players from early-out squads from being inflated.
- SoFIFA cross-reference (2014 onward): for World Cups from 2014 on, we use the overall rating published by EA Sports on sofifa.com as a cross-reference and adjust it by up to ±3 points based on tournament performance.
We do not use historical ELO or the FIFA Ranking inside the individual rating: those are national-team metrics, not player metrics. We do look at them when calibrating opponent strength per round (section 4).
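As a rough illustration of the tournament bonus and the SoFIFA adjustment described in the list above, the sketch below clamps both to the stated limits. The calibration itself is manual, so this only shows the bounds. The function names and the stage labels are hypothetical, and treating every stage not listed as "no bonus" is an assumption of the sketch.

```python
# (min, max) bonus for the key player, by how far the team went.
TOURNAMENT_BONUS = {"champion": (1, 2), "finalist": (0, 1)}

RATING_MIN, RATING_MAX = 65, 96  # the site's rating scale
SOFIFA_MAX_ADJUSTMENT = 3        # the ±3 point limit for 2014+ players


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def tournament_bonus_range(stage: str) -> tuple:
    """Return the allowed bonus range; unlisted stages get no bonus (assumption)."""
    return TOURNAMENT_BONUS.get(stage, (0, 0))


def rating_from_sofifa(sofifa_overall: int, adjustment: int) -> int:
    """Apply a performance adjustment of at most ±3 and keep the result on the scale."""
    adjustment = clamp(adjustment, -SOFIFA_MAX_ADJUSTMENT, SOFIFA_MAX_ADJUSTMENT)
    return clamp(sofifa_overall + adjustment, RATING_MIN, RATING_MAX)
```

For example, `rating_from_sofifa(94, 5)` first limits the adjustment to +3 and then caps the result at 96.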
3. How the result is computed
Each match is computed deterministically from a random seed stored in the browser: the same XI with the same seed always produces the same result. The main formulas:
- Attack = 0.55 × forwards avg + 0.35 × midfield avg + 0.10 × defence avg.
- Defence = 0.35 × keeper + 0.45 × defence avg + 0.20 × midfield avg.
- Overall = weighted average across the eleven (4.4 units), with extra weight on defence and midfield.
- Form = ±14% factor per match. This is what lets an inferior opponent steal points in the group stage.
- Home advantage = +0.10 on effective rating. Applies only when the opponent is playing its home World Cup (for example, Mexico at the 2026 World Cup, playing in Mexico).
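The attack and defence formulas above, together with a seeded form factor, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and drawing the form factor uniformly from the ±14% band is an assumption, since the text only gives the band. Overall and home advantage are left out because their exact scale is not specified.

```python
import random
from statistics import mean

FORM_SPREAD = 0.14  # the ±14% form factor


def attack(forwards, midfield, defence):
    """Attack = 0.55 x forwards avg + 0.35 x midfield avg + 0.10 x defence avg."""
    return 0.55 * mean(forwards) + 0.35 * mean(midfield) + 0.10 * mean(defence)


def defence_rating(keeper, defence, midfield):
    """Defence = 0.35 x keeper + 0.45 x defence avg + 0.20 x midfield avg."""
    return 0.35 * keeper + 0.45 * mean(defence) + 0.20 * mean(midfield)


def form_factor(seed):
    """Seeded multiplier in [0.86, 1.14]; the uniform draw is an assumption."""
    rng = random.Random(seed)  # same seed -> same sequence -> same factor
    return 1.0 + rng.uniform(-FORM_SPREAD, FORM_SPREAD)


def effective_attack(forwards, midfield, defence, seed):
    """Attack rating scaled by the match's form factor."""
    return attack(forwards, midfield, defence) * form_factor(seed)
```

With forwards rated 90 and 88, midfielders 85, 84, and 86, and defenders 82, 80, 83, and 81, the attack value is 0.55 × 89 + 0.35 × 85 + 0.10 × 81.5 = 86.85. With a keeper rated 87, the defence value is 0.35 × 87 + 0.45 × 81.5 + 0.20 × 85 = 84.125. Because the generator is seeded, calling `effective_attack` twice with the same XI and the same seed returns the same number.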
4. Difficulty per round
The simulator picks opponents by tier:
- Group matches 1-2: tier 1 (national teams rated 65-76, debutants and early outs).
- Group match 3 / Round of 16: tier 2 (76-83, second chances).
- Quarter-final / Semi-final: tier 3 (85-89, historical semifinalists).
- Final: tier 4 with thirteen legend squads (94-97).
The steepest jumps in difficulty come at the quarter-final and the final. The simulator does not invent opponents: it uses real national teams (Italy 1990, Croatia 2018, etc.) with their calibrated ratings.
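The tier table above maps naturally onto a lookup plus a filter. The sketch below is illustrative only: the round keys, the `pick_opponent` function, and the pool format of `(team name, rating)` pairs are hypothetical, and the test pool should use placeholder teams rather than real calibrated ratings.

```python
import random

# Rating range per tier, as listed above (bounds inclusive).
TIERS = {1: (65, 76), 2: (76, 83), 3: (85, 89), 4: (94, 97)}

# Hypothetical round keys for a seven-match run.
ROUND_TIER = {
    "group_1": 1, "group_2": 1,
    "group_3": 2, "round_of_16": 2,
    "quarter_final": 3, "semi_final": 3,
    "final": 4,
}


def pick_opponent(round_name, pool, rng):
    """Pick a team from the pool whose rating falls in the round's tier."""
    low, high = TIERS[ROUND_TIER[round_name]]
    candidates = [team for team in pool if low <= team[1] <= high]
    if not candidates:
        raise ValueError(f"no opponent available for {round_name}")
    return rng.choice(candidates)  # seeded rng keeps the draw reproducible
```

Passing a seeded `random.Random` keeps the draw consistent with the deterministic design in section 3. Because tiers 1 and 2 share the boundary value 76, a team rated exactly 76 is eligible for both in this sketch.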
5. Special events
To avoid flat matches, the simulator injects twelve special events with calibrated probabilities:
- Common (3-8% per match): VAR, star injury, GOAT moment, own goal, heavy rain.
- Rare (<2% per match): controversial referee, keeper red card, match fixing, fan invasion, dog on pitch, lightning storm, final moved to a neutral venue because of war.
The goal is for a typical seven-match run to have 1-2 common events and, with luck (31% cumulative probability), one rare event. Each event has mechanical consequences (a disallowed goal, defence -15, etc.) and an attached commentary line.
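The 31% figure can be related to a per-event probability with the usual "at least once" formula, 1 - (1 - p)^n. The sketch below assumes that each of the seven rare events is rolled independently in each of the seven matches, all with the same probability. That is an assumption made for the arithmetic, not a description of the actual event tables.

```python
MATCHES = 7      # a typical run
RARE_EVENTS = 7  # rare events listed above


def at_least_once(p_per_trial: float, trials: int) -> float:
    """Probability that an event with per-trial probability p fires at least once."""
    return 1 - (1 - p_per_trial) ** trials


def per_trial_from_cumulative(cumulative: float, trials: int) -> float:
    """Invert at_least_once: the per-trial probability that yields the cumulative one."""
    return 1 - (1 - cumulative) ** (1 / trials)


# Per-event, per-match probability implied by a 31% chance over the whole run.
p_rare = per_trial_from_cumulative(0.31, MATCHES * RARE_EVENTS)
```

Under these assumptions `p_rare` comes out at roughly 0.75% per event per match, which is consistent with the "<2%" bound stated for rare events.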
6. What we decided not to do
- No LLM narration: every commentary line is written by hand in six languages with three variants to avoid repetition. LLM cost and latency are not worth it in a deterministic simulator.
- No player photos or official crests: we use emoji flags and country abbreviations. This avoids copyright issues and lets the site work in any region.
- No global historical ranking: 99% of visitors would never appear on it, which would punish return visits. We do publish the daily ranking, where the entry threshold is reachable.
- We never sell personal data: not the optional email from the feedback form, not anything. Full detail in /privacy.
7. Known limitations
- Historical ratings (before 2014) carry a subjective error margin of ±3 points.
- The simulator does not model in-match substitutions. Once the XI is chosen, they play the full 90 minutes.
- Within a single World Cup we do not distinguish a player in his peak years from a veteran; each rating describes the player at that specific tournament.