StatsBomb Open Data

StatsBomb are committed to sharing new data and research publicly to enhance understanding of the game of Football. We want to actively encourage new research and analysis at all levels. Therefore we have made certain leagues of StatsBomb Data freely available for public use for research projects and genuine interest in football analytics.


You can get the complete collection of datasets from GitHub: https://github.com/statsbomb/open-data

Folder Structure


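The data is provided as JSON files exported from the StatsBomb Data API. In the repository's data folder, competitions.json lists the available competitions and seasons, the matches folder holds one file per competition and season, and the events, lineups and three-sixty folders each hold one file per match, named by match_id.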

Documentation on the meaning of the different events and the format of the JSON can be found in the doc directory. It is very useful if you just want to find a match in the free open database: the database is not very large, so you can easily go through the JSON files and search for the team you want.
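The team search described above can be sketched in a few lines of Python. This is a minimal sketch, not part of the StatsBomb tooling: the helper names load_matches and find_team_matches are hypothetical, and it assumes each entry of a matches file carries match_id, home_team.home_team_name and away_team.away_team_name. The sample data is hand-written placeholder data, not real matches.

```python
import json
from pathlib import Path


def load_matches(path):
    """Read one matches file from a local clone of the repository."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_team_matches(matches, team_name):
    """Return (match_id, home, away) for every match involving team_name."""
    found = []
    for match in matches:
        # Assumed field names; check the doc directory for the exact format.
        home = match["home_team"]["home_team_name"]
        away = match["away_team"]["away_team_name"]
        if team_name in (home, away):
            found.append((match["match_id"], home, away))
    return found


# Hand-written sample in the assumed shape (placeholder values, not real data).
sample_matches = [
    {"match_id": 1,
     "home_team": {"home_team_name": "Team A"},
     "away_team": {"away_team_name": "Team B"}},
    {"match_id": 2,
     "home_team": {"home_team_name": "Team C"},
     "away_team": {"away_team_name": "Team A"}},
    {"match_id": 3,
     "home_team": {"home_team_name": "Team B"},
     "away_team": {"away_team_name": "Team C"}},
]

team_a_matches = find_team_matches(sample_matches, "Team A")
print(team_a_matches)
```

With a local clone of the repository, the same function can be applied to the result of load_matches on one of the files in the matches folder.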


Interactive Example for Filtering Data

You can filter the JSON files of a match directly using JavaScript and make simple plots with D3.js

Here's an interactive example with code I made (based on another tutorial I found) for querying data from a specific match file, in an Observable notebook: https://observablehq.com/d/7289cc87d701cc74
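For readers who prefer Python to JavaScript, the same kind of filtering can be sketched with the standard library. This is only an illustration, not a port of the notebook: filter_events is a hypothetical helper, and it assumes each event object has a type.name field and, where applicable, a team.name field. The sample events are hand-written placeholders.

```python
from collections import Counter


def filter_events(events, team_name=None, event_type=None):
    """Keep events matching the given team and/or event type (None = any)."""
    selected = []
    for event in events:
        # Assumed field names: event["team"]["name"], event["type"]["name"].
        if team_name is not None and event.get("team", {}).get("name") != team_name:
            continue
        if event_type is not None and event["type"]["name"] != event_type:
            continue
        selected.append(event)
    return selected


# Hand-written sample in the assumed shape (placeholder values, not real data).
sample_events = [
    {"type": {"name": "Pass"}, "team": {"name": "Team A"}},
    {"type": {"name": "Shot"}, "team": {"name": "Team A"}},
    {"type": {"name": "Pass"}, "team": {"name": "Team B"}},
    {"type": {"name": "Shot"}, "team": {"name": "Team B"}},
    {"type": {"name": "Pass"}, "team": {"name": "Team A"}},
]

team_a_passes = filter_events(sample_events, team_name="Team A", event_type="Pass")
events_per_type = Counter(e["type"]["name"] for e in sample_events)
print(len(team_a_passes), dict(events_per_type))
```

The filtered list can then be passed to whatever plotting library you use.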

StatsBomb Python API

statsbombpy provides a Python interface to query data from the StatsBomb collection: https://github.com/statsbomb/statsbombpy

Since some of the data requires a subscription to the StatsBomb service, the best use case for the API here is to query the free open data discussed above.

You can find the match_id of the match you want by going to the database folder manually and looking into the competition files. Once you have the match_id, it's very easy to use the API to query and aggregate event data. Below are some basic operations.
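The manual lookup can also be scripted. The sketch below assumes that the entries of competitions.json carry competition_id, season_id, competition_name and season_name, and that the matching matches file lives at data/matches/<competition_id>/<season_id>.json. The helpers find_seasons and matches_path are hypothetical names, and the sample entries are placeholders, not real IDs.

```python
def find_seasons(competitions, competition_name):
    """Return (competition_id, season_id, season_name) for one competition."""
    return [
        (c["competition_id"], c["season_id"], c["season_name"])
        for c in competitions
        if c["competition_name"] == competition_name
    ]


def matches_path(competition_id, season_id):
    """Build the assumed relative path of the matches file for one season."""
    return f"data/matches/{competition_id}/{season_id}.json"


# Hand-written sample in the assumed shape (placeholder IDs, not real data).
sample_competitions = [
    {"competition_id": 100, "season_id": 1,
     "competition_name": "Example League", "season_name": "2019/2020"},
    {"competition_id": 100, "season_id": 2,
     "competition_name": "Example League", "season_name": "2020/2021"},
    {"competition_id": 200, "season_id": 1,
     "competition_name": "Example Cup", "season_name": "2019/2020"},
]

league_seasons = find_seasons(sample_competitions, "Example League")
first_path = matches_path(*league_seasons[0][:2])
print(league_seasons, first_path)
```

The match_id values found in that matches file are the ones to pass to the API calls in the following sections.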


Installation

pip install statsbombpy


Basic usage

from statsbombpy import sb


To get a dataframe of competitions

sb.competitions()


To get a dataframe of matches

sb.matches(competition_id=x, season_id=y)


To get the lineup dataframe of one team (lineups are returned as a dictionary keyed by team name)

sb.lineups(match_id=x)["team_name"]


To get a single dataframe of events with all event types and attributes

sb.events(match_id=x)

To get a dataframe for a single event type (events are returned as a dictionary keyed by event type)

sb.events(match_id=x, split=True, flatten_attrs=False)["event_type"]


More examples from the README of the API

All events from a given competition can be queried and stored in a single dataframe

events = sb.competition_events(
    country="Germany",
    division="1. Bundesliga",
    season="2019/2020",
    gender="male"
)

# With split=True the events are grouped by event type
grouped_events = sb.competition_events(
    country="Germany",
    division="1. Bundesliga",
    season="2019/2020",
    split=True
)

grouped_events["dribbles"]


360 frames give you the positions of the players within the camera coverage at certain important events.

The frame functions will return the raw 360 freeze frame data along with the visible area for each frame. This is returned at the player level so you have multiple rows per frame/event_id.

match_frames = sb.frames(match_id=3772072, fmt='dataframe')

comp_frames = sb.competition_frames(
    country="Germany",
    division="1. Bundesliga",
    season="2019/2020"
)

match_frames
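Because the frames come back at the player level, a common first step is to regroup the rows into one frame per event. The sketch below works on plain row dictionaries so that it runs without any StatsBomb access. The column names id (the event id), teammate and location are assumptions about the frame output, and group_rows_by_event and count_teammates are hypothetical helpers. The sample rows are hand-written placeholders.

```python
from collections import defaultdict


def group_rows_by_event(rows):
    """Collect player-level rows into one list per event id."""
    frames = defaultdict(list)
    for row in rows:
        frames[row["id"]].append(row)  # "id" is assumed to be the event id
    return dict(frames)


def count_teammates(frame_rows):
    """Count the visible players flagged as teammates of the event's actor."""
    return sum(1 for row in frame_rows if row["teammate"])


# Hand-written sample in the assumed shape (placeholder values, not real data).
sample_rows = [
    {"id": "event-1", "teammate": True, "location": [60.0, 40.0]},
    {"id": "event-1", "teammate": False, "location": [62.5, 38.0]},
    {"id": "event-1", "teammate": True, "location": [55.0, 45.0]},
    {"id": "event-2", "teammate": False, "location": [30.0, 20.0]},
]

frames_by_event = group_rows_by_event(sample_rows)
teammates_per_event = {
    event_id: count_teammates(frame_rows)
    for event_id, frame_rows in frames_by_event.items()
}
print(teammates_per_event)
```

If the frames are held in a pandas dataframe, match_frames.to_dict("records") yields rows in this dictionary form, provided the column names match the assumptions above.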