Scraping NBA Shot Chart Data from Basketball Reference

By Papa-Yaw Afari

As part of this VR basketball visualization project, we need shot-level data including shot locations, outcomes, and player context. While APIs like nba_api offer this data, they're often rate-limited or blocked. This tutorial documents a reliable alternative using web scraping to pull shot data directly from Basketball-Reference.com.

Script-Style Scraping

There are generally two approaches to scraping data from an online source such as Basketball Reference. The first is to write a short script yourself, using existing Python libraries such as requests to fetch each page and pull out the data by hand. The second is to use a crawling framework such as Scrapy, which is covered later in this tutorial. This section focuses on script-based web scraping.

First, check that you have all the necessary libraries installed, then import them at the top of your script:
1. "pip install requests beautifulsoup4 pandas"
2. "import pandas as pd"
3. "import requests"
4. "from bs4 import BeautifulSoup"
For my project, we are looking at Jayson Tatum's shots in a single playoff game that carries a lot of cultural significance because of how well he played. I will be comparing it against similar games from other players who also reached the 50-point benchmark. To start, I scraped all of the shot data from his game on April 15, 2023.
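Choose a game URL from Basketball-Reference.com:
1. Follow this format: https://www.basketball-reference.com/boxscores/shot-chart/YYYYMMDD0TEAM.html (the 0 after the date is part of the game ID, and TEAM is the home team's three-letter abbreviation)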
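If you plan to scrape more than one game, it can help to build the URL from its parts instead of typing it out each time. The sketch below is one way to do that. The function name shot_chart_url is hypothetical, and it assumes the game ID is the date, then a 0, then the home team's abbreviation, so compare the result against a real box score link before relying on it. BOS as the home team is used here only for illustration.

```python
from datetime import date

BASE_URL = "https://www.basketball-reference.com/boxscores/shot-chart"


def shot_chart_url(game_date: date, home_team: str) -> str:
    """Build a shot-chart URL from the game date and home team code.

    Assumption: the game ID is YYYYMMDD + "0" + the home team's
    three-letter abbreviation (for example 202304150BOS).
    """
    game_id = f"{game_date:%Y%m%d}0{home_team.upper()}"
    return f"{BASE_URL}/{game_id}.html"


# Illustrative call: the date comes from this tutorial, the team code is assumed.
print(shot_chart_url(date(2023, 4, 15), "BOS"))
```

Paste the printed URL into a browser first to confirm the page exists before you scrape it.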
Create a new Python file (File -> New Python File in your editor), then follow the steps below.
Script Steps:

    # Uses the setup imports: pandas as pd, requests, and BeautifulSoup (from bs4).

    # 1. URL of the game you want to scrape
    url = "https://www.basketball-reference.com/....."
    response = requests.get(url)
    response.raise_for_status()  # stop early if the page was not returned
    soup = BeautifulSoup(response.text, "html.parser")

    # 2. Find all shots (divs with tooltips)
    shots = soup.find_all("div", class_="tooltip")

    # 3. Extract shots for a specific player
    player_name = "Jayson Tatum"
    results = []
    for shot in shots:
        tip = shot.get("tip")
        style = shot.get("style")
        # Skip tooltips that carry no shot data or belong to other players
        if not tip or not style or player_name not in tip:
            continue
        # Pixel offsets of the shot marker on the court image
        x = int(style.split("left:")[1].split("px")[0].strip())
        y = int(style.split("top:")[1].split("px")[0].strip())
        # The tooltip text reads "<player> made ..." or "<player> missed ..."
        made = " made " in tip
        description = tip.replace("<br>", " ")
        results.append({
            "player": player_name,
            "x": x,
            "y": y,
            "result": "made" if made else "missed",
            "description": description,
        })

    # 4. Save to CSV
    df = pd.DataFrame(results)
    df.to_csv(f"{player_name.replace(' ', '_')}_shots.csv", index=False)
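The description column holds the whole tooltip as one string, but for visualization it is often more useful to split it into separate fields. Below is a minimal sketch of how that could look using only the standard library. The helper names parse_position and parse_tip are hypothetical, the sample string is made up for illustration, and the regular expression assumes the tooltip reads like "1st quarter, 11:40.0 remaining<br>Player missed 3-pointer from 27 ft<br>...", so check it against the page source of your game.

```python
import re

# Assumed tooltip layout (verify against the real page source):
# "1st quarter, 11:40.0 remaining<br>Jayson Tatum missed 3-pointer from 27 ft<br>..."
TIP_PATTERN = re.compile(
    r"(?P<quarter>\d)\w{2} quarter, (?P<clock>[\d:.]+) remaining<br>"
    r"(?P<player>.+?) (?P<outcome>made|missed) (?P<points>\d)-pointer "
    r"from (?P<distance>\d+) ft"
)
STYLE_PATTERN = re.compile(r"(top|left):\s*(-?\d+)px")


def parse_position(style):
    """Return (x, y) pixel offsets from a style string like 'top:331px;left:259px;'."""
    offsets = {name: int(value) for name, value in STYLE_PATTERN.findall(style)}
    return offsets["left"], offsets["top"]


def parse_tip(tip):
    """Return a dict of shot details, or None if the text does not match the assumed layout."""
    match = TIP_PATTERN.search(tip)
    if match is None:
        return None
    return {
        "quarter": int(match["quarter"]),
        "clock": match["clock"],
        "player": match["player"],
        "made": match["outcome"] == "made",
        "points": int(match["points"]),
        "distance_ft": int(match["distance"]),
    }


# Made-up sample values, only to show the shape of the output.
sample_tip = "1st quarter, 11:40.0 remaining<br>Jayson Tatum missed 3-pointer from 27 ft<br>Boston tied 0-0"
print(parse_position("top:331px;left:259px;"))
print(parse_tip(sample_tip))
```

Because parse_tip returns None when the text does not match, tooltips in an unexpected format (overtime periods, for example) are easy to spot and handle separately instead of crashing the loop.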

Scrape Multiple Games or Seasons Using Scrapy

1. Clone and Set Up the shotChart Scraper

For a full-season or full-playoff dataset, use the open-source shotChart project, which is built on Scrapy. The repository is linked below:
- https://github.com/theccalderon/shotChart
Follow these steps:
- git clone https://github.com/theccalderon/shotChart.git
- cd shotChart
- pip install -r requirements.txt

2. Modify the Date Range

To define which dates to crawl, edit the file called calendar.json, which tells the scraper the start and end dates of the range you are interested in. For the 2023 playoffs, for example, the range would run from the first playoff game through the last game of the Finals.

3. Run the Scraper

"scrapy crawl basketball-reference -o shots-[XXX]-playoffs.csv -a season=[XXXX]"
Replace the bracketed placeholders with your own values before running the command. The output CSV will include shot data for all players and games during the date range you specify.
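Once the crawl finishes, a quick sanity check is to count how many rows the file holds for each player. The sketch below does this with the standard library. The function name count_by_column is hypothetical, and the column name "player" is an assumption borrowed from the single-game script above, since the Scrapy output may label its columns differently. Open the CSV and read its header row first.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count rows per distinct value of `column` in an open CSV file."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"{column!r} not found in header: {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# Made-up rows standing in for a real output file.
sample = io.StringIO(
    "player,x,y,result\n"
    "Jayson Tatum,259,331,missed\n"
    "Jayson Tatum,240,60,made\n"
    "Jaylen Brown,100,200,made\n"
)
print(count_by_column(sample, "player"))
```

For a real file, pass an open file handle instead of the sample, for example: with open(path, newline="") as f: counts = count_by_column(f, "player").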