Quickly and Easily Scrape FBref Using Just Pandas

Paul Corcoran
Published in Level Up Coding
3 min read · Mar 29, 2022


We love football data, and FBref is one of the best open data sources out there!

We are going to scrape the league tables and some shooting statistics to get a feel for what is possible. FBref publishes its stats as HTML tables, which is the key requirement for what we are doing today. As always, the full code is here: https://github.com/PaulyCorcoran/Medium_Football/blob/main/fb%20ref.ipynb

The first step is to identify the table we want to scrape. For today's article, I am scraping the La Liga league table, located here: https://fbref.com/en/comps/12/La-Liga-Stats.

Target table (source: FBref)

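The only library we need to import to scrape HTML tables is pandas, although pd.read_html() relies on an HTML parser such as lxml (or BeautifulSoup with html5lib) being installed. Import pandas and pass the target URL to the pd.read_html() function.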

import pandas as pd

# read_html returns a list of DataFrames, one for each <table> it finds on the page
df = pd.read_html('https://fbref.com/en/comps/12/La-Liga-Stats')

If you print df as it is, the output is very messy: df is a list of every table found on the page rather than a single DataFrame. We can get a clearer view of what it contains with a simple for loop.

for idx, table in enumerate(df):
    print("***************************")  # separator between tables
    print(idx)                             # position of the table in the list
    print(table)                           # the table itself

Here we are asking Python to loop through df and print each table's index followed by the table itself. Indexing in Python starts at 0, and the league table we want sits at index 0, so we extract it simply by indexing the list with df[0].

That leaves us with the extracted table. Cool, huh?

Author's notebook
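Picking the table by position works, but positions can shift if FBref adds or reorders tables on a page. Below is a minimal sketch of a more defensive approach, assuming only that read_html has handed back a list of DataFrames: summarise each table by shape and first few column names, then pick the first table that contains a given column. The helper names summarise_tables and find_table are hypothetical, "Pts" as a league-table column is an assumption about FBref's headers, and the toy tables with made-up values stand in for a real read_html result.

```python
import pandas as pd


def summarise_tables(tables, n_cols=5):
    """Return (index, shape, first n_cols column names) for each table."""
    return [
        (idx, table.shape, [str(c) for c in table.columns[:n_cols]])
        for idx, table in enumerate(tables)
    ]


def find_table(tables, column):
    """Return the first table whose column labels include `column`.

    Handles flat columns and MultiIndex columns, where each label is a
    tuple and every level of the tuple is checked.
    """
    for table in tables:
        for label in table.columns:
            parts = label if isinstance(label, tuple) else (label,)
            if column in parts:
                return table
    raise ValueError(f"no table has a column named {column!r}")


# Hypothetical stand-ins for the list pd.read_html returns (made-up values).
tables = [
    pd.DataFrame({"Rk": [1, 2], "Squad": ["Team A", "Team B"], "Pts": [60, 55]}),
    pd.DataFrame({"Squad": ["Team A", "Team B"], "SoT": [120, 95]}),
]

# Compact overview instead of printing every full table
for row in summarise_tables(tables):
    print(row)

league_table = find_table(tables, "Pts")
print(league_table)
```

pd.read_html also accepts a match argument (a string or regex), which returns only the tables whose text matches it; that filters at parse time, whereas the sketch above filters on column names after parsing.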

The rest of the notebook goes through another La Liga table extraction. When scraping FBref HTML tables, the resulting DataFrame can have multi-level (MultiIndex) columns, which will affect any analysis or plotting. We can drop the top level with df.columns = df.columns.droplevel(). Work through the provided notebook to see an example.
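Because droplevel() returns a new Index instead of changing the DataFrame in place, its result has to be assigned back to df.columns. The sketch below shows that on a small made-up table whose group names ("Standard", "Expected") are assumptions loosely modeled on FBref's grouped shooting headers, and whose "Unnamed: 0_level_0" label mimics the placeholder pandas typically creates for an empty group cell. It also shows an alternative that joins the two levels, which keeps names unique if the same sub-header ever appears under two groups. The helper flatten_columns is hypothetical.

```python
import pandas as pd

# Made-up table with two header rows, similar in shape to what
# pd.read_html produces for tables with grouped headers.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Squad"),
    ("Standard", "Sh"),
    ("Standard", "SoT"),
    ("Expected", "xG"),
])
df = pd.DataFrame(
    [["Team A", 400, 140, 45.2], ["Team B", 350, 110, 38.9]],
    columns=columns,
)

# Option 1: drop the top header row. droplevel() returns a new Index,
# so it must be assigned back to .columns.
dropped = df.copy()
dropped.columns = dropped.columns.droplevel(0)


# Option 2: join the levels, skipping the "Unnamed: ..." placeholders
# that stand in for header cells with no group label.
def flatten_columns(cols):
    flat = []
    for top, sub in cols:
        flat.append(sub if top.startswith("Unnamed") else f"{top}_{sub}")
    return flat


joined = df.copy()
joined.columns = flatten_columns(joined.columns)

print(list(dropped.columns))  # ['Squad', 'Sh', 'SoT', 'xG']
print(list(joined.columns))   # ['Squad', 'Standard_Sh', 'Standard_SoT', 'Expected_xG']
```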

Lastly, I pulled up the shooting stats for the league, filtered them to the top four teams and plotted the results. This is a brief example of what can be done with football statistics scraped from FBref. Hope you enjoy playing! Below, I have plotted the shots on target (SoT) for the current top four. Plotting can surface interesting insights: Sevilla are much less successful at getting shots on target!
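The top-four filter can be sketched as a merge between the league table and the shooting table on the squad name, keeping only rows whose league rank is 4 or better. This is a minimal sketch, not the code from the author's notebook: the column names Rk, Squad and SoT are assumptions about FBref's headers after flattening, the helper top_n_shooting is hypothetical, and the numbers are placeholders, not real La Liga figures.

```python
import pandas as pd


def top_n_shooting(league, shooting, n=4, stat="SoT"):
    """Return `stat` for the top-n teams in the league table, best rank first."""
    # Keep the rank and squad name of the top-n teams only
    top = league.loc[league["Rk"] <= n, ["Rk", "Squad"]]
    # Attach the chosen shooting stat by matching on squad name
    merged = top.merge(shooting[["Squad", stat]], on="Squad", how="left")
    return merged.sort_values("Rk").reset_index(drop=True)


# Placeholder data, not real statistics.
league = pd.DataFrame({
    "Rk": [1, 2, 3, 4, 5],
    "Squad": ["Team A", "Team B", "Team C", "Team D", "Team E"],
})
shooting = pd.DataFrame({
    "Squad": ["Team E", "Team D", "Team C", "Team B", "Team A"],
    "SoT": [90, 100, 110, 120, 130],
})

result = top_n_shooting(league, shooting)
print(result)

# With matplotlib installed, a bar chart is one more line:
# result.plot.bar(x="Squad", y="SoT", legend=False)
```

Merging on the squad name assumes both tables spell each team the same way, which is likely when both come from the same site but is worth checking with a quick look at the merged result for missing values.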

Shots on target (SoT) for the top four, 29/03/2022

If you like this post, check out my newest post on scraping FBref for fixtures (https://medium.com/p/e0d8130a3dfd), and give me a follow for more football content!

And if you want to see more football content from me or other great authors here on Medium, you can join using my referral link at no extra cost to yourself.

In other posts, I have covered an expected goals model and web scraping football statistics from the official Champions League, World Cup 2022 and French Ligue 1 sites, as well as many interesting analytics articles about football in general.

Have a great day!


Football Analytics⚽️ | Sports Betting | Python🐍 | R | Machine Learning | Twitter: 👉 bit.ly/3zmDbOh