Pandas 101: Visualize rock climbing data
Build a simple climbing map with Python Pandas and GeoPandas
In this tutorial I’ll show how to use two powerful Python libraries, Pandas and GeoPandas, to build a map of all climbing routes in the US.
In case you’re not subscribed to the OpenBeta project mailing list, we’ve recently published all US climbing routes in our GitHub repo.
This tutorial assumes you’re familiar with Python 3 and Jupyter Notebook, and that you have some rudimentary knowledge of data analysis with Pandas.
1. Loading the dataset
Manually download openbeta-usa-routes-mmm-yyyy.zip from our GitHub repo, or fetch it directly with curl:
curl -LO https://github.com/OpenBeta/climbing-data/-/raw/master/openbeta-usa-routes-aug-2020.zip
Install supporting Python 3 dependencies:
# we opt for pipenv, but feel free to use virtualenv and pip
pipenv install pandas geopandas matplotlib shapely
Load the data file in a Jupyter notebook:
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point
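# load the .zip file directly: pandas infers zip compression from the
# file extension (the archive must contain exactly one file)
df = pd.read_json("./openbeta-usa-routes-aug-2020.zip", lines=True)
# IPython magic command that renders plots inline in the Jupyter notebook
%matplotlib inline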
Let’s have a quick look at our data
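Here is a minimal sketch of that quick look that runs without the real download. It writes two made-up records into a single-file .zip archive, reads it with the same `pd.read_json(..., lines=True)` call, and inspects the result. The `route_name` field, the file names, and all values are hypothetical; only `metadata` and `parent_lnglat` mirror the real dataset. Note that each `metadata` cell comes back as a Python dict.

```python
import json
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Hypothetical records; only "metadata" and "parent_lnglat" mirror the real dataset
records = [
    {"route_name": "Route A", "metadata": {"parent_lnglat": [-119.6, 37.7]}},
    {"route_name": "Route B", "metadata": {"parent_lnglat": [-105.3, 40.0]}},
]

with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "sample-routes.zip"
    # JSON Lines: one JSON object per line, stored as the archive's only file
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("sample-routes.jsonl", "\n".join(json.dumps(r) for r in records))
    # compression="infer" (the default) detects zip from the file extension
    df = pd.read_json(zip_path, lines=True)

print(df.shape)                     # number of rows and columns
print(df.dtypes)                    # metadata is an object column
print(df.head())                    # first few rows
print(type(df.loc[0, "metadata"]))  # each metadata cell holds a dict
```

On the real file, the same `df.shape`, `df.dtypes` and `df.head()` calls give a first overview of the full route list.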
2. Get each climb’s coordinates
The metadata column contains extra information about each climb, such as its latitude and longitude, the crag it belongs to, and a URL for cross-referencing the corresponding page on MountainProject.com.
Since each metadata value is a nested dictionary parsed from JSON, trying to access an inner key directly on the column fails:
df['metadata']['parent_lnglat']
KeyError: 'parent_lnglat'
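The error occurs because `df['metadata']` is a Series indexed by row number, so the second lookup searches the row labels for `'parent_lnglat'`. The sketch below reproduces this on a hypothetical two-row frame (the values are made up) and shows that applying the lookup to each row's dictionary, here with `Series.map`, does work. `json_normalize` goes further and turns every key into a column at once.

```python
import pandas as pd

# Hypothetical two-row frame mimicking the nested "metadata" column
df = pd.DataFrame({
    "metadata": [
        {"parent_lnglat": [-119.6, 37.7]},
        {"parent_lnglat": [-105.3, 40.0]},
    ]
})

# The Series index holds row labels 0 and 1, so this lookup raises KeyError
try:
    df["metadata"]["parent_lnglat"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Looking the key up inside each row's dict works
lnglat = df["metadata"].map(lambda meta: meta["parent_lnglat"])
print(lnglat.tolist())  # [[-119.6, 37.7], [-105.3, 40.0]]
```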
Convert JSON dictionary into Pandas columns
We use pandas.json_normalize() to flatten the nested dictionaries into regular columns, prefix each new column with "metadata.", and drop the original metadata column.
# flatten metadata into "metadata.*" columns, then drop the original nested column
df = df.join(pd.json_normalize(df['metadata']).add_prefix("metadata.")).drop(['metadata'], axis=1)
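To see what that line produces, here is a small sketch on a hypothetical frame (`route_name`, `parent_sector` and all values are made up). `json_normalize` turns each dictionary key into its own column and leaves list values, such as the coordinate pair, untouched. `join` lines rows up by index, which works here because both frames use the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical sample; "parent_sector" and all values are made up
df = pd.DataFrame({
    "route_name": ["Route A", "Route B"],
    "metadata": [
        {"parent_lnglat": [-119.6, 37.7], "parent_sector": "Crag X"},
        {"parent_lnglat": [-105.3, 40.0], "parent_sector": "Crag Y"},
    ],
})

# One column per dictionary key, renamed to "metadata.<key>"
flat = pd.json_normalize(df["metadata"]).add_prefix("metadata.")

# join aligns on the index (0, 1 in both frames); then drop the nested column
df = df.join(flat).drop(columns=["metadata"])
print(df.columns.tolist())
# ['route_name', 'metadata.parent_lnglat', 'metadata.parent_sector']
```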
Build a list of coordinates
The coordinates field is now accessible as a regular Pandas column, df['metadata.parent_lnglat'], which stores each location as a [longitude, latitude] pair.
# convert each [lng, lat] pair into a shapely.geometry.Point (x = lng, y = lat)
geometry = [Point(tuple(xy)) for xy in df['metadata.parent_lnglat']]
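Because the field stores longitude first, each Point gets x = longitude and y = latitude, which matches the GeoPandas convention for longitude/latitude data. If you want to inspect the coordinates before building geometries, one option is a pandas-only sketch that unpacks the pairs into named columns and keeps only values inside the valid ranges. The sample values are hypothetical, and the third row is deliberately out of range to show the filter at work.

```python
import pandas as pd

# Hypothetical [lng, lat] pairs; the third one is deliberately out of range
df = pd.DataFrame({
    "metadata.parent_lnglat": [[-119.6, 37.7], [-105.3, 40.0], [200.0, 95.0]],
})

# Unpack each two-element list into named numeric columns
coords = pd.DataFrame(
    df["metadata.parent_lnglat"].tolist(), index=df.index, columns=["lng", "lat"]
)
df = df.join(coords)

# Keep rows whose longitude and latitude fall inside valid ranges
valid = df["lng"].between(-180, 180) & df["lat"].between(-90, 90)
clean = df[valid]
print(clean[["lng", "lat"]])
```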
3. Plot the map
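# wrap the data in a GeoDataFrame; EPSG:4326 is WGS 84 longitude/latitude
geo_df = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
# semi-transparent markers with black edges keep dense clusters readable
ax = geo_df.plot(figsize=(20, 20), alpha=0.5, edgecolor='k')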
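For intuition, a map of point geometries like this is essentially a scatter plot of longitude (x) against latitude (y). A rough plain-matplotlib equivalent is sketched below, assuming the coordinates are already unpacked into two lists (the values are hypothetical). GeoPandas adds CRS handling on top of this and makes it easy to layer other geometries, such as state boundaries, onto the same axes.

```python
import matplotlib.pyplot as plt

# Hypothetical [lng, lat] pairs
lnglat = [[-119.6, 37.7], [-105.3, 40.0]]
lngs = [pair[0] for pair in lnglat]
lats = [pair[1] for pair in lnglat]

fig, ax = plt.subplots(figsize=(20, 20))
# Same styling as the GeoPandas call: semi-transparent markers, black edges
ax.scatter(lngs, lats, alpha=0.5, edgecolors="k")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
```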
You did it! The outline of the United States emerges from the data points alone. I’ll leave displaying state boundaries from shapefiles, or overlaying the data on a base map, as an exercise for you.
Complete Jupyter notebook
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
df = pd.read_json("./openbeta-usa-routes-aug-2020.zip", lines=True)
%matplotlib inline
df = df.join(pd.json_normalize(df['metadata']).add_prefix("metadata.")).drop(['metadata'], axis=1)
geometry = [Point(tuple(xy)) for xy in df['metadata.parent_lnglat']]
geo_df = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
geo_df.plot(figsize=(20, 20), alpha=0.5, edgecolor='k')