Using AI to Analyze Climbing Route Descriptions
Some interesting things to do with common NLP methods and a database of route descriptions.
What you will (hopefully) read here
This article describes a couple of cool things you can do (and I did) with natural language processing (NLP) techniques and a database of route descriptions. First, I give some minimal background on the NLP techniques used to produce the results herein. Next, I show how a route description can be used to produce a numerical route profile. For example: how pumpy a route is likely to be (on a scale of 0 to 1) based on the words used in the description. Third, I demonstrate a route search method in which you can enter a short hypothetical route description and get back a list of real routes with similar descriptions (it works pretty well some of the time). Finally, I present some guidelines on how I think routes should be described in an online guidebook. Note that a key assumption underlying this article is that route descriptions in online guidebooks are mostly good (i.e. actually descriptive). This assumption could be bunk, so keep that in mind. All the code used to generate the results described below can be found here.
Some background on natural language processing
NLP is a subfield of AI that deals with getting computers to “understand” human language. Here is a good overview that is (coincidentally) the first result when I google “natural language processing”. I use two common NLP techniques below: Word2vec and Doc2vec. A Word2vec model converts words to vectors. You can think of these vectors as coordinates in N-dimensional space (usually N > 50). The vector conversion is accomplished using a specific neural network architecture trained on a large collection of text (a corpus). The vector representation of words in a corpus with similar meanings will be close, literally: the distance between the vectors will be relatively small. Further, any functions that can be applied to vectors (addition, averaging, rotation, etc.) can then be applied to words, often with semantically meaningful results. A Doc2vec model works similarly to a Word2vec model, but converts “documents” (sentences or paragraphs) to vectors, rather than words.
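To make "close" and "functions applied to vectors" concrete, here is a minimal sketch of cosine similarity and vector addition. The three-dimensional vectors are made up for illustration (real Word2vec vectors are learned from a corpus and have many more dimensions), and the helper names are my own, not taken from any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

# Hypothetical toy vectors (assumption: invented values, not model output).
toy_vectors = {
    "chossy": [0.9, 0.1, 0.0],
    "crumbly": [0.8, 0.2, 0.1],
    "jug": [0.0, 0.2, 0.9],
    "crack": [1.0, 0.0, 0.2],
    "technique": [0.0, 1.0, 0.1],
    "jamming": [0.7, 0.7, 0.2],
}

# Words with similar meanings should score higher than unrelated words.
sim_close = cosine_similarity(toy_vectors["chossy"], toy_vectors["crumbly"])
sim_far = cosine_similarity(toy_vectors["chossy"], toy_vectors["jug"])

# Adding two word vectors gives a new vector that can be compared like any other.
combined = add_vectors(toy_vectors["crack"], toy_vectors["technique"])
sim_combined = cosine_similarity(combined, toy_vectors["jamming"])

print(f"chossy vs crumbly: {sim_close:.2f}")
print(f"chossy vs jug: {sim_far:.2f}")
print(f"crack + technique vs jamming: {sim_combined:.2f}")
```

With these invented numbers, "chossy" lands much nearer to "crumbly" than to "jug", and the sum of "crack" and "technique" lands nearer to "jamming" than "crack" does alone. A trained model is what makes such relationships reflect real usage.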
Building climbing-specific Word2vec and Doc2vec models
There are many high-quality, pre-trained English Word2vec and Doc2vec models built with massive corpora available for free on the internet (start here). The problem with applying these directly to text about rock climbs is that climbers continually spout streams of nonsense (chossy, beta, pumpy, runout, and on and on) when talking (and writing) about climbs. Such nonsense would make little sense (ha) to a model trained using “typical” English. Luckily, we have a quality corpus to train climbing-specific models available on the OpenBeta GitHub. This database includes route and boulder descriptions scraped from Mountain Project, covering most of the U.S. These descriptions can be used to produce models that “understand” climbing jargon (which I did). My climbing-specific Word2vec and Doc2vec models can be found here, along with the training scripts and data.
Here are some instructive word similarities from the Word2vec model (cosine similarity, for those who are interested). The similarity score for each word (0 = no similarity, 1 = highest possible similarity) is shown in parentheses:
Three words most similar to “chossy” in the training corpus:
junky (0.90)
crumbly (0.88)
flaky (0.87)
Three words most similar to “beta”:
method (0.75)
body positioning (0.68)
figure out (0.64)
Three words most similar to “crack” + “technique”:
fist jams (0.76)
off width (0.74)
jamming (0.73)
Note that I trained the Word2vec model on single words, bigrams, and trigrams. Bigrams/trigrams are common two/three-word phrases that are treated as a single word in the model, which is why some phrases appear in the above results.
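As a rough illustration of how phrases can become single tokens, the sketch below merges word pairs that occur often enough in a toy corpus. This is a simplified, count-only stand-in and not necessarily how the models here were built; tools such as Gensim's `Phrases` class use a statistical score rather than a raw count. The function names, the `min_count` value, and the sentences are all assumptions made for the example.

```python
from collections import Counter

def find_common_bigrams(sentences, min_count=2):
    """Return the set of adjacent word pairs seen at least min_count times."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}

def merge_bigrams(tokens, bigrams):
    """Join each detected pair into one token, scanning left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip both words of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Hypothetical tokenized sentences (assumption: invented, not from the corpus).
toy_sentences = [
    ["wide", "crack", "with", "fist", "jams"],
    ["fist", "jams", "lead", "to", "a", "ledge"],
    ["thin", "crack", "then", "fist", "jams"],
]

bigrams = find_common_bigrams(toy_sentences)
merged = merge_bigrams(toy_sentences[0], bigrams)
print(merged)
```

Running the same two steps again on the merged sentences would pick up trigrams, since a merged bigram followed by a frequent third word forms a new pair.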
Route profiles from text descriptions using Word2vec
I wanted to develop some interesting ways to visualize routes based on their descriptions using my Word2vec model, and the first thing I thought of was to profile routes by hold types. The idea is this: for each hold type (jug, crimp, etc.) calculate the distance of each word vector in a route description to the word vector for that hold type, then aggregate these distances to get an overall similarity score. I thought this would work much better than direct pattern matching since Word2vec models easily handle synonyms, tense changes, etc. (e.g. “bucket” is close to “jug”). However, this didn’t work too well, as different hold types are usually used in the same context in route descriptions. To clarify, you frequently see “grab the crimp” or “grab the jug” (and many other similar variations of this phrase), so the Word2vec model primarily learns that a word is a hold type, not the differences between hold types. For example, here are the five most similar words to “jug” in the training corpus:
hold (0.84)
hueco (0.82)
undercling (0.82)
sloper (0.81)
finger bucket (0.81)
You can see that “sloper” is high on the list, despite not being considered a jug-type hold. Likewise, “pinch” and “crimp” have a similarity score of 0.88 (quite high). However, general route descriptors such as technical, powerful, fun, chossy, etc. are not often used in the same context. After some testing, I was able to get some interesting route profiles using words such as these as categories. Here is an example comparing the profiles calculated using a set of nine general route descriptors for two very different routes:
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3f08d6f8-6edf-4b9d-acbb-9494daad0bfb_3000x2000.png)
The online description for Double Stout is:
Technical and insecure steep slab and face climbing. Start with the first few clips over a small roof of Black and Tan, and continue straight up the golden brown face. The sustained crux lasts for several more clips (preferably pre-hung). Stopping short at the anchor of Casual Gods below the roof offers an excellent 5.13- tick in itself.
The online description for Sonic Youth is:
This is one of Clear Creek's best climbs. The route ascends the steep ceiling and finishes with an awesome, overhung dihedral. The route used to be rated 12d, so maybe the correct rating is 12d/13a. Either way this route will challenge you. The initial moves are powerful, but solid, while the final crux requires endurance and good body tension. There are some classic moves on the route that will leave you smiling. A couple moderate rests can be found to help with the redpoint effort. It is highly recommended.
You can see how the numerical profiles fit the corresponding description for these routes. Double Stout is primarily technical and sustained, whereas Sonic Youth has multiple powerful/tensiony cruxes separated by rests. For the specifics of how the descriptor scores used to generate Figure 1 were calculated please see this Jupyter notebook.
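The general scoring idea from this section (compare each word in a description to a category word, then aggregate) can be sketched as follows. This is not the calculation from the notebook: using the mean cosine similarity as the aggregate is my assumption, and the two-dimensional word vectors, function names, and short descriptions are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def descriptor_score(words, descriptor, vectors):
    """Mean similarity between the known words and one descriptor word."""
    target = vectors[descriptor]
    sims = [cosine_similarity(vectors[w], target) for w in words if w in vectors]
    if not sims:
        return 0.0  # no known words, so no evidence either way
    return sum(sims) / len(sims)

def route_profile(description, descriptors, vectors):
    """Score a description against every descriptor word."""
    words = description.lower().split()  # naive tokenizer, ignores punctuation handling
    return {d: descriptor_score(words, d, vectors) for d in descriptors}

# Hypothetical toy vectors (assumption: invented values, not model output).
toy_vectors = {
    "technical": [1.0, 0.0],
    "powerful": [0.0, 1.0],
    "insecure": [0.9, 0.1],
    "slab": [0.8, 0.2],
    "steep": [0.2, 0.8],
    "roof": [0.1, 0.9],
}
descriptors = ["technical", "powerful"]

profile_a = route_profile("insecure slab", descriptors, toy_vectors)
profile_b = route_profile("steep roof", descriptors, toy_vectors)
print(profile_a)
print(profile_b)
```

Because the toy vectors have no negative components, the scores fall between 0 and 1. Vectors from a trained model can produce negative similarities, so a real implementation would need to decide how to rescale or clip them.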
Description-based route searches using Doc2vec
Suppose you want to find some routes for potential projects, or build a vacation to-do list, but you really only want routes with dynos, because you are cool. You could search through Mountain Project page by page looking for the word dyno, but this is time-consuming, annoying, and (so they tell me) will suppress your melatonin production, making your chances of sticking a dyno pretty low due to sleep deprivation. To further complicate things, a dyno could be described as a jump, huck, toss, throw, or dynamic move (to name but a few). However, there is a simple solution, using a Doc2vec model (to reiterate, Doc2vec is essentially analogous to Word2vec but converts sentences or paragraphs to vectors, rather than individual words):
Convert a database of route descriptions to vectors.
Write a description of your ideal route.
Convert your written description to a vector.
Find the description vectors in the database closest to that of your ideal route (yielding descriptions with similar meanings).
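The four steps above can be sketched as a small nearest-neighbor search. To keep the example self-contained, the `embed` function below is a stand-in that uses plain word counts. With Gensim, one would typically obtain the vectors from a trained Doc2vec model instead (for example via its `infer_vector` method). The route names and descriptions are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: word counts (a Doc2vec model would give a dense vector)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, routes, top_n=3):
    """Return the names of the top_n routes whose descriptions are closest to the query."""
    # Step 1: convert the database of descriptions to vectors.
    route_vectors = {name: embed(text) for name, text in routes.items()}
    # Steps 2 and 3: take the written query and convert it to a vector.
    query_vector = embed(query)
    # Step 4: rank routes by similarity to the query vector.
    ranked = sorted(
        route_vectors,
        key=lambda name: cosine_similarity(query_vector, route_vectors[name]),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical mini database (assumption: invented names and descriptions).
toy_routes = {
    "Route A": "sustained pocket pulling on a steep face",
    "Route B": "clean vertical face with small edges",
    "Route C": "wide crack with fist jams",
}

results = search("clean vertical face with small crimps", toy_routes, top_n=2)
print(results)
```

Unlike a Doc2vec model, this word-count stand-in only matches exact words, so it would miss a "dyno" described as a "huck". It is meant to show the shape of the search, not its quality.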
This is all relatively straightforward with the Python package Gensim, plus I already did 1, 3, and 4, so you just have to write the route description. Here are some example searches. The results are the routes whose descriptions are the three closest to the query phrase.
Query: “Clean vertical face with small crimps”
Results:
Monkey Wrench (5.10c) | “Steep edges up a clean face. As good as any route on the 45 Wall.”
Sparrow Slab (V2) | “A very clean face is nearly featureless, aside from just enough small and sharp edges.”
Drink Up Buttercup (5.11b) | “Steep climbing on larger holds leads to a clean vertical face.”
Query: “Enduro climbing on jugs”
Results:
Phantom Pain (5.11c) | “Make long pulls on pretty good holds, working up a pump towards the top of the route where you surpass a bulge.”
Release the Lions (5.11c) | “Fun 5.11 enduro-climbing on positive holds.”
Atomic Rage (5.12d) | “Climb most of Atomic Gecko then traverse left into the business of Rage. Good Enduro climbing marred by the chipped pockets on Rage.”
Query: “Sustained pocket pulling, classic”
Results:
Cowgirl Paradise (5.11c) | “Good Pocket pulling. Steep & sustained.”
Ignorant Bliss (5.11b) | “Sustained pocket pulling. Excellent as part of the warmup for the other routes at the cliff.”
Blue Suede Shoes (5.11d) | “Short but sustained pocket pulling along a face/arête with spectacular views of the valley below. Quality.”
Query: “Big dyno for cool people”
Results (only the first was good):
Hut Hut Hike (V6-7) | “Cool climb with a cool dyno.”
The results are pretty reasonable for the above hypothetical query descriptions. However, there is an expanse of caveats stretching far into the distance for this method… First, there will be many false negatives (climbs that don’t show up in results even though they physically match the query) since people don’t always describe climbs well. Second, very detailed query descriptions are likely to return poor results. My experiments with the model indicate that a single sentence highlighting one or two features of the desired climb is best. Third, I think (although I am not sure) that climbs with especially long descriptions will not be in the top search results very often. This is because long descriptions are likely to have information that would not typically show up in a query (information on bolt placement, route history, etc.). This is essentially the inverse of the second caveat outlined above. There are likely plenty of other issues that could be discussed, but I think these are the three most important that I have identified.
The search results shown above can be reproduced using this Python script. I recommend first creating and activating a conda environment using this environment.yml, then everything should run smoothly. After setting up your environment you can get search results via the command line:
$ python route_description_search.py -d <query string> -n <# of results>
Make sure that the model file (doc2vec.model) and search data (search_data.pkl.zip) are in the same directory.
Some suggestions on how to describe rock climbs
In this section, I list some guidelines that I think should be followed when posting route descriptions online. These guidelines are not for route location or protection descriptions, which should be as detailed and accurate as possible, and in a separate section to the route description. Rather, the idea is to provide an accurate overview of the route style, quality, and rock without giving detailed beta or extraneous information.
Be accurate: Self-explanatory.
Give minimal beta: I think that hold types and moves can be described generally, but move-by-move beta should be avoided. Not everyone appreciates seeing beta before climbing a route. In the future, I think it best for online guidebooks to have separate (hidden) sections for the detailed beta.
Keep it short: Two to three sentences seem reasonable. Route history could go in a separate section if the route has an especially interesting history. Descriptions of bolt/gear placements, clipping stances, etc. should go in a separate “protection” section.
Be descriptive: Don’t just write “beta pending” or “cool route”. That isn’t helpful for anyone. I think things like major rock features (dihedral, roof, etc.), hold types, rock quality (clean, chossy, etc.), and movement style (powerful, techy, etc.) are great things to include.