OpenBeta Tick List aka Project Roadmap

What we want to accomplish this year

Jul 25, 2022

We climbers often keep to-do lists of climbs to track our progress and stay motivated. I want to share my list for OpenBeta (ranging from medium difficulty to V-hard problems 😀) that I hope to achieve by the end of this year or early 2023.

The data set from MountainProject contributors has given us a great jump start. Thank you! As the climb catalog user base grows, I’d like to enlist your help to improve the data in the following key areas:

Identify and tag major climbing areas as a “destination”.
Assign a 2- to 5-letter short code, similar to an airport code, to those destinations.
Identify missing and inconsistent FA data.
Add new countries and new crags.
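As a rough illustration of the short-code rule, here is a minimal Python sketch that checks the 2- to 5-letter format and proposes a candidate code from a destination’s name. The roadmap only specifies the length and the airport-code analogy; the initials-based suggestion, the three-letter fallback for one-word names, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed format: 2 to 5 uppercase ASCII letters, like an airport code.
SHORT_CODE = re.compile(r"[A-Z]{2,5}")


def is_valid_short_code(code: str) -> bool:
    """Return True if `code` is exactly 2-5 uppercase letters."""
    return SHORT_CODE.fullmatch(code) is not None


def suggest_short_code(name: str, taken: set[str]) -> str | None:
    """Suggest a code from a destination name (hypothetical rule).

    Multi-word names use their initials ("Red River Gorge" -> "RRG");
    one-word names use their first three letters. Returns None if the
    candidate is malformed or already taken, leaving the choice to a person.
    """
    words = re.findall(r"[A-Za-z]+", name)
    if not words:
        return None
    if len(words) == 1:
        candidate = words[0][:3].upper()
    else:
        candidate = "".join(word[0] for word in words).upper()
    if not is_valid_short_code(candidate) or candidate in taken:
        return None
    return candidate
```

Returning None rather than inventing a fallback keeps collisions in front of a human reviewer, which fits a community-curated list.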

In each section, I’ll describe the existing problems and then my proposed solutions. Please share your feedback in the comments box below.

Area classification

Difficulty: V-Medium

We want to answer the question, “Where are all the major climbing areas in the US?”

Data Silos Problem

Currently, thousands of climbing areas in the climb database are organized under states and a set of arbitrary geographical regions, similar to a file-and-folder structure. For example, Nevada is divided into three regions: Eastern, Western, and Southern. In everyday conversation, however, climbers don’t know or use those regions. Instead, we refer to an area by its popular name: “I’m going to Red Rocks” or “Mount Charleston is great in the summer”.

A sample hierarchy

Nevada
  |— Southern Nevada
       |—— Red Rocks *
             |—— Calico Basin
                     |—— Cut Your Teeth Crag
                     |—— Riding Hood Wall
                         ...
       |
       |—— Mount Charleston *
             |—— Kyle Canyon
                   |—— The Hood

A similar classification exists in other states: Smith Rock is under Southern Oregon, and Indian Creek is under Southeast Utah.
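To make the folder analogy concrete, the Python sketch below flattens the sample Nevada hierarchy into file-system-style paths (an illustrative representation, not necessarily how the database stores areas) and groups crags by the path segment at a given depth. Grouping at the region level lumps the Red Rocks and Mount Charleston crags into one bucket; the names climbers actually use sit one level deeper.

```python
from __future__ import annotations

from collections import defaultdict

# The sample Nevada hierarchy, flattened into folder-style paths.
PATHS = [
    "Nevada/Southern Nevada/Red Rocks/Calico Basin/Cut Your Teeth Crag",
    "Nevada/Southern Nevada/Red Rocks/Calico Basin/Riding Hood Wall",
    "Nevada/Southern Nevada/Mount Charleston/Kyle Canyon/The Hood",
]


def group_by_level(paths: list[str], level: int) -> dict[str, list[str]]:
    """Group leaf areas by the path segment at `level` (0 = state)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        segments = path.split("/")
        groups[segments[level]].append(segments[-1])
    return dict(groups)


print(group_by_level(PATHS, 1))  # region level: a single "Southern Nevada" bucket
print(group_by_level(PATHS, 2))  # one level deeper: Red Rocks vs Mount Charleston
```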

US State, an unnecessary boundary

A state or province is a political and administrative boundary that a GIS lookup function can easily determine from a climbing destination’s approximate latitude and longitude. For areas near state borders, such as Lake Tahoe or Portland, Ore., state boundaries can hinder data aggregation (see map below).
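The state lookup described above is a classic point-in-polygon test. Here is a minimal Python sketch using ray casting. The two square “states” are toy boundaries made up for illustration; a real lookup would load actual boundary data, most likely through a GIS library such as Shapely rather than hand-written geometry.

```python
from __future__ import annotations


def point_in_polygon(lon: float, lat: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: cast a ray east from (lon, lat) and count edge crossings.

    An odd number of crossings means the point is inside the polygon.
    `polygon` is a list of (lon, lat) vertices; the last connects to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            # Longitude where this edge crosses the ray's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def lookup_state(lon: float, lat: float,
                 boundaries: dict[str, list[tuple[float, float]]]) -> str | None:
    """Return the name of the first boundary polygon containing the point."""
    for name, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Toy boundaries (NOT real state outlines): two squares sharing a border at lon 0.
TOY_BOUNDARIES = {
    "State A": [(-2.0, -1.0), (0.0, -1.0), (0.0, 1.0), (-2.0, 1.0)],
    "State B": [(0.0, -1.0), (2.0, -1.0), (2.0, 1.0), (0.0, 1.0)],
}

print(lookup_state(-0.5, 0.0, TOY_BOUNDARIES))  # State A
print(lookup_state(0.5, 0.0, TOY_BOUNDARIES))   # State B
```

The two test points are only one unit apart yet land in different buckets, which is the aggregation problem the Portland map illustrates.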

In OpenBeta, we’re gradually deemphasizing state boundaries and geographical regions, highlighting destinations as the data becomes available.

Major crags near Portland, Oregon, are located in Oregon and Washington state.

Solution: A Manual Tagging Approach

In climb catalog version 0.4, we’re introducing an edit feature that lets users tag well-known climbing areas, such as Smith Rock, Red Rock, Indian Creek, Longs Peak, Red River Gorge, and The Gunks, as a “destination”.

Since this is the first community data cleanup project, we don’t want to place too many restrictions on what counts as a “destination”.
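Here is a minimal Python sketch of how a “destination” tag could sit on top of the existing hierarchy. Each area carries a hypothetical `is_destination` flag (set here on the two starred areas from the sample hierarchy), and any crag resolves to its nearest tagged ancestor, so Riding Hood Wall rolls up to Red Rocks without going through “Southern Nevada”. The `Area` class and helper functions are illustrative, not OpenBeta’s actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Area:
    name: str
    is_destination: bool = False  # hypothetical community-edited tag
    children: list[Area] = field(default_factory=list)


def destinations(node: Area) -> list[str]:
    """Collect every area tagged as a destination, depth-first."""
    found = [node.name] if node.is_destination else []
    for child in node.children:
        found.extend(destinations(child))
    return found


def destination_of(node: Area, target: str, current: str | None = None) -> str | None:
    """Return the nearest destination-tagged ancestor of `target` (or `target` itself).

    Returns None if `target` is missing or has no tagged ancestor.
    """
    if node.is_destination:
        current = node.name
    if node.name == target:
        return current
    for child in node.children:
        result = destination_of(child, target, current)
        if result is not None:
            return result
    return None


# The sample Nevada hierarchy with Red Rocks and Mount Charleston tagged.
nevada = Area("Nevada", children=[
    Area("Southern Nevada", children=[
        Area("Red Rocks", is_destination=True, children=[
            Area("Calico Basin", children=[
                Area("Cut Your Teeth Crag"),
                Area("Riding Hood Wall"),
            ]),
        ]),
        Area("Mount Charleston", is_destination=True, children=[
            Area("Kyle Canyon", children=[Area("The Hood")]),
        ]),
    ]),
])

print(destinations(nevada))                        # ['Red Rocks', 'Mount Charleston']
print(destination_of(nevada, "Riding Hood Wall"))  # Red Rocks
```

Keeping the tag as a flag, rather than moving areas around, leaves the existing folder structure intact while destinations are added gradually.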

Benefits

Having a list of major climbing destinations enables us to make better maps, improve climb searches, and support a host of other climbing-related GIS applications, such as weather apps.

First Ascensionist Data Cleanup

Difficulty: V-Hard

Early generations of first ascensionists played a vital role in the history and development of climbing. Unfortunately, records of their work, where available, exist in the current dataset only as a short byline string (the individuals’ names and the year) with no consistent format.

Improving the quality of this metadata will not only enable efficient curation, but also pave the way for the creation of a route developer directory.

A sample of existing FA data fields from Red Rock Canyon, Nevada

Levitation 29
  Jorge and Joanne Urioste/FFA: John Long, Lynn Hill & Joanne Urioste
Crimson Chrysalis
  Uriostes, 10/79
Spare Rib
  George and Joanne Urioste, 1980
Voyager
  P. Van Betten, J. Smith 1986
Adventure Punks
  Paul Van Betten, Richard Harrison, and Sal Mamusia, 1983

As you can see, the same individuals’ names are recorded differently:

Jorge vs George
Uriostes vs Jorge and Joanne Urioste
P. Van Betten vs Paul Van Betten
J. Smith vs Jay Smith
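One way to reconcile these variants is an alias table that maps each known spelling to a canonical name, filled in by volunteers as they review records. The Python sketch below is hypothetical: its entries come from the examples above, and treating “Jorge” as the canonical spelling is an assumption a reviewer would need to confirm. A plural surname such as “Uriostes” expands to both people.

```python
from __future__ import annotations

# Hypothetical alias table: lower-cased variant -> canonical name(s).
# Choosing "Jorge" over "George" as canonical is an assumption.
ALIASES: dict[str, tuple[str, ...]] = {
    "george urioste": ("Jorge Urioste",),
    "uriostes": ("Jorge Urioste", "Joanne Urioste"),
    "p. van betten": ("Paul Van Betten",),
    "j. smith": ("Jay Smith",),
}


def canonical_names(raw: str) -> list[str]:
    """Map one name from an FA byline to its canonical name(s).

    Unknown names pass through with stray whitespace collapsed.
    """
    cleaned = " ".join(raw.split())  # collapse stray spaces
    return list(ALIASES.get(cleaned.lower(), (cleaned,)))


print(canonical_names("P. Van Betten"))  # ['Paul Van Betten']
print(canonical_names("Uriostes"))       # ['Jorge Urioste', 'Joanne Urioste']
```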

Dates are either missing or in inconsistent formats, such as 10/79 vs 1980.
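Normalizing dates can start with simple pattern matching. This Python sketch pulls a four-digit year out of a byline and converts the month/two-digit-year form. Reading two-digit years as 19xx is an assumption that fits the sample above but would need checking against the full dataset.

```python
from __future__ import annotations

import re

# Assumption: two-digit years in this data refer to the 1900s.
TWO_DIGIT_CENTURY = 1900

FOUR_DIGIT_YEAR = re.compile(r"\b(1[89]\d\d|20\d\d)\b")
MONTH_SLASH_YEAR = re.compile(r"\b(\d{1,2})/(\d{2})\b")


def extract_year(byline: str) -> int | None:
    """Return the year in a free-form FA byline, or None if there isn't one."""
    match = FOUR_DIGIT_YEAR.search(byline)
    if match:
        return int(match.group(1))
    match = MONTH_SLASH_YEAR.search(byline)
    if match:
        return TWO_DIGIT_CENTURY + int(match.group(2))
    return None


print(extract_year("Uriostes, 10/79"))                  # 1979
print(extract_year("George and Joanne Urioste, 1980"))  # 1980
```

A None result is useful in itself: it marks a record that needs a human to find the date.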

Solution: Machine Learning and Manual Corrections

Fortunately, we don’t have to start from scratch. Two contributors have already written preliminary parsing scripts. One of the scripts uses spaCy, a natural language processing library whose machine learning models can detect people’s names. I think the sensible approach is to process as much of the FA data as we can with code and flag the rest for manual correction.
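Here is a minimal rule-based Python sketch of that split between automatic parsing and manual review. It is not either contributor’s script and does not use spaCy (whose named-entity recognizer labels people as `PERSON` and could stand in for the regex splitting); the patterns, the shared-surname rule, and the review rules are assumptions tuned to the Red Rock sample above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Trailing date such as ", 1980", " 1986" or ", 10/79".
TRAILING_DATE = re.compile(r"[,\s]*(\d{1,2}/\d{2}|\d{4})\s*$")
HAS_DATE = re.compile(r"\d{1,2}/\d{2}|\b\d{4}\b")
# Separators between names: commas, ampersands and the word "and".
SEPARATOR = re.compile(r"\s*(?:,|&|\band\b)\s*")
INITIAL = re.compile(r"[A-Z]\.")  # e.g. "P." in "P. Van Betten"


@dataclass
class ParsedFA:
    fa: list[str]
    ffa: list[str]
    needs_review: list[str] = field(default_factory=list)  # reasons to flag


def split_names(text: str) -> list[str]:
    """Split a name list, expanding 'Jorge and Joanne Urioste' into two full names."""
    text = TRAILING_DATE.sub("", text.strip())
    parts = [p for p in SEPARATOR.split(text) if p]
    names = []
    for i, name in enumerate(parts):
        # Assumed rule: a lone first name followed by a full name shares its surname.
        if " " not in name and i + 1 < len(parts) and " " in parts[i + 1]:
            name = f"{name} {parts[i + 1].split()[-1]}"
        names.append(name)
    return names


def parse_fa(byline: str) -> ParsedFA:
    """Parse FA and FFA names; record anything a human should double-check."""
    fa_text, _, ffa_text = byline.partition("/FFA:")
    result = ParsedFA(fa=split_names(fa_text), ffa=split_names(ffa_text))
    if not HAS_DATE.search(byline):
        result.needs_review.append("missing year")
    for name in result.fa + result.ffa:
        if " " not in name:
            result.needs_review.append(f"incomplete name: {name}")
        elif INITIAL.match(name):
            result.needs_review.append(f"abbreviated name: {name}")
    return result


print(parse_fa("Uriostes, 10/79").needs_review)  # ['incomplete name: Uriostes']
```

Variant spellings such as George vs Jorge pass these checks, so they still need an alias table or a human reviewer.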

I’m looking for a project lead to help tackle this initiative. Please email viet at openbeta.io if you’re interested.

OpenBeta Project
