Subway Route Data Extraction with Overpass API: A Step-by-Step Guide

OpenStreetMap is one of the most important sources of geographic information. Much of the data available on the platform can support a wide range of analyses, but how do we easily download it? The Overpass API gives access to all the data available on the platform through customized queries. This API serves as the foundation for the popular Python library OSMnx and, thanks to its customized queries, it allows us to obtain more data than the Python library, which is limited to the data most frequently extracted from OpenStreetMap.
In this article, we will use the API to obtain the subway routes located in Hamburg. Using these routes, we will create a NetworkX graph, which we will later visualize interactively with Folium. The extracted data could be used for multiple analyses, such as evaluating the distance from households to the nearest subway station to help predict their monetary value.
As we can see, geographic data is very valuable for a wide range of analyses. Therefore, knowing tools to easily extract this data is absolutely necessary. Let's start with the article!
Building Functions and Queries for Extracting Subway Line Data from OpenStreetMap with the Overpass API
The Overpass API allows us to extract the information we observe in OpenStreetMap. It is free of charge and, with the appropriate query, gives access to all the information available on the platform. This makes obtaining geographic information for our analyses easy, even on a fairly large scale, although the public instances do apply usage limits, so very heavy workloads should be spread out over time.
As a first step, we need to design a generic function that allows us to obtain information from the OpenStreetMap website with the Overpass API using a customized query, which we will adjust according to our search interests. The following function serves this purpose, taking the query as an input parameter, where we specify the information we want to retrieve from OpenStreetMap, and returning the API response in JSON format with the required data.
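A minimal sketch of such a function is shown below. It uses only the Python standard library; the function name `fetch_overpass`, the endpoint constant, the `User-Agent` string, and the timeout value are assumptions made for illustration, and a library such as `requests` would work just as well.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass endpoint (assumed here); other instances can be swapped in.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def fetch_overpass(query: str, url: str = OVERPASS_URL, timeout: int = 120) -> dict:
    """Send an Overpass QL query and return the decoded JSON response."""
    # The interpreter endpoint expects the query in a form field named "data".
    payload = urllib.parse.urlencode({"data": query}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,  # passing data makes this a POST request
        headers={"User-Agent": "subway-routes-example"},  # hypothetical identifier
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The function only works with queries that request JSON output, since the response body is passed straight to `json.loads`.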
The next step is to write a query that retrieves the information of interest through the API. In this work, that is the subway lines of a city. The selected city is Hamburg but, as we will see later, by following the steps explained in this article and simply adjusting the search_area parameter, subway lines from any other city or area can be easily extracted.
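A query of this kind could be built as in the sketch below. The helper name `build_subway_query` is hypothetical, and the closing statements `(._;>;); out body;` are an assumption: they ask the API to also return the nodes and ways referenced by each relation, so that stop coordinates are part of the response.

```python
def build_subway_query(search_area: str) -> str:
    """Return an Overpass QL query for all subway route relations in an area."""
    return f"""
    [out:json][timeout:60];
    area[name="{search_area}"]->.searchArea;
    relation["route"~"subway"](area.searchArea);
    (._;>;);
    out body;
    """


query = build_subway_query("Hamburg")
print(query)
```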
As you can see in the query, the output format is specified first, in this case JSON ([out:json]). The area of interest is the city of Hamburg, which is specified using area[name="Hamburg"]->.searchArea. Finally, only subway lines are retrieved, so the query requests relations that are routes of type subway (relation["route"~"subway"](area.searchArea)). We will see later on that routes for other kinds of transportation, such as buses, trains or ferries, can also be easily obtained.
Overview of the Overpass API Output Structure
First, we need to analyze the output that we get from the API before extracting its data. Bear in mind that the API output depends on the query that we define and execute, so different queries produce different outputs. All the routes are elements of type relation; this is indicated by the type key of each entry in the elements section of the dictionary. In OpenStreetMap there are three types of elements: nodes, ways, and relations. I advise you to go through a basic tutorial about OpenStreetMap to get a better view of those elements.

Below is the first element in the list of elements. As can be seen, each relation consists of different nodes and ways, which are detailed in the members section. Information about the route is stored under the tags key and includes, for example, the starting and ending points of the route and the departure frequency.
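To make this layout concrete, here is an illustrative relation element written as a Python dictionary. The key names (`type`, `id`, `members`, `ref`, `role`, `tags`) follow the Overpass JSON output, but every value is a made-up placeholder rather than real Hamburg data.

```python
# Illustrative relation element; all values are placeholders, not real data.
sample_relation = {
    "type": "relation",
    "id": 1,
    "members": [
        {"type": "node", "ref": 101, "role": "stop"},
        {"type": "node", "ref": 102, "role": "platform"},
        {"type": "way", "ref": 201, "role": ""},
    ],
    "tags": {
        "type": "route",
        "route": "subway",
        "name": "Line A: First Station => Last Station",
        "from": "First Station",
        "to": "Last Station",
        "interval": "00:10",
    },
}

# Group the member references by element type.
member_refs = {}
for member in sample_relation["members"]:
    member_refs.setdefault(member["type"], []).append(member["ref"])

print(member_refs)
print(sample_relation["tags"]["from"], "->", sample_relation["tags"]["to"])
```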

The stops are nodes, while the ways are the track segments that connect these stops with one another and with the platforms. We could create an exact representation of the routes; however, in this analysis, we will make a simplified representation by connecting consecutive stops with straight lines rather than following the actual track geometry. In more advanced studies, we might consider making an exact representation of the route's layout.
As can be observed, the nodes that make up the route do not carry their coordinates in the relation; only a reference ID called ref is given. The coordinates are stored in the node elements of the same elements list, as shown in the following image. This node corresponds to the first stop of the above route. By matching each ref against the node elements, we can obtain the latitude and longitude of the stops that make up a route for later visualization.
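The lookup from a member's `ref` to its coordinates can be sketched as follows. The miniature response is hypothetical and hand-written for illustration; only its shape mirrors what the API returns.

```python
# Hypothetical miniature response: one route relation and the nodes it references.
sample_response = {
    "elements": [
        {
            "type": "relation",
            "id": 1,
            "members": [
                {"type": "node", "ref": 101, "role": "stop"},
                {"type": "node", "ref": 102, "role": "stop"},
            ],
            "tags": {"route": "subway", "name": "Line A"},
        },
        {"type": "node", "id": 101, "lat": 53.55, "lon": 10.00,
         "tags": {"name": "First Station"}},
        {"type": "node", "id": 102, "lat": 53.56, "lon": 10.02,
         "tags": {"name": "Last Station"}},
    ]
}

# Index node elements by ID so that a member's "ref" can be resolved.
coordinates = {
    element["id"]: (element["lat"], element["lon"])
    for element in sample_response["elements"]
    if element["type"] == "node"
}

relation = sample_response["elements"][0]
stop_coordinates = [
    coordinates[member["ref"]]
    for member in relation["members"]
    if member["type"] == "node"
]
print(stop_coordinates)
```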

Extracting Route and Stop Locations from the API Response
The next step, after properly understanding the output structure, is to extract all the routes and nodes available in the API response. By checking the information of the nodes, you can see, as mentioned earlier, the location of the stations that form part of a route.
The following two functions demonstrate how, from the API response, you can extract both the routes and the details of the nodes.
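One possible shape for these two functions is sketched below. The names `extract_routes` and `extract_nodes` and the fields kept for each node (`lat`, `lon`, `name`) are assumptions chosen for this example.

```python
def extract_routes(response: dict) -> list:
    """Return every relation element of the response, in the order received."""
    return [
        element
        for element in response.get("elements", [])
        if element.get("type") == "relation"
    ]


def extract_nodes(response: dict) -> dict:
    """Return the details of every node element, keyed by node ID."""
    nodes = {}
    for element in response.get("elements", []):
        if element.get("type") == "node":
            nodes[element["id"]] = {
                "lat": element["lat"],
                "lon": element["lon"],
                # Nodes without tags (for example, plain track points) have no name.
                "name": element.get("tags", {}).get("name"),
            }
    return nodes
```

Keying the nodes by ID means that each `ref` found in a route's members can be resolved with a single dictionary lookup.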

The previous image shows the first extracted route. As can be seen, it matches the format of the routes provided by the API. In the case of the nodes, a dictionary has been created where the keys correspond to the node ID.

The route list and the node dictionary created above will be the basis for the creation of the graph.
Constructing a Network Graph of Subway Routes
The next step is to create a graph of all the routes obtained in the API response, so that they can later on be visualized using Folium. Before creating the graph, we will check the names of all the subway lines obtained. As can be observed below, the lines are duplicated, with both directions included. Some lines are partial segments of a main line.
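A quick way to run this check is to read the `name` tag of each relation, as in the following sketch. The helper `route_names` and the sample routes are hypothetical; with real data, the list of route relations extracted from the API response would be passed in instead.

```python
def route_names(routes: list) -> list:
    """Return the sorted names of the given route relations."""
    return sorted(
        route.get("tags", {}).get("name", "<unnamed>") for route in routes
    )


# Hypothetical sample containing the same line in both directions.
sample_routes = [
    {"type": "relation", "tags": {"name": "Line A: West => East"}},
    {"type": "relation", "tags": {"name": "Line A: East => West"}},
    {"type": "relation", "tags": {}},
]

for name in route_names(sample_routes):
    print(name)
```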

In this case, we are going to select all the lines. However, in a more detailed study, we could analyze duplications to proceed with their removal, to avoid having a graph with duplicate lines.
Once the lines are selected, we build the graph with all metro stops and routes. For that, two functions are created: the first creates the graph using the NetworkX library, and the second visualizes it using Matplotlib.
The function create_route_graph is responsible for creating the graph by consecutively adding the stops of each route and connecting them with edges. For each node, the position (the latitude and longitude of the stop) is stored, as well as its name. The color of the node is also added, based on information taken from the route.
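The sketch below shows one way create_route_graph could be written. It makes several assumptions: `routes` is a list of relation elements, `nodes` is a dictionary keyed by node ID with `lat`, `lon`, and `name` fields, stops are the node members whose role starts with `stop`, and the color comes from the route's `colour` tag with gray as a fallback.

```python
import networkx as nx


def create_route_graph(routes: list, nodes: dict) -> nx.Graph:
    """Build a graph whose nodes are stops and whose edges join consecutive stops."""
    graph = nx.Graph()
    for route in routes:
        tags = route.get("tags", {})
        colour = tags.get("colour", "gray")  # fallback when the route has no colour tag
        # Keep only node members marked as stops whose coordinates are known.
        stop_ids = [
            member["ref"]
            for member in route.get("members", [])
            if member["type"] == "node"
            and member.get("role", "").startswith("stop")
            and member["ref"] in nodes
        ]
        for stop_id in stop_ids:
            info = nodes[stop_id]
            graph.add_node(
                stop_id,
                pos=(info["lat"], info["lon"]),
                name=info.get("name"),
                color=colour,
            )
        # Connect each stop to the next one along the route.
        for current_stop, next_stop in zip(stop_ids, stop_ids[1:]):
            graph.add_edge(current_stop, next_stop, color=colour, route=tags.get("name"))
    return graph
```

Because the graph is undirected, the two directions of the same line collapse onto the same edges, although a stop served by several lines keeps only the color of the last route processed.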
Once the graph is created, it is then visualized. When creating the graph, the latitude and longitude of each node were specified, so it will be possible to visualize them in their exact position.
The visualize_graph function is used to visualize the graph but, as can be seen below, a static plot makes the network difficult to read. This problem can be solved by creating an interactive visualization using Folium.
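A minimal sketch of visualize_graph follows. It assumes that every node carries a `pos` attribute stored as (latitude, longitude) and an optional `color` attribute; the figure size and marker styling are arbitrary choices.

```python
import matplotlib.pyplot as plt
import networkx as nx


def visualize_graph(graph: nx.Graph, figsize=(10, 10)):
    """Draw the graph with each stop placed at its geographic position."""
    # Matplotlib expects (x, y), so the stored (lat, lon) pairs are swapped to (lon, lat).
    positions = {
        node: (data["pos"][1], data["pos"][0])
        for node, data in graph.nodes(data=True)
    }
    colours = [data.get("color", "gray") for _, data in graph.nodes(data=True)]
    fig, ax = plt.subplots(figsize=figsize)
    nx.draw(
        graph,
        pos=positions,
        ax=ax,
        node_color=colours,
        node_size=20,
        edge_color="lightgray",
        with_labels=False,
    )
    return fig, ax
```

Calling `plt.show()` after `visualize_graph(graph)` displays the figure. Labels are switched off here because station names overlap heavily at city scale.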

Creation of an Interactive Graph Visualization Using Folium
Since a static visualization does not give an adequate view of the graph, we will create an interactive visualization with Folium. This type of visualization allows zooming in to see the metro network in more detail. Additionally, because it is interactive, the station name can be displayed on hover, which makes the network easier to read even at low zoom levels.


Extracting Subway Routes for Other Cities
The advantage of the created code is that it can be used to obtain transportation routes in any city. By simply executing the functions sequentially as shown below, a graph of the metro network of any city can be created. The following code demonstrates this execution to obtain the metro network of Madrid.
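The sequential execution can be condensed into a single helper, as in the sketch below. Everything here is an assumption made for illustration: the name `subway_graph_for`, the query text, and the `fetch` argument, which stands for any callable that sends an Overpass QL string to the API and returns the decoded JSON.

```python
import networkx as nx


def subway_graph_for(search_area: str, fetch) -> nx.Graph:
    """Run query, extraction, and graph construction for one search area."""
    query = f"""
    [out:json][timeout:60];
    area[name="{search_area}"]->.searchArea;
    relation["route"~"subway"](area.searchArea);
    (._;>;);
    out body;
    """
    elements = fetch(query).get("elements", [])
    routes = [element for element in elements if element["type"] == "relation"]
    nodes = {element["id"]: element for element in elements if element["type"] == "node"}

    graph = nx.Graph()
    for route in routes:
        # Stops are the node members whose role starts with "stop".
        stops = [
            member["ref"]
            for member in route.get("members", [])
            if member["type"] == "node"
            and member.get("role", "").startswith("stop")
            and member["ref"] in nodes
        ]
        for stop in stops:
            node = nodes[stop]
            graph.add_node(
                stop,
                pos=(node["lat"], node["lon"]),
                name=node.get("tags", {}).get("name"),
            )
        # Connect each stop to the next one along the route.
        graph.add_edges_from(zip(stops, stops[1:]))
    return graph
```

With a working `fetch` callable, `subway_graph_for("Madrid", fetch)` would return the graph for Madrid, and only the first argument changes from one city to the next.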

As we can see, the metro lines of any city can be extracted by changing the search area to the region of interest. In the case of Munich, the search area is Oberbayern, as it is specified that way on the platform. For the other cases presented in the article, it is simply the name of the city.



Extracting Multiple Route Types with the Overpass API
The article so far has focused on obtaining subway lines. However, the Overpass API allows extracting all types of routes defined in OpenStreetMap, which includes, for example, bus routes or train routes.
OpenStreetMap has a Wiki where the available elements on the platform can be consulted. In the following link, you can see all the route-type elements available, in addition to the one used in this article. I recommend that you consult the Wiki before designing any query to see all the available elements. For example, in the case of routes, you can even extract ski routes.
For example, we could extract all the ferry routes from Hamburg. Also, we could obtain the bus lines in Hamburg, which would also give us all the intercity bus lines that stop in Hamburg. As you can see, the number of transportation networks that can be extracted is immense, with just a few lines of code. I encourage you to experiment with the code to extract the network of interest for your analysis.
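As a hedged sketch, the query can be generalized by turning the route type into a parameter. The helper name `build_route_query` is hypothetical, and the example uses an exact match (`=`) on the `route` key instead of the regular-expression match (`~`) used for subways, so that, for instance, `bus` does not also match `trolleybus`.

```python
def build_route_query(search_area: str, route_type: str) -> str:
    """Return an Overpass QL query for route relations of one type in an area."""
    return f"""
    [out:json][timeout:120];
    area[name="{search_area}"]->.searchArea;
    relation["route"="{route_type}"](area.searchArea);
    (._;>;);
    out body;
    """


ferry_query = build_route_query("Hamburg", "ferry")
bus_query = build_route_query("Hamburg", "bus")
print(ferry_query)
```

Bus networks are much larger than subway networks, so such a query may need a longer timeout and will return a considerably bigger response.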


Data Quality of OpenStreetMap
Before concluding the article, I would like to offer a brief reflection on the quality of the data available in OpenStreetMap. OpenStreetMap is an open-data project, which means that users across the world contribute the data available on the platform. This implies that the quality depends on the data entered by users. Additionally, there are instances of missing data; for example, a metro or bus stop might not have been entered into the platform. This is less common with data that changes infrequently, such as metro stops, but it is much more likely with data that changes more rapidly, like the opening of businesses. The quality of the data also depends largely on the country, as there are more active users on the platform in some regions of the world, and on the type of area, rural or urban, since, as a general rule, information in rural areas is much sparser.
It is important to keep this in mind when conducting our analyses, to critically evaluate the data obtained, and not to take OpenStreetMap as an absolute truth, as the data is rarely complete.
The acquisition of geographical data is essential for a large number of analyses in the field of Data Science and predictive model construction. In this article, we have used the Overpass API to access data from the OpenStreetMap platform, which could later be used for future analyses. The extracted data covers the metro routes in the city of Hamburg; however, as explained in the article, the functions used can be applied to extract other types of transport routes in any city. The extracted data was then used to create a graph in NetworkX and finally visualized using Folium.
I encourage you to try using the Overpass API to extract geographical data from the platform and to conduct analyses different from those presented in this article. You will see that OpenStreetMap is a very valuable source of information.
Thank you very much for reading.
Amanda Iglesias