Introducing tmap for Visualization and Data Analysis

Author: Murphy  |  Time: 2025-03-22 22:43:35

Introduction

Not every Data Scientist has to choose between Python and R. I constantly see discussions about it, some annoying, some quite funny. But the truth is that there are, and will continue to be, many Data Scientists who can use both languages freely, because they won't need to deploy anything or put a whole application online. They just need to analyze some data and build a straightforward executive presentation, in good old PowerPoint, if you will.

Recently, I needed to do exactly that. I had to take a few data points containing Latitude and Longitude information, analyze them, and build a good data story to present my insights.

I gotta say that I love working with R. I think the language is easy to code in, and the packages are very well built. One of those in my treasure box is the tmap library.


Thematic Map, or just tmap for short, is an R library for working with spatial data. The "thematic" in its name means that we can customize and visualize the data using resources like bubbles, choropleths and layers. According to the library's creators, it is based on the grammar of graphics, which makes it resemble ggplot2.

In this post, we will go over the basics to get you up to speed building maps with this great tool, giving you a solid base to create visualizations and to keep studying toward more advanced maps.

Let's import some libraries and get to work. If you don't have any of them yet, just run install.packages("name_of_the_library") in RStudio to install it.

library(tidyverse) # for data manipulation
library(tmap) # for map visualization
library(sf) # for spatial data (simple features) manipulation
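If you'd rather not check each package by hand, a small sketch like the following installs only the ones that are missing. It relies on base R's requireNamespace() and install.packages(); the CRAN mirror URL is just an assumption, and any mirror works.

```r
# Packages used in this post
pkgs <- c("tidyverse", "tmap", "sf")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing (mirror URL is an assumption; use your preferred one)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```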

The Basics

When working with tmap, you need data that can be plotted on a map. Usually, Latitude and Longitude variables are enough. But if you have a shapefile with polygons for every region you need to visualize, that certainly helps a lot in creating richer views.

At its most basic, tmap requires spatial objects, such as shapefiles, to be able to plot anything. Let's see how that works.

Shapefile

A shapefile is a format commonly used in Geospatial Data Analysis. Once loaded in R, it behaves like a data frame, but it can also store vector data: the location, geometry and attributes of points, lines or polygons.

To make it clearer, imagine that a shapefile is a dataset with a column that stores coordinate locations (points), streets (lines) or given regions (polygons) on a map.
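To make the point/line/polygon idea concrete, here is a minimal sketch that builds a tiny sf object by hand with the sf package. The coordinates are arbitrary lon/lat values around London chosen only for illustration, and the object name toy is hypothetical.

```r
library(sf)

# Three toy geometries (illustrative lon/lat coordinates)
pt   <- st_point(c(-0.1276, 51.5072))                           # a single location
line <- st_linestring(rbind(c(-0.15, 51.50), c(-0.10, 51.52)))  # e.g. a street
poly <- st_polygon(list(rbind(c(-0.20, 51.49), c(-0.02, 51.49),
                              c(-0.02, 51.55), c(-0.20, 51.55),
                              c(-0.20, 51.49))))                 # closed ring = a region

# An sf data frame: ordinary attribute columns plus a geometry column
toy <- st_sf(
  name     = c("point", "street", "region"),
  geometry = st_sfc(pt, line, poly, crs = 4326)
)

print(class(toy))
print(as.character(st_geometry_type(toy)))
```

Each row carries its own geometry type, which is exactly what lets tmap draw dots, lines or filled regions from the same kind of object.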

Sometimes, you will already have a shapefile at the start of your analysis. Take, for example, the World dataset, included in the tmap library under the GPL 3.0 license.

# Load the dataset "World"
data("World")

# Checking the data type
> class(World)
[1] "sf"         "data.frame"

That is already a shapefile. If we look more closely at the data, we'll see that it is pretty much like a regular dataset, but it has an extra column named geometry, holding the polygon vectors that form the shape of each country. That column is what makes it possible to create visualizations on the map.

Geometry column: the vectors that create the shape of each country on the map. Image by the author.
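To look at that geometry column directly, a short sketch with sf's st_geometry() and st_drop_geometry() separates the shapes from the attributes. It assumes tmap's World dataset is available, as above; geom and attrs are hypothetical names.

```r
library(sf)
library(tmap)

data("World", package = "tmap")

# The geometry list-column on its own (an "sfc" object, one shape per country)
geom <- st_geometry(World)
print(head(st_geometry_type(World)))

# Same rows, attributes only: no longer an sf object
attrs <- st_drop_geometry(World)
print(names(attrs))
```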

Let's see how to use that file in tmap now.

"Hello Plot"

The Hello World of tmap plots is simple.

We start every plot with the function tm_shape(), which tells the library which shapefile is being plotted (World). Then, we add (+) the type of shape we want to draw, which in this case is polygons, so we use tm_polygons(). And there is our first plot!

# Plotting the World Map
tm_shape(World) +
  tm_polygons()
First tmap plot. Image by the author.
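Because each + simply adds a layer specification, a tmap plot is an ordinary R object that can be stored and extended before it is drawn. A minimal sketch, assuming the World dataset; the names base_map and with_borders are hypothetical:

```r
library(tmap)

data("World", package = "tmap")

# Build the plot specification without drawing it
base_map <- tm_shape(World) + tm_polygons()

# Add another layer to the stored object
with_borders <- base_map + tm_borders(col = "gray30")

print(class(base_map))
```

Typing base_map or with_borders at the console (or calling print() on them) is what actually renders the map.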

Now, this map is very boring and does not tell us anything. From here, we need to keep adding arguments and layers to actually make sense of our data.

Choropleth Maps

Let's start by adding colors. Creating a choropleth map in tmap is fairly easy: all we need to do is pass the variable of interest as the col argument of tm_polygons().

# Plotting a choropleth of the World Map based on life expectancy
tm_shape(World) +
  tm_polygons(col= 'life_exp')

Here is the result.

Choropleth map of life expectancy per country. Image by the author.

Observe that tmap even adds a legend with the value classes automatically. It can be customized if needed, but it's great that the library does it for us, as it saves time during data exploration.

From the map above, it is now easy to see where people live longer, such as Australia, Japan, North America and Western Europe. It is also nice to see that the Americas, as a whole, are now above 60-ish years.

Adding other elements

As mentioned before, tmap is about layers, so it's easy to keep adding elements to our plot. Continuing with the World plot, let's add some more.

We could add squares whose size varies according to population. To create that view, we simply add the function tm_dots() with the argument size= 'pop_est'. In the code, we also add some transparency with alpha and change the default circle shape to a square (shape=15).

# "Hello Plot" with colors and squares
tm_shape(World) +
  tm_polygons(col= 'life_exp') +
  tm_dots(size= 'pop_est', alpha=0.5, shape=15)
Choropleth and dots. Image by the author.

We can keep playing with the elements, for example choosing a different color palette (palette) or using the argument scale to multiply the symbol sizes by a given factor. Adding text is equally simple with tm_text().

# Plot with many elements
tm_shape(World) + #main shape
  tm_polygons(col= 'life_exp') + #polygon for choropleth
  tm_dots(col= 'black', size= 'pop_est', scale=2.5, alpha=0.5, shape=15) + # squares
  tm_text(text= 'iso_a3', size=0.5, just = 'bottom') #text
Choropleth, dots and text. Image by the author.

This graphic is clearly busy and even overwhelming, but that's just to show you the possibilities.

Moving forward with our analysis, we can ask: what makes some countries have a higher life expectancy than others? GDP per capita could be one influence. Logically, if people have more money, we can expect them to afford better medical care.
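Before mapping, a quick numeric check can tell us whether the two variables move together. This sketch uses base R's cor() with the Spearman (rank) method, which is my choice here because GDP per capita is heavily right-skewed; it measures association only, not cause. The name rho is hypothetical.

```r
library(tmap)

data("World", package = "tmap")

# Rank correlation between GDP per capita and life expectancy,
# ignoring countries with missing values in either column
rho <- cor(World$gdp_cap_est, World$life_exp,
           method = "spearman", use = "complete.obs")
print(round(rho, 2))
```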

So let's plot two choropleths and compare life expectancy and GDP per capita. Passing both variables to col creates one map per variable, and the function tm_facets() controls how they are arranged (here, in two rows).

# Plot with facet
tm_shape(World) + #main shape
  tm_polygons(col= c('gdp_cap_est', 'life_exp'), 
              style='quantile', palette= 'Blues') + #polygon for choropleth
  tm_facets(nrow=2)

The code will display the following graphic.

Facets with tmap: an easy way to compare life expectancy and GDP per capita. Image by the author.

With that result, the similarities between both variables become easier to see: countries with darker shades of blue for GDP per capita often also show darker shades for life expectancy.
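What does style='quantile' do in the code above? Roughly, it splits each variable into classes holding about the same number of countries. tmap computes its breaks through the classInt package; the base R sketch below only approximates the idea with quantile() and cut(), assuming tmap's default of five classes.

```r
library(tmap)

data("World", package = "tmap")

x <- World$life_exp

# Five quantile classes: breaks at the 0%, 20%, ..., 100% quantiles
breaks  <- quantile(x, probs = seq(0, 1, by = 0.2), na.rm = TRUE)
classes <- cut(x, breaks = breaks, include.lowest = TRUE)

# Each class holds roughly the same number of countries (NAs are left out)
print(table(classes))
```

This is why both maps look balanced even though GDP per capita is very skewed: every shade of blue covers about a fifth of the countries.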

Adding interactivity

So far, we have only used tmap's "plot" mode. But what's even cooler is that the package offers interactive maps as well. To use them, we simply run this code before plotting any graphic.

tmap_mode("view")

If we then run the facets code again, this time adding the country code, life expectancy and GDP per capita to popup.vars, here's the interactive result.

# Plot with facet
tm_shape(World) + #main shape
  tm_polygons(col= c('gdp_cap_est', 'life_exp'), 
              style='quantile', palette= 'Blues',
              popup.vars=c(Country= 'iso_a3', Life_Exp= 'life_exp', GDP_capita='gdp_cap_est')) + #polygon for choropleth
  tm_facets(nrow=2)
Tmap Interactive Mode. Image by the author.

This is one of my favorite capabilities of the library. Let's move on.

Analyzing Car Accidents in London

We know that not every dataset will be a shapefile. In fact, most of them won't be. So we need a way to work with datasets that only give us lat/long variables. That's where the sf package comes in handy.

Take the UrbanGB data frame, with coordinates of urban road accidents in Great Britain, available under a Creative Commons license in the UCI repository.

As the data is quite extensive, I filtered it down to a small portion of London, just for educational purposes.

# Load accidents lat/long and labels datasets
df <- read_csv(path, col_names = c("Longitude", "Latitude"))
labels <- read_csv(path2, col_names = c("labels"))

# Bind both datasets and filter to part of London
df2 <- df %>% 
  bind_cols(labels) %>% 
  filter(Latitude > 51.49 & Latitude < 51.55 &
            Longitude > -0.20 & Longitude < -0.02)
Extract of the dataset. Image by the author.

Once loaded as df2, its class is a tibble, which tmap cannot plot. If we try to plot a basic shape, it fails.

# error
tm_shape(df2) +
  tm_dots(col= 'labels')

Error: Object df2 is neither from class sf, stars, Spatial, Raster, 
nor SpatRaster.

To solve that error, we simply transform the data into a shapefile (an sf object). That is easy to do with the function st_as_sf(), where x is the dataset, coords names the longitude and latitude columns (in that order: x first, then y), and crs is the code of the coordinate reference system. Here we use 4326 (WGS 84), the system used by GPS.

# We must transform the file to a Shapefile object
sf_df <- st_as_sf(x = df2, 
                  coords = c("Longitude", "Latitude"), 
                  crs = 4326)

Once we run the code, we get a new variable, geometry, holding a point for each lat/long location, and the data class is now sf.

Shapefile created from the dataset. Image by the author.
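A common pitfall with st_as_sf() is the order of coords: sf expects x, then y, that is, longitude before latitude. This minimal sketch uses a hypothetical two-row data frame (pts, with made-up coordinates) in place of df2 to confirm the CRS and the coordinate order.

```r
library(sf)

# Hypothetical stand-in for df2 (illustrative coordinates only)
pts <- data.frame(
  Longitude = c(-0.1276, -0.1419),
  Latitude  = c(51.5072, 51.5014),
  labels    = c("a", "b")
)

# coords in x, y order: longitude first, then latitude
pts_sf <- st_as_sf(pts, coords = c("Longitude", "Latitude"), crs = 4326)

print(class(pts_sf))           # an sf object with a geometry column
print(st_crs(pts_sf)$epsg)     # the EPSG code we passed in
print(st_coordinates(pts_sf))  # X = longitude, Y = latitude
```

By default st_as_sf() drops the original coordinate columns (remove = TRUE) and keeps the other attributes, such as labels.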

We're ready to plot the points. With the same basic code, we can plot dots where the car accidents happened in central London.

# plot points
tm_shape(sf_df) +
  tm_dots(alpha=0.2, col='red', 
          border.col='gray90', #border color 
          border.alpha=0.1) #border opacity
Car accidents in downtown London. Image by the author.

Next, we will add a few tourist attractions for reference. We can do that by loading a new dataset, attractions, converting it to a shapefile and adding a new shape to the tmap code.

# Load the attractions file
attractions <- read_csv(path3)

# Transform to Shapefile for plotting
attractions_sf <- st_as_sf(x = attractions, 
                  coords = c("longitude", "latitude"), 
                  crs = 4326)

# plot points
tm_shape(sf_df) + # car accidents data
  tm_dots(alpha=0.3, col='red', #dots configs for the car crashes
          border.col='gray90', 
          border.alpha=0.1) +
  tm_shape(attractions_sf) + #attractions data
  tm_symbols(size = 2, col='blue') #config for the attractions points

The following map is what the code above displays. Notice that since we have two shapefiles plotted, the interactive view lets us add or remove each layer from the visualization.

Tmap with more than one shapefile: layers. Image by the author.

Taking it a step further, we can use the function tm_markers() to cluster nearby points. Each cluster displays a count of how many points fall into that area.

# plot markers
tm_shape(sf_df) +
  tm_markers(size=0.25, alpha=0.75)+
  tm_shape(attractions_sf) + #attractions data
  tm_symbols(size = 2, col='blue') #config for the attractions points

The interactivity is amazing: we can find the sub-regions with more or fewer car accidents and zoom in all the way down to street level.

tm_markers() in action. Image by the author.

One could find a problematic corner, for example the intersection of Marylebone Road and Baker Street, where we count 34 crashes.

Car accidents on a street corner. Image by the author.

Buffer Analysis

Finally, let's run a quick Buffer Analysis. In a Buffer Analysis, we determine the zone around a location that lies within a specified distance of that point. That is the buffer zone.


This type of analysis lets us see how events are distributed around our points of interest; in this case, how many car accidents happen around tourist attractions in London.

The code below takes only the lat/long columns from attractions and creates a SpatialPoints object. Then it projects the points with spTransform() to a coordinate system measured in meters, since gBuffer() computes distances in the units of the coordinates. Finally, gBuffer() creates the buffers with a radius of 500 meters around each attraction.

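# sp and rgeos provide SpatialPoints(), spTransform() and gBuffer()
# (rgeos has since been retired from CRAN; sf's st_buffer() is the modern replacement)
library(sp)
library(rgeos)

# To use buffer analysis, keep only the coordinates, in x (longitude), y (latitude) order
attr_latlong <- attractions %>% select(longitude, latitude)

# Convert to a SpatialPoints object in WGS 84 lon/lat
attr <- SpatialPoints(coords = attr_latlong,
                      proj4string = CRS("+proj=longlat +datum=WGS84"))

# Project to a metric CRS (British National Grid, EPSG:27700), so distances are in meters
attr_BNG <- spTransform(x = attr,
                        CRSobj = CRS("+init=epsg:27700"))

# Buffer Analysis: a 500 m zone around each attraction
buffer_attractions <- gBuffer(spgeom = attr_BNG,
                              width = 500,
                              byid = TRUE)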

We can pass buffer_attractions to tm_shape() to add the buffers to the graphic.

# Plot graphic + buffer
tm_shape(buffer_attractions) + # buffer
  tm_borders(col = "darkred") + #buffer border
  tm_fill(col = "blue",  alpha = 0.1)+ #buffer fill
  tm_shape(attractions_sf) + # attractions  
  tm_dots(size=0.5, col='blue')+ # attractions points
  tm_shape(sf_df) + # car accidents data
  tm_dots(alpha=0.3, col='red', #dots configs for the car crashes
          border.col='gray90', 
          border.alpha=0.1)

This is the resulting plot.

Car crashes + Touristic Attractions + Buffer. Image by the author.

That's an informative view. Around BT Tower, Westminster Abbey and the Houses of Parliament, we notice more accidents overlapping the buffer zones.
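To put numbers on that view, we can count the accidents that fall inside each buffer. The sketch below swaps in sf's st_transform(), st_buffer() and st_intersects() for the sp/rgeos workflow used above, and projects to British National Grid (EPSG:27700) so distances are in meters. The attraction and accident coordinates are hypothetical toy values, not the post's data.

```r
library(sf)

# Hypothetical toy data for illustration only (not the post's datasets)
attractions_toy <- data.frame(
  name      = c("A", "B"),
  longitude = c(-0.1246, -0.1276),
  latitude  = c(51.5007, 51.5194)
)
accidents_toy <- data.frame(
  longitude = c(-0.1246, -0.1246, -0.1246, -0.1276),
  latitude  = c(51.5017, 51.5037, 51.5107, 51.5204)
)

# Build sf objects in WGS 84 lon/lat (EPSG:4326)
attr_sf <- st_as_sf(attractions_toy, coords = c("longitude", "latitude"), crs = 4326)
acc_sf  <- st_as_sf(accidents_toy,   coords = c("longitude", "latitude"), crs = 4326)

# Project to British National Grid (EPSG:27700), whose units are meters
attr_bng <- st_transform(attr_sf, 27700)
acc_bng  <- st_transform(acc_sf, 27700)

# 500 m buffer polygon around each attraction
buffers <- st_buffer(attr_bng, dist = 500)

# For each buffer, count the accident points it intersects
buffers$n_accidents <- lengths(st_intersects(buffers, acc_bng))
print(st_drop_geometry(buffers))
```

With these toy points, attraction A gets 2 accidents and B gets 1 (the third accident is roughly 1 km from both). On the real data, you would pass attractions_sf and sf_df instead of the toy objects.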

Finally, let's see how to export the maps, once our analysis is done.

Saving Maps

Saving maps is pretty straightforward. We use the tmap_save() function, and the file name extension determines whether the map is saved as a static image or as an interactive one.

# Saving Maps
to_save <- tm_shape(sf_df) +
  tm_markers(size=0.25, alpha=0.75)+
  tm_shape(attractions_sf) + #attractions data
  tm_symbols(size = 2, col='blue') #config for the attractions points

# save an image
tmap_save(to_save, filename = "My_Map.png")

# save as interactive HTML file
tmap_save(to_save, filename = "My_Map.html")
Files Saved locally. Image by the author.

Before You Go

This was a brief introduction to tmap, a great mapping library for the R language.

The library can be very helpful in many analyses involving geospatial data, as it provides powerful, easy-to-use functions. Basically, all we need is a shapefile; the rest is just building the layers as needed.

As a summary:

  • The maps need a shapefile to work
  • The basic plot is made of a shapefile plus a geometry layer, for example tm_shape() + tm_dots()
  • Polygons are regions in a map
  • Lines are rivers or streets
  • Points are locations (lat/long)
  • tmap_mode('plot') is for static plots.
  • tmap_mode('view') is for interactive plots.

Here is the code on GitHub:

Studying/R/TMAP at master · gurezende/Studying

If you liked this content, follow me for more and subscribe to my newsletter.

Gustavo Santos – Medium

Also, find me on LinkedIn, here.

References

UCI Machine Learning Repository

RPubs

tmap: vignettes/tmap-getstarted.Rmd

Shapefile
