Mapping the Pokemon World: A Network Analysis of Habitat-Based Encounters

Author:Murphy  |  View: 26885  |  Time: 2025-03-22 21:14:57
Photo by Michael Rivera on Unsplash

Introduction

Network Analysis is a robust yet straightforward approach to understanding the relationships, connections, and groupings among entities. On social media, one could use network analysis to find which profiles have the most meaningful connections; an e-commerce company could use its web analytics to understand the browsing relationships between products; a company could analyze email correspondence to see which teams frequently work together, and so much more! If the data you are working with resembles a network, Network Analysis can likely be applied.

This article will introduce Network Analysis via Python, practical insights you can gather from a Network Analysis, and a fun project with the hit TV and game series Pokemon!

What is Network Analysis

image generated by DALL-E

There are several definitions of this topic; however, an article in the journal Health Psychology and Behavioral Medicine offers a definition that sums up how we will perform a Network Analysis (NA) in this article:

Networks comprise graphical representations of the relationships (edges) between variables (nodes). Network analysis provides the capacity to estimate complex patterns of relationships and the network structure can be analysed to reveal core features of the network. [1]

In other words, NA is a mathematically powered visualization approach that allows one to understand complex relationships among observations in a dataset. The visualization itself is a set of nodes interconnected by lines. The nodes can represent people, places, animals, products, etc. The lines connecting them, known as edges, each represent a relationship between two nodes. In addition to the visualization, we can use various algorithms and metrics to find which nodes are the most central, which edges carry the most connections between nodes, and which subnetworks (or communities) exist within our main network.

Getting Started in Python

To begin with, we will import the NetworkX library, create a NetworkX graph object, and draw a network graph from made-up data. Be sure to install any libraries mentioned in this article if you are following along and have not already installed them. Take a look at the code and the resulting image below:

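import networkx as nx
import matplotlib.pyplot as plt  # nx.draw renders through matplotlib

# Build an undirected graph from a list of (node, node) edges;
# nodes are created automatically as the edges are added.
G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('A', 'D'), ('C', 'E'),
                  ('C', 'F'), ('B', 'H'), ('H', 'I'), ('D', 'J'),
                  ('A', 'K'), ('D', 'K'), ('B', 'I'), ('J', 'A'),
                  ('F', 'E'), ('G', 'C'), ('G', 'E')])

# Draw the graph with a label on every node
nx.draw(G, with_labels=True, node_color='lightblue', font_color='black', font_weight='bold')
plt.show()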
Image provided by the author

What do you notice about this graph? Let's start with the basics. The letters inside the blue circles represent the nodes in our data. The lines connecting them are the edges. From a more practical perspective, let's pretend this data represented a social media site. The nodes could represent individual profiles, while each edge could indicate that the two profiles it connects are friends on the site.
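To make the social media reading concrete, here is a minimal sketch that rebuilds the same made-up graph and asks a few basic questions of it. The variable names (`num_profiles`, `friend_counts`, and so on) are illustrative choices, not part of the original example.

```python
import networkx as nx

# Same made-up edge list as the example graph
edges = [('A', 'B'), ('A', 'C'), ('A', 'D'), ('C', 'E'),
         ('C', 'F'), ('B', 'H'), ('H', 'I'), ('D', 'J'),
         ('A', 'K'), ('D', 'K'), ('B', 'I'), ('J', 'A'),
         ('F', 'E'), ('G', 'C'), ('G', 'E')]
G = nx.Graph()
G.add_edges_from(edges)

num_profiles = G.number_of_nodes()     # profiles in the network
num_friendships = G.number_of_edges()  # friendships between them

# A node's degree is the number of edges touching it,
# which in this reading is a profile's friend count.
friend_counts = dict(G.degree())
friends_of_a = sorted(G.neighbors('A'))

print(num_profiles, num_friendships)  # 11 15
print(friends_of_a)                   # ['B', 'C', 'D', 'J', 'K']
print(friend_counts)
```

Degree is the simplest notion of importance: it only counts direct friends and says nothing about how well placed a profile is in the wider network.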

Now that we have the basics down, let's take this further. What nodes or edges are the most important? Is there a way to quantify the importance of a node or an edge? Looking at this graph, can we easily point out any subnetworks or communities? Let's dive into these concepts below.

Closeness Centrality

How can we determine a node's importance? One way is to use a measure known as closeness centrality. The NetworkX documentation defines the closeness centrality of a node (u) as:

the reciprocal of the average shortest path distance to u over all n-1 reachable nodes. [2]

Image from NetworkX documentation from the closeness_centrality method section
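For readers without the image, the formula given in the NetworkX documentation for closeness_centrality can be written as below, where d(v, u) is the shortest-path distance between v and u, and n - 1 is the number of nodes reachable from u:

```latex
C(u) = \frac{n - 1}{\sum_{v=1}^{n-1} d(v, u)}
```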

In other words, the node with the highest closeness centrality score is the one that is, on average, the fewest steps away from every other node in the network.
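To make the definition concrete, the following sketch computes the score for node A by hand and compares it with the library's result. The helper `closeness_by_hand` is a hypothetical name introduced here for illustration, and it assumes a connected graph like our mock network; on disconnected graphs NetworkX applies an extra scaling by default that this helper omits.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('A', 'D'), ('C', 'E'),
                  ('C', 'F'), ('B', 'H'), ('H', 'I'), ('D', 'J'),
                  ('A', 'K'), ('D', 'K'), ('B', 'I'), ('J', 'A'),
                  ('F', 'E'), ('G', 'C'), ('G', 'E')])

def closeness_by_hand(graph, node):
    """Reciprocal of the average shortest-path distance from `node`
    to every other reachable node (illustrative helper)."""
    # Maps each reachable node to its distance in edges; `node` itself is 0
    distances = nx.shortest_path_length(graph, source=node)
    reachable = len(distances) - 1
    total = sum(distances.values())
    return reachable / total if total > 0 else 0.0

manual_a = closeness_by_hand(G, 'A')
library_a = nx.closeness_centrality(G, u='A')

# A is 1 step from five nodes and 2 steps from the other five: 10 / 15
print(round(manual_a, 3), round(library_a, 3))  # 0.667 0.667
```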

Think of this from a social media perspective. Let's say you are an influencer attempting to gain as many followers as possible. One way to do so is to connect with people who have high centrality within a specific niche of profiles that matters to you. Aiming for these central connections would maximize your profile's visibility, as you would be more likely to be recommended as a friend or connection to a wide range of profiles.

Looking back to our mock network above, which node has the highest closeness centrality score? I argue that node A does, as it appears to sit in the center of all the other nodes. Let's use the closeness_centrality function in Python to retrieve the scores below:

nx.closeness_centrality(G)
Image provided by the author
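The call returns a dictionary mapping each node to its score. As a small sketch of how one might rank that output rather than read it by eye (the names `scores`, `ranked`, and `most_central` are illustrative):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('A', 'D'), ('C', 'E'),
                  ('C', 'F'), ('B', 'H'), ('H', 'I'), ('D', 'J'),
                  ('A', 'K'), ('D', 'K'), ('B', 'I'), ('J', 'A'),
                  ('F', 'E'), ('G', 'C'), ('G', 'E')])

scores = nx.closeness_centrality(G)  # {node: closeness score}

# Sort nodes from highest to lowest score
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
most_central = ranked[0][0]

for node, score in ranked[:3]:
    print(node, round(score, 3))
```

On this graph the top-ranked node is A, with a score of about 0.667, which agrees with the visual guess.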

Edge Betweenness Centrality

Are some of the connections or edges between nodes more relevant than others? Looking back at our graph, do any of the edges stand out? The edges for A ~ B and A ~ C stand out, as each is the only link between a cluster of nodes and the rest of the graph. Returning to our original point about quantifying importance, we can do the same for edges with a measure known as edge betweenness centrality. NetworkX defines it below:

Betweenness centrality of an edge
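As a rough check of the observation that the A ~ B and A ~ C edges stand out, the sketch below scores every edge in the mock graph with NetworkX's edge_betweenness_centrality and lists the highest-scoring ones. The `frozenset` keys and the variable names are illustrative choices made here so that the order of the two endpoints does not matter.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('A', 'D'), ('C', 'E'),
                  ('C', 'F'), ('B', 'H'), ('H', 'I'), ('D', 'J'),
                  ('A', 'K'), ('D', 'K'), ('B', 'I'), ('J', 'A'),
                  ('F', 'E'), ('G', 'C'), ('G', 'E')])

# Scores are normalized by default; keys are (node, node) tuples
edge_scores = nx.edge_betweenness_centrality(G)

# Re-key by unordered pair so ('A', 'B') and ('B', 'A') are the same edge
by_pair = {frozenset(edge): score for edge, score in edge_scores.items()}
ranked_edges = sorted(by_pair.items(), key=lambda item: item[1], reverse=True)
top_two = {pair for pair, _ in ranked_edges[:2]}

# A bridge is an edge whose removal disconnects the graph
bridges = {frozenset(edge) for edge in nx.bridges(G)}

for pair, score in ranked_edges[:3]:
    print(sorted(pair), round(score, 3))
print(top_two == bridges)
```

In this graph the two top-scoring edges are also its only bridges: every shortest path between the B, H, I cluster (or the C, E, F, G cluster) and the rest of the network has to cross them.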

