An Open Data-Driven Approach to Optimising Healthcare Facility Locations Using Python

Author:Murphy  |  View: 29078  |  Time: 2025-03-22 21:17:23
Image generated by Authors in Midjourney

This work was co-authored with Prof Joaquim Gromicho and Kai Kaiser. The author(s) are responsible for all errors and omissions.


According to research published in 2020 on global maps of travel time to healthcare facilities, 43.3% of the world's population (3.16 billion people) cannot reach a healthcare facility on foot within one hour.

Accurately calculating travel time to healthcare facilities is fundamental to assessing healthcare accessibility, particularly in regions where barriers to access can significantly impact public health outcomes. These calculations are vital for resource allocation, healthcare utilization, equitable healthcare access, and strategic planning for future facilities. However, they require substantial data processing: the locations of hospitals, the population distribution, and travel times derived from road network data such as OpenStreetMap or from APIs such as Google or Mapbox.

Geographic variability, such as differing terrain, road conditions, and weather, also affects travel times. The availability and type of transportation further restrict access to health facilities, with many rural areas lacking reliable public or personal transport options. Furthermore, accurate geocoded data for all hospitals is often unavailable, particularly in developing countries, leading to less precise estimates of access.

This blog leverages the power of open-source data and tools to address the challenge of calculating physical access in countries and administrative regions, especially where population censuses are infrequent and road network and health facility data are not regularly updated.

This is demonstrated in Timor-Leste, where healthcare access is hindered by systemic inefficiencies and financial burdens, making the need for up-to-date information even more critical.

Methodology and Data Used

We first download the shapefile of our chosen region of interest (Baucau, one of the municipalities of Timor-Leste) from the Humanitarian Data Exchange (HDX). HDX provides access to the Global Database of Political Administrative Boundaries, a standardized resource for countries' boundaries with global coverage. The data is available under the Open Database License (ODC-ODbL).

We stack this with high-resolution population data from Meta. This data is licensed under Creative Commons Attribution International.

To determine how to improve accessibility, we need to start with the existing healthcare facility locations (hospitals and clinics). An open-source repository for this data is OpenStreetMap. This is a good place to start, but it may not be as comprehensive as official health facility registers maintained by governments or international development organizations such as the World Health Organization.

Next, we use the openrouteservice API and the Mapbox Isochrone API to calculate travel times and assess healthcare accessibility. API results obtained from openrouteservice in any context are licensed under CC-BY 4.0. For more details on the terms of use of the Isochrone API, see the Mapbox Product Terms.

Using the above information and analysing catchment areas of existing facilities, we can create a detailed visualization of healthcare coverage as an interactive map, identifying population with and without access.

Finally, we run an optimization model to identify potential locations for new healthcare facilities.

Extracting Timor-Leste's Administrative Boundaries from Humanitarian Data Exchange and visualizing using Folium in Python

The following code snippet initializes a downloader for GADM (Global Administrative Areas) data, specifying version 4.0. It then retrieves the administrative boundary data for Timor-Leste at the first administrative level (the level of districts or provinces). After obtaining the geospatial data, the script visualizes these boundaries on a Folium map, using OpenStreetMap as the base map.

Timor-Leste's Administrative Boundaries (Level 1) extracted from HDX (Image by Authors)
Baucau – Administrative Boundary selected for analysis (Image by Authors)

Downloading and visualizing High-Resolution Population Density Maps from Meta

Meta's high-resolution population density maps provide population estimates at a 30-meter resolution, with demographic breakdowns covering various groups, and are publicly available for over 160 countries. The maps are created by modelling population growth using census data, analyzing satellite imagery to detect buildings, calculating building density, and distributing population data across tiles based on this density.

In the following Python code snippet, a function fb_pop_data is defined to retrieve 2020 Facebook population data for Timor-Leste (identified by the ISO code 'TLS'). The data is downloaded, processed into a GeoDataFrame, and aligned with the coordinate reference system of the selected administrative boundary. The population within the area of interest (the selected administrative region) is then calculated and displayed.

This returns a population of 126,603 in Baucau.
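As a rough, dependency-free sketch of the last step (summing the population that falls inside the area of interest), the example below reads population rows from CSV text and keeps those whose coordinates lie inside a polygon. The column names, the toy square polygon, and the helpers point_in_ring and population_in_aoi are assumptions made for illustration; fb_pop_data itself works with GeoDataFrames, as described above.

```python
import csv
import io


def point_in_ring(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray starting at the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def population_in_aoi(csv_text, ring):
    """Sum the 'population' column for rows located inside the ring."""
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if point_in_ring(float(row["longitude"]), float(row["latitude"]), ring):
            total += float(row["population"])
    return total


# Hypothetical data: a square AOI and three population cells,
# the last of which lies outside the square.
aoi_ring = [(126.0, -9.0), (127.0, -9.0), (127.0, -8.0), (126.0, -8.0)]
sample_csv = """latitude,longitude,population
-8.5,126.5,120.0
-8.2,126.9,35.5
-7.5,126.5,999.0
"""
aoi_population = population_in_aoi(sample_csv, aoi_ring)
```

With real data, a spatial library would normally handle holes, multi-part boundaries, and coordinate reference systems, which this sketch ignores.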

Mapping Healthcare Facilities in Timor-Leste Using OpenStreetMap Data

This code segment retrieves and analyzes data on healthcare facilities in Timor-Leste, specifically hospitals and clinics. It performs the following functions:

  • Queries OpenStreetMap via the Overpass API for hospital and clinic locations across Timor-Leste.
  • Extracts essential data such as names, coordinates, and amenities and stores it in separate DataFrames for hospitals (df_hospitals) and clinics (df_clinics).
  • Merges these datasets into a single GeoDataFrame (df_health_osm) for spatial analysis.
  • Executes a spatial join to determine the count of healthcare facilities within a specified area of interest (AOI), providing valuable insights for assessing healthcare accessibility in Timor-Leste.

This returns a total of 14 hospitals and clinics in Baucau.
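The query and extraction steps can be sketched with the standard library alone. The example below builds an Overpass QL query for one amenity type within a country and flattens a response into simple records. The function names, the public endpoint URL, and the sample response are assumptions for illustration; the sketch stops before the DataFrame and spatial-join steps.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass endpoint (assumed here; other mirrors exist).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_overpass_query(iso_code, amenity):
    """Overpass QL for nodes and ways with a given amenity tag in a country."""
    return (
        "[out:json][timeout:60];"
        f'area["ISO3166-1"="{iso_code}"]["admin_level"="2"]->.country;'
        "("
        f'node["amenity"="{amenity}"](area.country);'
        f'way["amenity"="{amenity}"](area.country);'
        ");"
        "out center;"
    )


def fetch_overpass(query):
    """Send the query to the Overpass endpoint (needs network access)."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=90) as resp:
        return json.load(resp)


def parse_elements(response):
    """Flatten Overpass elements into dicts with name, amenity, lat, lon."""
    records = []
    for el in response.get("elements", []):
        # Nodes carry lat/lon directly; ways expose them under "center".
        pos = el if el.get("type") == "node" else el.get("center", {})
        if "lat" not in pos or "lon" not in pos:
            continue
        tags = el.get("tags", {})
        records.append({
            "name": tags.get("name", "unnamed"),
            "amenity": tags.get("amenity"),
            "lat": pos["lat"],
            "lon": pos["lon"],
        })
    return records


# Hypothetical response fragment, shaped like Overpass JSON output.
sample_response = {"elements": [
    {"type": "node", "lat": -8.47, "lon": 126.45,
     "tags": {"amenity": "hospital", "name": "Example Hospital"}},
    {"type": "way", "center": {"lat": -8.50, "lon": 126.40},
     "tags": {"amenity": "clinic"}},
]}
hospital_query = build_overpass_query("TL", "hospital")
facilities = parse_elements(sample_response)
```

Calling fetch_overpass(hospital_query) and then parse_elements on the result would yield the hospital records; repeating with "clinic" gives the second table.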

Assessing Healthcare Accessibility with Isochrone Analysis

The function get_isochrone_osm calculates isochrones for healthcare facilities in Timor-Leste. An isochrone is a polygon representing the area reachable within a specified time from a location, here a healthcare facility. The function uses the openrouteservice API to generate these polygons for a 60-minute travel time with walking as the travel mode. The resulting polygons are then used to determine the population with access to each facility.

Population with Access = 46.07%
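For orientation, here is a minimal sketch of what such a request and the resulting percentage could look like. It is not get_isochrone_osm itself: the helper names are hypothetical, and the access flags passed to access_share are assumed to come from a point-in-polygon test against the returned isochrones.

```python
import json
import urllib.request

ORS_URL = "https://api.openrouteservice.org/v2/isochrones/foot-walking"


def build_isochrone_request(lon, lat, minutes=60):
    """Request body for a single walking isochrone around (lon, lat)."""
    return {
        "locations": [[lon, lat]],   # openrouteservice expects lon, lat order
        "range": [minutes * 60],     # travel time in seconds
        "range_type": "time",
    }


def fetch_isochrone(body, api_key):
    """POST the body to openrouteservice (needs network and a valid key)."""
    req = urllib.request.Request(
        ORS_URL,
        data=json.dumps(body).encode(),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)  # GeoJSON FeatureCollection of polygons


def access_share(populations, has_access):
    """Percentage of the total population flagged as inside any isochrone."""
    total = sum(populations)
    covered = sum(p for p, ok in zip(populations, has_access) if ok)
    return 100.0 * covered / total if total else 0.0


# Hypothetical facility coordinates and three population cells.
body = build_isochrone_request(126.45, -8.47)
share = access_share([100, 50, 50], [True, False, True])
```

Weighting by population rather than counting cells matters here, since a few dense cells can dominate the percentage.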

A crucial aspect of assessing healthcare accessibility is understanding the reliability and sensitivity of our analysis based on the underlying data sources. The road network data from OpenStreetMap (OSM) is used in calculating travel times to healthcare facilities in the previous step. However, this data can vary significantly in quality and completeness, which may affect the accuracy of our isochrone maps and, subsequently, our conclusions about accessibility. So, it is advisable to conduct sensitivity analysis using alternative data sources to address these uncertainties.

For regions where OSM data might be less reliable or outdated, utilizing APIs like Mapbox or Google can provide a more accurate calculation of accessibility based on possibly more up-to-date or complete road network information. If resources allow, these APIs can complement or validate our findings from OSM data.

To illustrate, let's integrate Mapbox to perform an isochrone analysis. This example shows how to calculate the area accessible within a 60-minute walk from existing hospitals using the Isochrone API by Mapbox.

Population with Access = 47.4%
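The Mapbox request differs mainly in being a GET call with the coordinates in the URL path. The sketch below only assembles that URL; the function name and the placeholder token are assumptions, and the response (GeoJSON polygons when polygons=true) would be processed the same way as the openrouteservice result.

```python
from urllib.parse import urlencode

MAPBOX_BASE = "https://api.mapbox.com/isochrone/v1/mapbox"


def mapbox_isochrone_url(lon, lat, access_token, minutes=60, profile="walking"):
    """URL for a single isochrone contour around (lon, lat)."""
    query = urlencode({
        "contours_minutes": minutes,   # contour size in minutes
        "polygons": "true",            # return polygons instead of lines
        "access_token": access_token,
    })
    return f"{MAPBOX_BASE}/{profile}/{lon},{lat}?{query}"


# Placeholder token; a real one comes from a Mapbox account.
url = mapbox_isochrone_url(126.45, -8.47, "YOUR_MAPBOX_TOKEN")
```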

The code below allows us to visualize the above results. It outlines administrative boundaries with orange GeoJson objects and marks hospital locations with blue pins, each featuring a popup with the hospital's name. The map also differentiates populations based on their access to healthcare, using red circle markers for those without access and green for those with access. The opacity of these markers is determined by population counts, enabling a clear visual distinction between densely populated and well-served areas versus those that are sparsely populated and underserved, thus illustrating disparities in healthcare accessibility.
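One way to derive the marker colour and opacity is sketched below. The linear scaling, the minimum opacity of 0.2, and the function name marker_style are assumptions rather than the exact rule used for the maps; the dictionary keys mirror keyword arguments that folium.CircleMarker accepts.

```python
def marker_style(population, has_access, max_population, min_opacity=0.2):
    """Colour by access and scale opacity linearly with population."""
    ratio = population / max_population if max_population else 0.0
    ratio = min(max(ratio, 0.0), 1.0)  # clamp to [0, 1]
    opacity = min_opacity + (1.0 - min_opacity) * ratio
    return {
        "color": "green" if has_access else "red",
        "fill": True,
        "fill_opacity": round(opacity, 2),
        "radius": 3,
    }


# Hypothetical cells: (lat, lon, population, has_access).
cells = [(-8.47, 126.45, 80.0, True), (-8.60, 126.60, 20.0, False)]
max_pop = max(cell[2] for cell in cells)
styles = [marker_style(pop, ok, max_pop) for _, _, pop, ok in cells]
```

Each style could then be unpacked into a circle marker at the cell's coordinates, for example folium.CircleMarker(location=[lat, lon], **style).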

Left: Openrouteservice Analysis (46.07% Coverage) and Right: Mapbox Analysis (47.4% Coverage) – This map compares healthcare facility catchment areas in Baucau, Timor-Leste, using two different routing services – Openrouteservice and Mapbox. (Image by Authors)

Representational Potential Locations Grid of the Administrative Boundary

Optimising the placement of new hospitals and clinics is essential to ensure comprehensive healthcare coverage. This often involves analyzing potential locations which, in the absence of specific site recommendations from official sources, can be approximated through a representational grid within the target area.

Such a grid provides a starting framework for considering where to establish new healthcare facilities to maximize accessibility for the underserved population. The code snippet outlines a Python function generate_grid_in_polygon designed to create this representational grid within a given geographic area, represented as a MultiPolygon object.

The function generates a series of evenly spaced points—determined by the spacing parameter—across the extent of the provided geometry, resulting in 318 potential locations in Baucau.
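The idea can be illustrated without geospatial libraries. The sketch below is a hypothetical stand-in for generate_grid_in_polygon: it represents a multi-part area as a list of coordinate rings, lays a regular grid over the bounding box, and keeps the points that fall inside any ring. Offsetting the grid by half a spacing is a choice made here so that no point sits exactly on the bounding-box edge; holes in polygons are ignored.

```python
import math


def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def grid_points_in_rings(rings, spacing):
    """Evenly spaced points over the bounding box, kept if inside any ring."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    nx = math.ceil((max_x - min_x) / spacing)
    ny = math.ceil((max_y - min_y) / spacing)
    points = []
    for i in range(nx):
        for j in range(ny):
            # Cell centres: offset by half a spacing from the box corner.
            x = min_x + (i + 0.5) * spacing
            y = min_y + (j + 0.5) * spacing
            if any(point_in_ring(x, y, ring) for ring in rings):
                points.append((x, y))
    return points


# Hypothetical area made of two separate unit squares.
square_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_b = [(2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (2.0, 1.0)]
grid_single = grid_points_in_rings([square_a], 0.5)
grid_multi = grid_points_in_rings([square_a, square_b], 0.5)
```

Because the spacing is in degrees, the distance between points on the ground varies slightly with latitude; for a single municipality this is usually negligible.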

Potential locations as a grid of 0.02 degrees (Image by Authors)

To optimize the placement of potential healthcare facilities, our analysis calculates isochrones for each proposed location, using the same 60-minute walking parameter previously applied to existing facilities. From this data, we determine which population segments currently lack access and could be served by these potential facilities.

We then aggregate access data from existing and potential locations to calculate the maximum possible access if all the potential locations are opened along with existing facilities.

Maximum access attainable with this potential location list = 86.63%
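The aggregation behind this upper bound amounts to a set union. The sketch below assumes that each facility, existing or potential, has already been mapped to the set of household identifiers inside its isochrone; the identifiers, populations, and function name are hypothetical.

```python
def max_attainable_access(population, existing_cover, potential_cover):
    """Share of population covered if every potential site were opened.

    population: dict mapping household id -> number of people
    existing_cover / potential_cover: dict mapping site id -> set of
    household ids inside that site's isochrone
    """
    covered = set()
    for served in existing_cover.values():
        covered |= served
    for served in potential_cover.values():
        covered |= served
    total = sum(population.values())
    reached = sum(population[h] for h in covered)
    return 100.0 * reached / total if total else 0.0


# Hypothetical toy instance: household h4 is out of reach of every site.
population = {"h1": 40, "h2": 30, "h3": 20, "h4": 10}
existing_cover = {"E1": {"h1"}}
potential_cover = {"P1": {"h2"}, "P2": {"h2", "h3"}}
upper_bound = max_attainable_access(population, existing_cover, potential_cover)
```

Households covered by several sites are counted once, which is why a union is used rather than a sum over sites.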

Mathematical Optimization

We will now employ Mathematical Optimization to determine the optimal subset of hospitals to open. For those unfamiliar with mathematical optimization, we recommend starting with a hands-on introduction, which is available in this Jupyter book.

At its core, Mathematical Optimization involves creating mathematical models that act as digital twins of the real-world scenarios we aim to optimize. After developing those models, we input relevant data to create specific instances of an optimization problem. These instances are then solved using an appropriate solver to find the best feasible solution, which we call the optimal solution.

Modelling is a conceptual process, while coding the models is a technical craft. We use the package pyomo for the latter as the Jupyter book does. Please note that the model discussed subsequently is featured as Exercise 3.1 in a forthcoming textbook from Cambridge University Press, the aforementioned Jupyter book being its online companion.

A mathematical optimization model can be seen as a blueprint for the intended optimal solution. Once instantiated, the model is processed by a solver that seeks the optimal solution if one exists.

We can utilize powerful solvers to solve instances. For problems like the one we will describe, Gurobi is an outstanding commercial solver, while HiGHS, an excellent open-source alternative, is available under the MIT license.

The modelling process begins by identifying the decisions to be made, leading to defining decision variables. After naming these decisions and variables, we formalize the objective and constraints using the functions of those variables.

  • The objective function measures the quality of a solution.
  • Constraints ensure that a solution adheres to all necessary rules to be considered feasible.

In our case, and many others, the functions will be linear, and our variables will have a binary nature to represent yes/no decisions.
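To make the idea of yes/no decisions concrete, the toy example below enumerates every way of opening at most a given number of potential sites and keeps the choice that serves the most people. This brute-force search is a stand-in for illustration only, not the pyomo model or a solver such as HiGHS, and it does not scale beyond a handful of sites. The budget parameter, the data, and the function name are assumptions.

```python
from itertools import combinations


def best_sites(population, existing_cover, potential_cover, budget):
    """Exhaustively pick up to `budget` sites maximising population served."""
    # Households already served by existing facilities.
    already = set().union(*existing_cover.values())
    best_choice = ()
    best_value = sum(population[h] for h in already)
    for k in range(1, budget + 1):
        for choice in combinations(sorted(potential_cover), k):
            # Each site is either in the choice (open) or not (closed).
            served = already.union(*(potential_cover[s] for s in choice))
            value = sum(population[h] for h in served)  # linear objective
            if value > best_value:
                best_choice, best_value = choice, value
    return best_choice, best_value


# Hypothetical toy instance.
population = {"h1": 40, "h2": 30, "h3": 20, "h4": 10}
existing_cover = {"E1": {"h1"}}
potential_cover = {"P1": {"h2"}, "P2": {"h3", "h4"}, "P3": {"h2", "h3"}}
one_site = best_sites(population, existing_cover, potential_cover, budget=1)
two_sites = best_sites(population, existing_cover, potential_cover, budget=2)
```

With a budget of one, P3 is the best single site; with two, P1 and P2 together reach every household. A solver arrives at the same kind of answer without enumerating all subsets.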

For example, we need one variable per household to determine if an accessible open hospital serves it and another variable per hospital to indicate whether it is open. Typical mathematical notation for expressing models starts by naming the sets supporting the variables' indices and the model parameters derived from the data.

For our optimization challenge, these include:

Sets
