Introduction to spatial analysis of cells for neuroscientists (part 1)

Author:Murphy  |  View: 28387  |  Time: 2025-03-22 21:31:34
Density kernel generated from point patterns

As a neuroscientist, in recent years I have been interested in developing strategies that allow multimodal assessment of cell distribution in the brain. My motivation was to quantitatively understand the cellular rearrangement of neuroglia after brain injury. Along the way, I came across spatstat(1), a multifunctional R package for spatial analysis based on point patterns, called point pattern analysis (PPA). This approach is well developed in fields such as geography, epidemiology, or ecology, but applications to neurobiology are very limited, if not non-existent. I recently published a short protocol (2), and the reader can find a preprint (3) with a much longer and dedicated application of this approach.

In this post, my goal is to provide an accessible introduction to the use of this method for researchers interested in unraveling the spatial distribution of cells in different tissues, without the narrative rigidity of scientific papers.

What is point pattern analysis (PPA)?

PPA is a spatial analysis technique used to study the distribution of individual events or objects in a given area (also called an observation window). This method allows researchers to examine the number of objects per unit area (called spatial intensity), whether the points are randomly distributed, clustered, or regularly spaced, and the variations in spatial intensity conditional on different covariants. Unlike raw and non-reproducible cell counts (e.g., 100 cells/mm2), PPA preserves all spatial information and allows multiple and reproducible manipulations of the point patterns. This allows researchers to identify underlying processes or structures that influence the distribution of objects of interest.

Requirements for PPA

The only requirement to perform PPA is to have xy coordinates of single objects (cells, proteins, subcellular structures, etc.). In this article, we focus on 2D PPA, although 3D approaches are also available. These coordinates are then processed using R and the spatstat function to create point patterns and store them as hyperframes.

I obtained the coordinates of individual cells using unbiased cell detection/quantification approaches using QuPath (4) or CellProfiler (5). I find that the detection and segmentation of round/circular objects like neurons (e.g. NeuN) is easier compared to irregular objects like astrocytes (GFAP) or microglia (IBA1), especially when cell density is high and there is a lot of cell overlap (e.g. glial aggregation after brain injury). The segmentation of irregular, highly clustered objects is still a frontier in this field. However, the QuPath or CellProfiler provide reasonable accuracy and, most importantly, are reproducible and can be validated. A human observer could not do better. Therefore, I recommend not to worry too much if in some cases you get the impression that certain objects only correspond to fragments of a cell or a combination of several cells. Fine-tune the parameters to ensure that the cell detection/segmentation does the best job possible. If the cells are far enough apart (e.g. healthy brain, cell culture), there is not much to worry about.

Creating point patterns

When working with multiple samples, the creation of point patterns can be simplified by using functions like the following link. The core of this procedure is to convert individual .csv files containing single-cell coordinates into point patterns (using the ppp function of spatstat) and organize them into a hyperframe that can be saved and shared as a .rds R object.

Here, we'll load a point pattern I have created during my research (3). This file is available in the GitHub repository under the name PointPatterns_5x.rds. Please feel free to use it for research, education, or training purposes.

library(brms)
library(dplyr)
library(ggplot2)
library(gtsummary)
library(modelr)
library(spatstat)
library(tidybayes)
PointPatterns <- readRDS("Data/PointPatterns_5x.rds")
row.names(PointPatterns) <- PointPatterns$ID 

head(PointPatterns)

You can see that the hyper-frame contains several columns of variables. Let's focus on the first three columns, which contain point patterns for three types of brain cells: Neurons, Astrocytes, and Microglia. We will rewrite some variable columns in our own way to exercise the implementation of PPA. First, let's take a look at what the point patterns look like by plotting them all at once (for neurons):

plot(PointPatterns$Neurons)
Figure 1: Multiple point patterns.

Let's see the details by looking at any single brain:

plot(PointPatterns$Neurons$M05)
Figure 2: Single point pattern.

We can play a little bit with the plots, by displaying two cell types (point patterns) at the same type and changing the way (shape and color) they are plotted. Here is an example:

# We plot neurons in black with symbol (10)
plot(PointPatterns$Neurons$M05, pch = 10, cex = 0.4, main = "Neurons and Astrocytes")

# We add astrocytes in red with a different symbol (18)
plot(PointPatterns$Astrocytes$M05, pch = 18, cex = 0.4, col = "red", add = TRUE)
Figure 3: Different cell types and aesthetics.

This gives a first impression of the number and distribution of cells, but of course we need to quantify it. A first approach is to obtain the estimated spatial intensity for each point pattern. We can generate an extra column for each row in the hyperframe with a simple code. For the sake of this post, we will do this for astrocytes only:

PointPatterns$AstrocytesIntensity <- with(PointPatterns, summary(Astrocytes)$intensity)

head(PointPatterns[,9:11])

You can see that we have created a new column that contains the spatial intensity of astrocytes. Next, we extract the information into a data frame along with the grouping variables:

Astrocytes_df <- as.data.frame(PointPatterns[,9:11])

# We make sure to organize our factor variable in the right order
Astrocytes_df$DPI <- factor(Astrocytes_df$DPI, levels = c("0D", "5D", "15D", "30D"))

gt::gt(Astrocytes_df[1:10,])

This's a good start, you are able to get the number of cells per unit area in a reproducible way using unbiased/automatic cell counting. Let's make a simple scientific inference from this data.

Fit a statistical model for the spatial intensity

As usual in my blog posts, we use brms (6) to fit a Bayesian linear model where we investigate the Astrocyte spatial intensity conditioning on DPI, that is, the days post-ischemia (brain injury) for the animals in this data set. We're going to build a model with heteroscedasticity (predicting sigma) because (I know) the variance between DPIs is not equal. It is much smaller for 0D.

Astrocytes_Mdl <- bf(AstrocytesIntensity ~ DPI, 
                     sigma ~ DPI)

Astrocytes_Fit <- brm(formula = Astrocytes_Mdl,
                      family = student,
                      data = Astrocytes_df, 
                      # seed for reproducibility purposes
                      seed = 8807,
                      control = list(adapt_delta = 0.99),
                      # this is to save the model in my laptop
                      file    = "Models/2024-05-24_PPA/Astrocytes_Fit.rds",
                      file_refit = "never")

# Add loo for model comparison
Astrocytes_Fit <- 
  add_criterion(Astrocytes_Fit, c("loo", "waic", "bayes_R2"))

Let's look at the summary table for our model:

We see that animals at 0D (intercept) have a mean spatial intensity of 4.3 with a 95% credible interval (CI) between 0.73 and 2.90. That's a very small number of cells. On the other hand, we have a peak at 15D with a mean of 26.9 with CIs between 22 and 31.

Let's plot the results using the great TidyBayes package (7) from the great Matthew Kay

Astrocytes_df %>%
  data_grid(Astrocytes_df) %>%
  add_epred_draws(Astrocytes_Fit) %>%
  ggplot(aes(x = .epred, y = DPI)) +
  labs(x = "Spatial intensity") +
  stat_halfeye() +
  geom_vline(xintercept = 0) +
  Plot_theme
Figure 4: Posterior distribution for the spatial intensity of astrocytes.

stat_halfeye()from Figure 4 is a nice way to look at the results. This procedure is analogous to counting cells in a given area. The advantage of PPA is that you do not have to rely on the supposed visual acuity of a student counting cells (the supposed experts are not the ones counting them), but you can produce unbiased cell counts that can be validated and are reproducible and reusable. Clearly, dear reader, we can do much more with PPA.

Creating density kernels

We have density kernels available in the loaded point patterns, but we'll rewrite them for this post. A density kernel is a method of estimating the probability density function of a variable, in this case, the location of cells. This provides a smooth estimate of the intensity function that produced the observed data.

Kernel density estimation for point patterns can be formulated as follows:

We'll recreate the density kernels for astrocytes and microglia using the density function from spatstat. Please make sure that this function is not overwritten by other packages. I find that a sigma (bandwidth) of 0.2 gives a fair readout for the point pattern density.

PointPatterns$Astrocytes_Dens <- with(PointPatterns, density(Astrocytes, sigma = 0.2, col = topo.colors))

With this ready, I want to give you an example of the impact of sigma in the density kernel using a single brain:

par(mfrow = c(1,3), mar=c(1,1,1,1), oma=c(1,1,1,1))

plot(density(PointPatterns$Astrocytes$M05, sigma = 0.02), col = topo.colors, main = "sigma = 0.02")
plot(density(PointPatterns$Astrocytes$M05, sigma = 0.2), col = topo.colors, main = "sigma = 0.2")
plot(density(PointPatterns$Astrocytes$M05, sigma = 2), col = topo.colors, main = "sigma = 2")
Figure 5: Density kernels with different sigma.

Figure 5 shows that, in the first case, we see that a very low sigma maps single points. For sigma = 0.2, we see a mapping on a larger scale and we can distinguish much better regions with low and high density of astrocytes. Finally, sigma = 2 offers a perspective where we cannot really distinguish with precision the different densities of astrocytes. For this case, sigma = 0.2 is a good compromise.

Now we'll fit a simple point process model to investigate the relative distribution of neurons conditioning on astrocyte density (mapped by the density kernel).

Fit a point process model (ppm)

Here, we use the mppm function from spatstat to fit a multiple-point process model for the point patterns in our hyperframe.Unfortunately, there are no Bayesian-like functions for multiple-point patterns in spatstat.

# We fit the mppm model
Neurons_ppm <- mppm(Neurons ~ Astrocytes_Dens, data = PointPatterns)

# We check the results
summary(Neurons_ppm)

Remember that spatial models are fitted with a Poisson distribution that uses the log link function to obtain only positive results. This means that we need to exponentiate the results in the table to convert them to the original scale. Therefore, we can see that the spatial intensity of neurons at a baseline (when the density of astrocytes is 0) is exp(3.54) = 34.4. This intensity decreases by ex(-0.002171358)=-0.99 for every unit increase in astrocyte spatial intensity (as defined by the density kernels). In other words, this model tells us that we have fewer neurons at points where we have more astrocytes. Note that we do not include DPI in the regression, an exercise you can do to see if this estimate changes with DPI.

There are more aspects to explore for PPA. However, not to make this post long and heavy, I will cover them in the next two posts. Here you could learn how to calculate and extract the spatial intensity of cells, create density kernels, and build point process models with them. In the next post, we'll explore how to perform calculations for relative distributions and how to use raster layers to further explore the cell distribution.

I would appreciate your comments or feedback letting me know if this journey was useful to you. If you want more quality content on Data Science and other topics, you might consider becoming a medium member.

You can find a complete/updated version of this post on my GitHub site.

  • All images, unless otherwise stated, were generated by the authors using R code.

References

1.A. Baddeley, E. Rubak, R. Turner, Spatial point patterns: Methodology and applications with R (Chapman; Hall/CRC Press, London, 2015; https://www.routledge.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/p/book/9781482210200/).

  1. D. Manrique-Castano, A. ElAli, Unbiased quantification of the spatial distribution of murine cells using point pattern analysis. STAR Protocols. 5, 102989 (2024).
  2. D. Manrique-Castano, D. Bhaskar, A. ElAli, Dissecting glial scar formation by spatial point pattern and topological data analysis (2023), (available at http://dx.doi.org/10.1101/2023.10.04.560910).
  3. P. Bankhead, M. B. Loughrey, J. A. Fernández, Y. Dombrowski, D. G. McArt, P. D. Dunne, S. McQuaid, R. T. Gray, L. J. Murray, H. G. Coleman, J. A. James, M. Salto-Tellez, P. W. Hamilton, QuPath: Open source software for digital pathology image analysis. Scientific Reports. 7 (2017), doi:10.1038/s41598–017–17204–5.
  4. D. R. Stirling, M. J. Swain-Bowden, A. M. Lucas, A. E. Carpenter, B. A. Cimini, A. Goodman, CellProfiler 4: improvements in speed, utility and usability. BMC Bioinformatics. 22 (2021), doi:10.1186/s12859–021–04344–9.
  5. P.-C. Bürkner, Brms: An r package for bayesian multilevel models using stan. 80 (2017), doi:10.18637/jss.v080.i01.
  6. M. Kay, tidybayes: Tidy data and geoms for Bayesian models (2023; http://mjskay.github.io/tidybayes/).

Tags: Data Science Hands On Tutorials R Programming Spatial Data Analysis Statistics

Comment