Efficient Large Dimensional Self-Organising Maps with PyTorch


Self-organising maps (or Kohonen maps) are an interesting kind of neural network: they don't follow the usual layered architecture and are trained very differently from the usual backpropagation-based methods. There is a good reason for this: they are meant for unsupervised learning. They are to the usual multi-layer neural networks what K-Means is to SVMs. They create clusters; they discretise the data space. But one thing makes them different from other clustering methods: the clusters they create form a map of the data (a grid of clusters) in which the distance between two clusters on the map reflects the distance between the average members of those clusters in the data space.

Because they are slightly atypical, less work has gone into creating efficient implementations of self-organising maps (SOMs) than for other forms of neural networks, in particular when it comes to handling high-dimensional data on GPUs (they are typically used on data with no more than a few dozen features). Too bad, since that is exactly what I needed for a project: fast SOM training on data with thousands of features. I had tried existing libraries, including ones based on PyTorch, and was not quite satisfied, so I made my own: ksom (admittedly also because it is fun to do, especially as a way to get better at using PyTorch).

To illustrate how it works, let's consider a simple example (very far from high-dimensional, but easy to understand): clustering the pixels of an image by colour (already used to illustrate the K-Means algorithm in a previous article).

Let's start from the end! Below is a picture used as an example (my dog Chica), and the resulting 6×6 map of the colours of pixels in that image. You can find the code to train this map on the GitHub repo for ksom.

A photo of Chica, and the SOM of the colours in that photo, trained from the pixels it contains (image by the author).

Training the map basically consists of showing it, one after the other, all of the data points (pixels of the image), possibly over several iterations (here, we only use one epoch, since there are enough pixels in the image to get a good result). Before starting, the map is initialised (randomly, with zeros, or with randomly selected data points from the dataset). In our case, it is a 6×6×3 tensor representing the square map (we could have rectangular ones, and some libraries allow other shapes or even 3D maps), with each unit (cell, cluster) represented by a vector of the same dimension as the data points (so here, the red, green and blue components of the colour of a pixel).
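As a rough illustration (not the actual code from the ksom repo), the pixel data and such a map could be prepared in plain PyTorch along these lines, with the variable names x and samples chosen to match the training code further down (the file name and the normalisation to [0, 1] are just assumptions made for this sketch):

import torch
from PIL import Image

# load the image and flatten it into a (number of pixels, 3) tensor of RGB values
img = Image.open("chica.jpg").convert("RGB")
x = torch.tensor(list(img.getdata()), dtype=torch.float32) / 255.0

# 36 randomly selected pixels, one to initialise each unit of the 6x6 map
samples = x[torch.randperm(x.size(0))[:36]]

# the two other initialisation options mentioned above, written by hand
zero_map = torch.zeros(6, 6, 3)    # a 6x6x3 map of zeros
random_map = torch.rand(6, 6, 3)   # a 6x6x3 map of random values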

For each data point during training, the map is updated by first identifying the unit closest to the data point according to a given distance measure (the BMU, or best matching unit). This unit is then updated to move closer to the data point, and other units in its neighbourhood are also updated, to a lesser extent, depending on a given neighbourhood function (Gaussian, linear…) and a given radius (which decreases as training goes on).
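To make the update rule concrete, here is a minimal, self-contained sketch of a single training step in plain PyTorch; this is only an illustration of the principle (using a Euclidean BMU search and a Gaussian neighbourhood), not how ksom implements it:

import torch

def som_update(weights, point, alpha, radius):
    # weights: (H, W, D) map of unit vectors; point: (D,) data point
    H, W, D = weights.shape
    # 1. find the BMU: the unit whose vector is closest to the data point
    dists = torch.norm(weights.view(-1, D) - point, dim=1)
    bmu = dists.argmin()
    bmu_row, bmu_col = bmu // W, bmu % W
    # 2. Gaussian neighbourhood: influence decays with distance to the BMU on the grid
    rows = torch.arange(H).view(-1, 1).expand(H, W)
    cols = torch.arange(W).view(1, -1).expand(H, W)
    grid_dist2 = (rows - bmu_row) ** 2 + (cols - bmu_col) ** 2
    influence = torch.exp(-grid_dist2 / (2 * radius ** 2))
    # 3. move every unit towards the data point, scaled by learning rate and influence
    weights += alpha * influence.unsqueeze(-1) * (point - weights)
    return weights

In practice both the learning rate (alpha) and the radius decay over time, which is what the alpha_drate and neighborhood_drate parameters below control.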

So, in our example, having loaded the data into a tensor x (and selected a few sample pixels into samples), the code to initialise and train the SOM looks like this:

from ksom import SOM, cosine_distance, nb_gaussian
...
smodel = SOM(6, 6, 3, # size of the map and dimension of units
             sample_init=samples, # initialised with samples
             dist=cosine_distance, # using cosine distance for BMU
             alpha_init=0.01, # learning rate
             alpha_drate=1e-7, # decay of learning rate
             neighborhood_fct=nb_gaussian, # neighbourhood function
             neighborhood_init=som_size, # initial neighbourhood radius
             neighborhood_drate=0.0001) # decay of neighbourhood radius

perm = torch.randperm(x.size(0)) # to shuffle the data
for i in range(int(x.size()[0]/1000)):
    idx = perm[i*1000:(i+1)*1000] 
    time1 = time.time()
    dist,count = smodel.add(x[idx]) # feed the SOM a batch of 1000 pixels
    print(f"{(i+1):06d}K - {dist:.4f} - {(time.time()-time1)*1000:05.2f}ms")

The data points are provided in batches (even though they are then processed one by one). As the training progresses, the vector for each unit (its weights) becomes closer to the average of all the data points for which it is the BMU, and units with similar vectors tend to end up close to each other on the map. How well this works is of course dependent on all the parameters: the neighbourhood function, the size of the map, the neighbourhood radius, the distance metric, the learning rate, etc., and all of those depend on each other. In other words, it takes a bit of trial and error to make it work well.
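Since each unit's vector is itself an RGB colour in this example, the map can be displayed directly as a small image, which is how the figures here were produced; a minimal sketch (how the weight tensor is retrieved from the trained ksom model is left out here, the function simply expects a (6, 6, 3) tensor with values in [0, 1]):

import matplotlib.pyplot as plt

def show_colour_map(weights):
    # weights: (6, 6, 3) tensor of unit vectors, read directly as RGB colours
    plt.imshow(weights.clamp(0, 1).numpy())
    plt.axis("off")
    plt.show()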

Evolution of the pixel/colour SOM during training (image by the author).

Here is another example, taking a dataset of cheeses mostly described by binary attributes, using the following initialisation of the SOM and training it over 7 epochs (see the full code):

smodel = ksom.SOM(6, 6, len(df.T), # 6x6 map, with unit dimension = number of attributes
                  zero_init=True,  # initialise the map with zeros
                  dist=ksom.cosine_distance)

for epoch in range(7):
    for b in range(math.ceil(len(df)/BATCHSIZE)):
        dist,count = smodel.add(torch.Tensor(df.iloc[b*BATCHSIZE:(b+1)*BATCHSIZE].to_numpy()))
        print(f"{epoch+1:02d}.{b:02d}: distance {dist:.4f} out of {count} objects")
        # after each batch, recompute how many cheeses have each unit as their BMU
        freqmap = torch.zeros(SOMSIZE*SOMSIZE)
        bmu,dists = smodel(torch.Tensor(df.to_numpy()))
        for i in bmu: freqmap[i[0]*SOMSIZE+i[1]] += 1

Evolution of the SOM for cheeses. On the left, the weights of units represented as colours (obtained by reducing them to 3 dimensions with PCA and interpreting the components as the red, green and blue values of the colour). On the right, a map of the population of units (the number of cheeses for which a unit is the BMU) and the most distinctive attribute of each unit (image by the author).
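The colour view on the left of that figure can be reproduced by reducing the unit vectors to 3 dimensions with PCA and reading the components as RGB channels; a sketch (assuming the unit vectors are available as a (36, D) array called weights, with scikit-learn used for the PCA):

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def weights_to_colours(weights, som_size=6):
    # reduce each unit vector to its 3 first principal components
    comps = PCA(n_components=3).fit_transform(weights)
    # rescale each component to [0, 1] so it can be read as an RGB channel
    comps = (comps - comps.min(axis=0)) / (comps.max(axis=0) - comps.min(axis=0))
    plt.imshow(comps.reshape(som_size, som_size, 3))
    plt.axis("off")
    plt.show()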

Now that's all good, but I started by saying that I wanted something that would scale well to high-dimensional data. Here is a comparison training a 10×10 map on random data (see the code) against a few other libraries, some running on CPU (an i5-1335U) and some on GPU (an RTX A6000). In short, there are better alternatives than ksom at low dimensions, but as the size of the data points/weight vectors increases, ksom on CPU becomes significantly faster, and ksom on GPU simply stays very fast (it would take a bigger map and more dimensions to see its training time increase noticeably).
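The benchmark itself boils down to timing the same training loop at increasing dimensionality; roughly along these lines for the ksom case (a simplified sketch, not the actual benchmarking code: the other SOM parameters are left at their defaults, and moving the data and model to the GPU is omitted since it is not shown above):

import time
import torch
from ksom import SOM, cosine_distance

N, BATCH = 10000, 100  # number of random data points and batch size (arbitrary choices)

for dim in [10, 100, 1000, 10000]:
    data = torch.rand(N, dim)
    smodel = SOM(10, 10, dim, dist=cosine_distance)
    start = time.time()
    for b in range(N // BATCH):
        smodel.add(data[b*BATCH:(b+1)*BATCH])
    print(f"dim={dim:5d}: {time.time()-start:.2f}s")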

Comparison of the training time of a 10×10 SOM at different dimensions using ksom, quicksom, minisom or sklearn_som (image by the author).

Of course, ksom remains a small project. Not everything is tested and there are probably bugs that have not shown up yet in my own use of it (let me know if you find some), but as a way to create large self-organising maps in Python, it seems to do the trick.
