How to Color Polars DataFrame

Author: Murphy | Time: 2025-03-23 11:39:20

AI image generated by ChatGPT. Prompt: A polar bear painting in a snowy landscape.

Since the Polars library was released in 2022, it has rapidly gained popularity as a very fast DataFrame library. In benchmarks against Pandas, the white bear has proved to be considerably faster. The official Polars website claims performance gains of more than 30x.

However, nothing is perfect. The Polars library seems to have some limits.

When it comes to styling tables, Polars offers fewer options while Pandas has a built-in styler available. If you want to color a Polars DataFrame, a straightforward solution is converting the table into Pandas.

But wait… what if some code needed to be run later?

Examples show Polars table before and after styling in this article. Image by author.

This means the later code has to run in Pandas, which can be drastically slower. Another choice is converting the table back to Polars after styling. Then, if we want to style the new result, the same process has to be repeated. Even though these solutions work, they are quite inconvenient.

Fortunately, there is a package called ‘Great Tables’ that can be applied directly to a Polars table. This package allows us to create a nice-looking table while working with the Polars library.

This article is a step-by-step guide to styling Polars tables with the Great Tables package.

Let's get started!!

Importing libraries

Start by importing the libraries we are going to use. The Great Tables package is available under the MIT license.

import numpy as np
import polars as pl
import polars.selectors as cs
import re
import wikipedia
import pandas as pd

from great_tables import GT
from great_tables import style, loc

Getting Data

To show that the method explained in this article can be applied to real-world data, I will use the ‘Wind power by country’ data from Wikipedia.

First, we will use Pandas to read the HTML tables on the Wikipedia page. Then, we will turn the table we need into a Polars DataFrame. The data from Wikipedia are used under the CC BY-SA 4.0 International license.

If you want to try another dataset, this step can be skipped.

wikiurl = 'https://en.wikipedia.org/wiki/Wind_power_by_country'
tables = pd.read_html(wikiurl)
df = pl.DataFrame(tables[4])
df

Due to the high number of rows, I will focus on countries with a Cap. (GW) value higher than 6. The first row, which shows the World data, will also be removed. The following code shows how to filter the Polars DataFrame.

If you want to select other columns or filter with other values, please feel free to modify the code below.

no_list = ['World']
df = df.filter(pl.col('Cap. (GW)') > 6)
df = df.filter(~pl.col('Country').is_in(no_list))
df

Displaying Polars DataFrame with Great Tables

Now that the Polars table is ready, let's try to display the table using the Great Tables package.

gt_df = GT(df)
gt_df

Polars table showing Wind power generation by country 2023 data using Great Tables. Image by author.

Next, let's do some basic modifications such as adding a title and making the maximum value in the % cap. growth column bold.

list_cap = list(df['% cap. growth'])
max_idx = list_cap.index(max(list_cap))   # row index of the maximum value

# Chain the Great Tables methods directly instead of building a string for eval()
tb = (
    gt_df
    .tab_header(title="Wind power generation by country 2023")
    .tab_style(
        style.text(weight="bold", color="black"),
        loc.body("% cap. growth", [max_idx]),
    )
)
tb

Adding a title and making the maximum value in a column bold. Image by author.

Coloring Polars DataFrame with Great Tables

For coloring the table, we need to create a color list from a color palette. As shown in the following code, this article will use the reversed ‘summer’ palette (‘summer_r’), which runs from bright yellow to dark green. Other palettes such as ‘coolwarm’ or ‘viridis’ can be used as well.

The number of colors extracted is 101 since in the next step we will scale the min-max values in a column to 0–100. Then, the obtained color list is enumerated to create a dictionary for use later.

import seaborn as sns
colors = list(sns.color_palette(palette='summer_r', n_colors=101).as_hex())
dict_colors = dict(enumerate(colors))

In the next step, we will scale the values in the % cap. growth column so that the minimum value maps to 0 and the maximum value maps to 100. After that, a color code is assigned to each scaled value using the color dictionary.

n_cap = max(list_cap) - min(list_cap)

percentage_cap = [int((i-min(list_cap))*100/n_cap) for i in list_cap] 
colors_cap = [dict_colors.get(p) for p in percentage_cap]
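To make the scaling step easier to follow, here is a self-contained sketch in plain Python. The helper names `scale_to_percent` and `hex_gradient` are hypothetical, and the hand-made gradient is only a stand-in for the seaborn palette so the snippet has no dependencies. It also differs from the code above in two small ways: it uses `round()` instead of `int()` truncation, and it guards against a column where all values are equal.

```python
def scale_to_percent(values):
    """Map values linearly onto integer steps 0..100 (min -> 0, max -> 100)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal; place every row in the middle of the palette.
        return [50 for _ in values]
    return [round((v - lo) * 100 / span) for v in values]


def hex_gradient(start, end, n=101):
    """Return n hex colors interpolated linearly from start to end ('#rrggbb')."""
    s = [int(start[i:i + 2], 16) for i in (1, 3, 5)]
    e = [int(end[i:i + 2], 16) for i in (1, 3, 5)]
    return [
        "#" + "".join(f"{round(a + (b - a) * k / (n - 1)):02x}" for a, b in zip(s, e))
        for k in range(n)
    ]


# Stand-in palette: bright yellow (low values) to dark green (high values).
dict_colors = dict(enumerate(hex_gradient("#ffff66", "#008066")))

values = [4.0, 21.5, 9.3, 0.5]          # invented sample values
percent = scale_to_percent(values)
colors = [dict_colors[p] for p in percent]
print(percent)
print(colors)
```

The smallest value receives the first palette color and the largest value receives the last one; everything else falls in between in proportion to its distance from the minimum.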

Here comes the coloring process. A for-loop is used to create multiple pieces of text code, each of which assigns a color to one row based on the color dictionary. After that, all the created pieces are joined into a single text code and run.

Ta-daaaaa!!

Polars table after styling and coloring to a column using Great Tables. Image by author.

From the result, we can easily spot that the Netherlands has the highest % cap. growth in 2023. The dark green area also helps us see other countries that have high values close to the Netherlands. On the contrary, the bright yellow area tells us where low values are.

Now let's apply the same process with the % gen column to show the country that has the highest percentage in this category.

We can quickly notice that Denmark has the highest value in the % gen column. The colors in this column also tell us that no other country has a % gen value close to Denmark's, since no other dark green cells show up.

Now that you have seen the steps for coloring the DataFrame and the obtained results, let's apply the same concept to the other columns.

Then, we will talk about the benefits of adding color to the table.

Lastly, after applying the same method to every column, we will get a table that looks like this one.

Voilà…!!

Polars table after styling and fully coloring using Great Tables. Image by author.

From the result, we can see that highlighting makes the table easier to navigate. The highest value in each column can be quickly noticed, and the color scale helps us locate the high and low values.

Moreover, the colors make the table look more interesting compared with the original one.

Summary

The Great Tables package can be applied directly to customize a Polars table. This not only lets us work more smoothly, since there is no need to convert the table, but also lets us keep the high-speed code execution that is a big advantage of the Polars library.

If you have any comments or recommendations, please feel free to share.

Thanks for reading.

References

Polars – updated TPC-H Benchmark results. Polars. (2024, April 16). https://pola.rs/posts/benchmarks/
Wikimedia Foundation. (2024, June 3). Wind power by country. Wikipedia. https://en.wikipedia.org/wiki/Wind_power_by_country
Chow, M. (2024, January 8). Great Tables: The Polars DataFrame Styler of Your Dreams. great_tables. https://posit-dev.github.io/great-tables/blog/polars-styling/