Junior Developers Write Multi-Page SQL Queries; Seniors Use Window Functions

Author:Murphy  |  View: 24264  |  Time: 2025-03-23 19:26:38
Photo by R Mo on Unsplash

Imagine a situation where you run a nationwide electronic store. You must determine how much each store contributes to the state's sales.

If you're starting on SQL, you'd think for a bit and develop an incredibly amazing query that may look like the following.

SELECT
 s.state,
 s.store_name,
 s.total_sales,
 s.total_sales / t.state_total_sales AS sales_contribution
FROM
 (
 SELECT
  state,
  store_name,
  SUM(sales) AS total_sales
 FROM
  store_sales
 GROUP BY
  state,
  store_name
) s
INNER JOIN (
 SELECT
  state,
  SUM(sales) AS state_total_sales
 FROM
  store_sales
 GROUP BY
  state
) t ON
 s.state = t.state
ORDER BY
 s.state,
 sales_contribution DESC;
|state|store_name|total_sales|sales_contribution|
|-----|----------|-----------|------------------|
|California|MegaMart|82500.00|0.61797752808988764045|
|California|ABC Mart|51000.00|0.38202247191011235955|
|New York|SuperMart|57000.00|0.60638297872340425532|
|New York|Corner Shop|37000.00|0.39361702127659574468|
|Texas|XYZ Store|35000.00|0.67961165048543689320|
|Texas|My Store|16500.00|0.32038834951456310680|

This gets the job done for sure. But if you explain this to others, they'll have a nightmarish time with you.

Instead, if you ask to do the same from a senior developer, they will write a query like this:

SELECT
 state,
 store_name,
 SUM(sales) AS total_sales,
 SUM(sales) / SUM(SUM(sales)) OVER (PARTITION BY state) AS sales_contribution
FROM
 store_sales
GROUP BY
 state,
 store_name
ORDER BY
 state,
 sales_contribution DESC;

This much more concise version uses the window functions.

The window function creates a window of records related to the current record and operates within the window. In this example, the window is the state.

My first encounter with the window function occurred a few years ago when I worked for a retail client. Their customer dataset is vast, and they want to see the changing buying pattern of each cohort of customers – as in those born during the 80s and 90s, etc.

Python To SQL – I Can Now Load Data 20X Faster

These 5 SQL Techniques Cover ~80% of Real-Life Projects

Since then, window functions have become my favorite, and I use them in almost every analytical project at least once. That's because we often want to perform calculations within the context of the record.

This post is a gentle introduction to window functions in Sql. Most importantly, we need to understand how we define the window and what operation we perform on the window.

If you're going to follow along with examples in this post, let's create a table and insert some dummy values into it.

CREATE TABLE store_sales (
  store_id INT,
  store_name VARCHAR(50),
  state VARCHAR(50),
  month DATE,
  sales DECIMAL(10, 2)
);

INSERT INTO store_sales VALUES
(1, 'ABC Mart', 'California', '2022-01-01', 10000.00),
(2, 'XYZ Store', 'Texas', '2022-01-01', 7500.00),
(3, 'Corner Shop', 'New York', '2022-01-01', 12000.00),
(4, 'MegaMart', 'California', '2022-01-01', 25000.00),
(5, 'My Store', 'Texas', '2022-01-01', 5000.00),
(6, 'SuperMart', 'New York', '2022-01-01', 18000.00),
(7, 'ABC Mart', 'California', '2022-02-01', 15000.00),
(8, 'XYZ Store', 'Texas', '2022-02-01', 9000.00),
(9, 'Corner Shop', 'New York', '2022-02-01', 10000.00),
(10, 'MegaMart', 'California', '2022-02-01', 30000.00),
(11, 'My Store', 'Texas', '2022-02-01', 6000.00),
(12, 'SuperMart', 'New York', '2022-02-01', 20000.00),
(13, 'ABC Mart', 'California', '2022-03-01', 12000.00),
(14, 'XYZ Store', 'Texas', '2022-03-01', 10500.00),
(15, 'Corner Shop', 'New York', '2022-03-01', 15000.00),
(16, 'MegaMart', 'California', '2022-03-01', 27500.00),
(17, 'My Store', 'Texas', '2022-03-01', 5500.00),
(18, 'SuperMart', 'New York', '2022-03-01', 19000.00),
(19, 'ABC Mart', 'California', '2022-04-01', 14000.00),
(20, 'XYZ Store', 'Texas', '2022-04-01', 8000.00);

The basics of SQL window functions

Here's a query to display each state's average sales and total sales value for each store.

SELECT 
    store_name,
    state,
    SUM(sales) AS total_sales,
    AVG(SUM(sales)) OVER (PARTITION BY state) AS state_average
FROM 
    store_sales
GROUP BY 
    store_name, state;
|store_name|state|total_sales|state_average|
|----------|-----|-----------|-------------|
|ABC Mart|California|51000.00|66750.000000000000|
|MegaMart|California|82500.00|66750.000000000000|
|Corner Shop|New York|37000.00|47000.000000000000|
|SuperMart|New York|57000.00|47000.000000000000|
|XYZ Store|Texas|35000.00|25750.000000000000|
|My Store|Texas|16500.00|25750.000000000000|

Let's closely examine the window function.

Illustration by the author.

Every window function has these two parts. We define the window following the OVER keyword. In this example, we only partition the dataset using the state column. The operation will be performed among the records that share the same state.

Further, you can rearrange the widow records using the ORDER BY keyword. The following query uses it to get the rank of each store within its state.

SELECT
 store_name,
 state,
 sales,
 DENSE_RANK() OVER (PARTITION BY state
ORDER BY
 sales DESC) AS store_sales_rank
FROM
 store_sales;
|store_name|state|sales|store_sales_rank|
|----------|-----|-----|----------------|
|MegaMart|California|30000.00|1|
|MegaMart|California|27500.00|2|
|MegaMart|California|25000.00|3|
|ABC Mart|California|15000.00|4|
|ABC Mart|California|14000.00|5|
|ABC Mart|California|12000.00|6|
|ABC Mart|California|10000.00|7|
|SuperMart|New York|20000.00|1|
|SuperMart|New York|19000.00|2|
|SuperMart|New York|18000.00|3|
|Corner Shop|New York|15000.00|4|
|Corner Shop|New York|12000.00|5|
|Corner Shop|New York|10000.00|6|
|XYZ Store|Texas|10500.00|1|
|XYZ Store|Texas|9000.00|2|
|XYZ Store|Texas|8000.00|3|
|XYZ Store|Texas|7500.00|4|
|My Store|Texas|6000.00|5|
|My Store|Texas|5500.00|6|
|My Store|Texas|5000.00|7|

In place of DENSE_[RANK](https://www.sqlshack.com/overview-of-sql-rank-functions/), you can use RANK or ROW_NUMBER. The difference between the three keywords is in the way they handle ties.

ROW_NUMBER will assign a sequential number for tie and gives no importance to ties. RANK will assign the same rank for ties and skip the next one. For instance, if two stores have the same sales values, they both will get number 1. But number 2 will be skipped, and the next in line gets number 3. DENSE_RANK will also assign the same number to ties but won't skip the next number. The next record will get the immediately following rank.

Interesting ways we can use window functions

Window functions have lots of exciting applications. As I said earlier, I use it in almost every SQL project I work on. Here are some common ways we can use window functions for.

Since we've already looked into ranking in the basics example, we're not revisiting that here. But ranking is one of the most common use cases of a window function.

1. Calculating running totals

We may also want to calculate a running total wherever we have temporal data. In other words, we should sum all the previous values to a certain point.

In our store sales example, this might be how much each store has sold since the beginning of the year by every month's end. Here's a query that accomplishes this task.

SELECT
 store_name,
 MONTH,
 sales,
 SUM(sales) OVER (PARTITION BY store_name
ORDER BY
 "month") AS running_total
FROM
 store_sales;
|store_name|month|sales|running_total|
|----------|-----|-----|-------------|
|ABC Mart|2022-01-01|10000.00|10000.00|
|ABC Mart|2022-02-01|15000.00|25000.00|
|ABC Mart|2022-03-01|12000.00|37000.00|
|ABC Mart|2022-04-01|14000.00|51000.00|
|Corner Shop|2022-01-01|12000.00|12000.00|
|Corner Shop|2022-02-01|10000.00|22000.00|
|Corner Shop|2022-03-01|15000.00|37000.00|
|MegaMart|2022-01-01|25000.00|25000.00|
|MegaMart|2022-02-01|30000.00|55000.00|
|MegaMart|2022-03-01|27500.00|82500.00|
|My Store|2022-01-01|5000.00|5000.00|
|My Store|2022-02-01|6000.00|11000.00|
|My Store|2022-03-01|5500.00|16500.00|
|SuperMart|2022-01-01|18000.00|18000.00|
|SuperMart|2022-02-01|20000.00|38000.00|
|SuperMart|2022-03-01|19000.00|57000.00|
|XYZ Store|2022-01-01|7500.00|7500.00|
|XYZ Store|2022-02-01|9000.00|16500.00|
|XYZ Store|2022-03-01|10500.00|27000.00|
|XYZ Store|2022-04-01|8000.00|35000.00|

The above query will order the records in the window with the month column. In any month, sales figures are accumulated only up to that month.

2. Comparing to a group statistic.

There may be situations where we need to compare each record to its group average. For instance, we may be interested in seeing each store's state averages.

Here's the SQL query for this.

SELECT
 store_name,
 state ,
 MONTH,
 sales,
 AVG(sales) OVER (PARTITION BY state, "month") AS running_total
FROM
 store_sales;
|store_name|state|month|sales|running_total|
|----------|-----|-----|-----|-------------|
|MegaMart|California|2022-01-01|25000.00|17500.000000000000|
|ABC Mart|California|2022-01-01|10000.00|17500.000000000000|
|ABC Mart|California|2022-02-01|15000.00|22500.000000000000|
|MegaMart|California|2022-02-01|30000.00|22500.000000000000|
|ABC Mart|California|2022-03-01|12000.00|19750.000000000000|
|MegaMart|California|2022-03-01|27500.00|19750.000000000000|
|ABC Mart|California|2022-04-01|14000.00|14000.0000000000000000|
|SuperMart|New York|2022-01-01|18000.00|15000.000000000000|
|Corner Shop|New York|2022-01-01|12000.00|15000.000000000000|
|Corner Shop|New York|2022-02-01|10000.00|15000.000000000000|
|SuperMart|New York|2022-02-01|20000.00|15000.000000000000|
|Corner Shop|New York|2022-03-01|15000.00|17000.000000000000|
|SuperMart|New York|2022-03-01|19000.00|17000.000000000000|
|XYZ Store|Texas|2022-01-01|7500.00|6250.0000000000000000|
|My Store|Texas|2022-01-01|5000.00|6250.0000000000000000|
|My Store|Texas|2022-02-01|6000.00|7500.0000000000000000|
|XYZ Store|Texas|2022-02-01|9000.00|7500.0000000000000000|
|My Store|Texas|2022-03-01|5500.00|8000.0000000000000000|
|XYZ Store|Texas|2022-03-01|10500.00|8000.0000000000000000|
|XYZ Store|Texas|2022-04-01|8000.00|8000.0000000000000000|

3. Calculating moving averages

Moving averages are so typical when working with time series data. Moving averages usually have little noise than the individual data points. You'd always see it in financial data analysis, such as stock market data. But we can find similar applications in any domain.

Here's the SQL query that does it for our store sales data. It computes, for each store, the 3-point moving average.

SELECT
 store_name ,
 MONTH,
 sales,
 AVG(sales) OVER (PARTITION BY store_name
ORDER BY
 MONTH ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_sales
FROM
 store_sales;
|store_name|month|sales|moving_avg_sales|
|----------|-----|-----|----------------|
|ABC Mart|2022-01-01|10000.00|10000.0000000000000000|
|ABC Mart|2022-02-01|15000.00|12500.0000000000000000|
|ABC Mart|2022-03-01|12000.00|12333.3333333333333333|
|ABC Mart|2022-04-01|14000.00|13666.666666666667|
|Corner Shop|2022-01-01|12000.00|12000.0000000000000000|
|Corner Shop|2022-02-01|10000.00|11000.0000000000000000|
|Corner Shop|2022-03-01|15000.00|12333.3333333333333333|
|MegaMart|2022-01-01|25000.00|25000.000000000000|
|MegaMart|2022-02-01|30000.00|27500.000000000000|
|MegaMart|2022-03-01|27500.00|27500.000000000000|
|My Store|2022-01-01|5000.00|5000.0000000000000000|
|My Store|2022-02-01|6000.00|5500.0000000000000000|
|My Store|2022-03-01|5500.00|5500.0000000000000000|
|SuperMart|2022-01-01|18000.00|18000.0000000000000000|
|SuperMart|2022-02-01|20000.00|19000.000000000000|
|SuperMart|2022-03-01|19000.00|19000.000000000000|
|XYZ Store|2022-01-01|7500.00|7500.0000000000000000|
|XYZ Store|2022-02-01|9000.00|8250.0000000000000000|
|XYZ Store|2022-03-01|10500.00|9000.0000000000000000|
|XYZ Store|2022-04-01|8000.00|9166.6666666666666667|

Look carefully at how we define the window. In addition to the usually appearing PARTITION BY and ORDER BY keywords, we use a few others. We tell SQL to consider only the 2 preceding and current records. By changing the parameter, you can even calculate different point moving averages.

Likewise, you can also compute forward-facing moving averages. Here's the SQL query for this:

SELECT
 store_name ,
 MONTH,
 sales,
 AVG(sales) OVER (PARTITION BY store_name 
ORDER BY
 MONTH ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS moving_avg_sales
FROM
 store_sales;
|store_name|month|sales|moving_avg_sales|
|----------|-----|-----|----------------|
|ABC Mart|2022-01-01|10000.00|12333.3333333333333333|
|ABC Mart|2022-02-01|15000.00|13666.666666666667|
|ABC Mart|2022-03-01|12000.00|13000.0000000000000000|
|ABC Mart|2022-04-01|14000.00|14000.0000000000000000|
|Corner Shop|2022-01-01|12000.00|12333.3333333333333333|
|Corner Shop|2022-02-01|10000.00|12500.0000000000000000|
|Corner Shop|2022-03-01|15000.00|15000.0000000000000000|
|MegaMart|2022-01-01|25000.00|27500.000000000000|
|MegaMart|2022-02-01|30000.00|28750.000000000000|
|MegaMart|2022-03-01|27500.00|27500.000000000000|
|My Store|2022-01-01|5000.00|5500.0000000000000000|
|My Store|2022-02-01|6000.00|5750.0000000000000000|
|My Store|2022-03-01|5500.00|5500.0000000000000000|
|SuperMart|2022-01-01|18000.00|19000.000000000000|
|SuperMart|2022-02-01|20000.00|19500.000000000000|
|SuperMart|2022-03-01|19000.00|19000.0000000000000000|
|XYZ Store|2022-01-01|7500.00|9000.0000000000000000|
|XYZ Store|2022-02-01|9000.00|9166.6666666666666667|
|XYZ Store|2022-03-01|10500.00|9250.0000000000000000|
|XYZ Store|2022-04-01|8000.00|8000.0000000000000000|

Conclusion

Window functions have become my favorite after realizing their usefulness. Thus through this post, I shared the basics and some frequently used window function applications.

But you can do a lot more with window functions. Although not extensively covered in this blog post, the basics should let you write amazing SQL statements.


Thanks for reading, friend! If you enjoyed my article, let's keep in touch on LinkedIn, Twitter, and Medium.

Not a Medium member yet? Please use this link to become a member because, at no extra cost for you, I earn a small commission for referring you.

Tags: Data Data Science Programming Software Engineering Sql

Comment