These 5 SQL Techniques Cover ~80% of Real-Life Projects

Author:Murphy  |  View: 29609  |  Time: 2025-03-23 19:29:11
Photo by Possessed Photography on Unsplash

Are you curious about SQL but hesitant to dive in? Or maybe you're already familiar with the basics of SQL but struggle to apply it to real-life projects.

I know the feeling.

When I first started learning SQL, I was intimidated by the vast amount of information. And even today, I still constantly learn and explore new techniques.

Sure, the basics of SQL, like joins, subqueries, filtering, and ordering, are easy to pick up. But you need advanced techniques for complex real-world problems.

In this post, I want to share the five advanced SQL techniques I use the most in my everyday work. By mastering these techniques, you'll be able to accomplish almost 80% of production-grade SQL queries, making you a valuable asset to any Data-driven projects.

Python To SQL – I Can Now Load Data 20X Faster

I have intentionally not included some other frequently used techniques, such as transactions. This list of methods will be very helpful if you're in an analytical role than a software engineer role.

Throughout the post, I assumed we were in a Postgres database. But every major relational database nowadays offers similar functionality.

1. Window functions

A window function is an analytic function that performs calculations across a set of rows related to the current row. The result of a window function is returned alongside the original rows in the result set without changing the underlying data.

Junior Developers Write Multi-Page SQL Queries; Seniors Use Window Functions

A real-life example of a window function might be calculating the running total of sales revenue for a particular product over time. This could be useful for identifying trends in sales, such as which products are most popular at certain times of the year.

Here's an example of how to use the SUM window function to calculate the running total of sales revenue for a specific product over time:

Sql">SELECT
  product_id,
  date,
  SUM(revenue) OVER (PARTITION BY product_id ORDER BY date) AS running_total
FROM
  sales
WHERE
  product_id = 123;

In this example, the SUM window function is used to calculate the running total of revenue for a particular product_id over time. The PARTITION BY clause groups the data by product_id, and the ORDER BY clause sorts the data by date. The running_total column contains the result of the SUM window function.

Why not use group by?

This confused me when I first started using window functions.

Yes, you can use GROUP BY to aggregate data in PostgreSQL. Yet, using GROUP BY would provide a different result than a window function.

In the example of calculating the running total of sales revenue for a specific product over time, using GROUP BY would group the sales data by product_id and date, and then calculate the sum of income for each group. This would give you the total revenue for each day rather than the running total of revenue over time.

Here's an example of using GROUP BY to aggregate sales data by product_id and date:

SELECT
  product_id,
  date,
  SUM(revenue) AS total_revenue
FROM
  sales
GROUP BY
  product_id,
  date
WHERE
  product_id = 123;

Learn more about SQL window functions

2. CTE: Common table expression

A CTE is a temporary named result set you can reference within a single SQL statement. It defines a subquery that can be referenced multiple times within a gigantic query, thus simplifying complex queries.

Let's take an example to understand this better. Suppose you have a table that contains all the customer orders. You want to find the sales by product for top-performing regions.

Without a CTE, you must write a complex query involving subqueries, joins, and aggregation functions. This could make the query challenging to read and understand. However, by using a CTE, you can simplify the query and make it more readable.

Here's a classic example to understand the usefulness of CTEs.

WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
), top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

In the above query, we use two CTE to calculate the top-selling products in the top-performing regions.

The first CTE, regional_sales, calculates the total sales for each region by summing the amount column of the orders table and grouping the results by region.

The second CTE, top_regions, selects only the regions whose total sales are greater than 10% of the total sales across all regions. This is done using a subquery that calculates the total sales across all regions and divides it by 10.

The main query then joins the orders table with the top_regions CTE using an IN clause to filter the results to only include orders from the top-performing regions.

Here's the rewritten query without using a CTE:

SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (
    SELECT region
    FROM (
        SELECT region, SUM(amount) AS total_sales
        FROM orders
        GROUP BY region
    ) regional_sales
    WHERE total_sales > (
        SELECT SUM(total_sales)/10
        FROM (
            SELECT region, SUM(amount) AS total_sales
            FROM orders
            GROUP BY region
        ) regional_sales_sum
    )
)
GROUP BY region, product;

A CTE can simplify complex queries and make them more readable. It makes reusing the same subquery multiple times within a larger query easier.

Learn more about CTEs

3. Recursive query

Have you ever wanted to retrieve data from a database where the data is stored in a hierarchical or tree-like structure?

For instance, you may have a product category tree where each category has subcategories, and each subcategory can have further subcategories. In such cases, a recursive query can be handy.

A recursive query is a query that references itself in the definition. It's useful when traversing a tree or a hierarchical structure in the database and retrieving all the related data. In other words, it enables you to select data from a table that depends on data from the same table.

Here's an example of a recursive query that can be used to traverse a category tree:

WITH RECURSIVE category_tree(id, name, parent_id, depth, path) AS (
  SELECT id, name, parent_id, 1, ARRAY[id]
  FROM categories
  WHERE parent_id IS NULL
  UNION ALL
  SELECT categories.id, categories.name, categories.parent_id, category_tree.depth + 1, path || categories.id
  FROM categories
  JOIN category_tree ON categories.parent_id = category_tree.id
)
SELECT id, name, parent_id, depth, path
FROM category_tree;

We use a CTE with the WITH clause to define the recursive query in this example. The RECURSIVE keyword tells Postgres that it's a recursive query.

The category_tree CTE is defined by two SELECT statements. The first SELECT statement selects the root nodes of the category tree (nodes with no parent), and the second SELECT statement selects the child nodes recursively. The UNION ALL operator combines the results of both SELECT statements.

The depth column is used to keep track of the depth of each category node in the tree. The path column is an array that stores the path from the root to the current node.

With this query, we can retrieve all the categories in the tree with their respective depths and paths.

Learn more about Recursive queries

4. Dynamic SQL

If you've ever worked with SQL queries, you might have encountered some that are pretty complex and need to be generated at run time. Writing such queries can be daunting, and executing them can be even more challenging.

In the past, I used to rely on Python to generate complex SQL queries and execute them using a database connector such as psycopg2. This approach was effective but not very elegant.

However, I recently discovered dynamic SQL in Postgres, making generating and executing complex queries much more manageable. With dynamic SQL, you can dynamically create queries based on runtime conditions, which is incredibly useful when dealing with complex data structures or business logic.

Let's say you want to retrieve all orders placed on a specific date. In a static query, you might write something like this:

SELECT * FROM orders WHERE order_date = '2022-03-01';

But what if you allow the user to select a date range? With dynamic SQL, you can generate the query based on user input like this:

EXEC SQL BEGIN DECLARE SECTION;
const char *stmt = "SELECT * FROM orders WHERE order_date BETWEEN ? AND ?";
DATE start_date, end_date;
EXEC SQL END DECLARE SECTION;

EXEC SQL PREPARE mystmt FROM :stmt;

EXEC SQL EXECUTE mystmt USING :start_date, :end_date;

In this example, we've created a function that takes two parameters, start_date and end_date, and returns a table of orders that fall within that date range. The EXECUTE statement allows us to generate the query based on the input parameters dynamically, and the USING clause specifies the values for the query.

This example is rudimentary. But in large-scale projects, you'd need lots of SQL generated dynamically.

Learn more about Dynamic SQL.

5. Cursors

Our queries may run on constrained environments. Running an intensive operation on a vast table all at once may not always be preferable. Or we may need more control over the operation than applying it to a full table.

This is where cursors come in handy. Cursors allow you to retrieve and manipulate data from a result set one row at a time. You can use cursors to iterate through a data set and perform complex operations on each row.

Suppose you have a table called "products" that contains information about all products, including the product ID, product name, and current inventory. You can use a cursor to iterate through all orders that contain a specific product and update its inventory.

DECLARE
    cur_orders CURSOR FOR 
        SELECT order_id, product_id, quantity
        FROM order_details
        WHERE product_id = 456;

    product_inventory INTEGER;
BEGIN
    OPEN cur_orders;
    LOOP
        FETCH cur_orders INTO order_id, product_id, quantity;
        EXIT WHEN NOT FOUND;
        SELECT inventory INTO product_inventory FROM products WHERE product_id = 456;
        product_inventory := product_inventory - quantity;
        UPDATE products SET inventory = product_inventory WHERE product_id = 456;
    END LOOP;
    CLOSE cur_orders;

    -- do something after updating the inventory, such as logging the changes
END;

In this example, we first declare a cursor called "cur_orders" that selects all order details that contain a specific product ID. We then define a variable called "product_inventory" to store the current inventory of the product.

Inside the loop, we fetch each order ID, product ID, and quantity from the cursor, subtract the quantity from the current inventory and update the products table with the new inventory value.

Finally, we close the cursor and do something after updating the inventory, such as logging the changes.

Conclusion

In conclusion, SQL is a powerful language that offers many techniques to work with complex data. But learning them all might intimidate you at first.

This blog post explored five of the most commonly used advanced SQL techniques, including CTE, Window function, Recursive queries, dynamic queries, and cursors. While basic SQL concepts such as joins and subqueries are fundamental to working with data, these techniques will help you work on almost any SQL project.

While this post provides an overview of the advanced SQL techniques, it is not intended to be an exhaustive discussion of each technique. However, relevant links have been provided for those who wish to explore these concepts further. In the future, I plan to delve deeper into each of these techniques individually, providing a more comprehensive understanding of their capabilities and potential applications.


Thanks for reading, friend! If you enjoyed my article, let's keep in touch on LinkedIn, Twitter, and Medium.

Not a Medium member yet? Please use this link to become a member because, at no extra cost for you, I earn a small commission for referring you.

Tags: Data Data Science Programming Software Engineering Sql

Comment