On using intermediary results in DAX Measures

Author:Murphy  |  View: 28273  |  Time: 2025-03-23 19:23:51

We use table variables in DAX all the time. But what when we need to calculate intermediary results and reuse them later in a DAX Measure? This challenge sounds easy, but it's not.

Photo by Mika Baumeister on Unsplash

Introduction

We are using intermediary table variables all the time in Dax.

For example, look at the following Measure:

[SalesYTD] =
  VAR YTDDates = DATESYTD('Date'[Date])

RETURN
  CALCULATE(
      [Sum Online Sales]
      ,YTDDates
      )

In this case, we generate a Year-To-Date table based on the actual Filter context with the help of the DATESYTD() function containing all the dates from January 1st of the Year in the actual Filter Context up to the actual date (Based on the current Filter Context of course).

But sometimes, we need to do more.

For example, we need to query a table based on the current filter context and perform further calculations on the result.

In this case, we must generate an intermediary table and assign the result to a variable inside the Measure to perform the needed calculations.

As we need the Filter Context, we cannot pre-create a table with these intermediary results, as this table would be huge to accommodate all possible filter combinations.

So, let's look at how we can do it.

Base Query

The Base query is the following:

EVALUATE
  ADDCOLUMNS(
        VALUES(Customer[CustomerKey])
          ,"AverageSales", CALCULATE( AVERAGEX( 'Online Sales'
          ,('Online Sales'[UnitPrice] * 'Online Sales'[SalesQuantity])-'Online Sales'[DiscountAmount]
          )
        )
      )
ORDER BY [AverageSales] DESC

The (truncated) result of this query is the following:

Figure 1 – Result of Base Query (Figure by the Author)

But what happens if this is only a starting point, and we need to work on this result in a Measure to perform further calculations?

For example, we want to display the sum of all Rows for each Month.

In this case, we would need to calculate the Average Sales Amount for each Month and sum the result.

Let's see how I would create a solution intuitively, which doesn't work, and, afterward, how the solution will work.

How not to do it – or how it doesn't work

The first intuitive way to solve the requirement above would be to generate a Variable with the ADDCOLUMNS() function to pre-calculate the table with the Average Sales. Then aggregate the rows to get the needed result:

DEFINE
  MEASURE 'All Measures'[AverageSalePerCustomer] =
    VAR AverageSalesPerCust =
            ADDCOLUMNS (
                VALUES ( Customer[CustomerKey] ),
                "AverageSales",
                -- Calculate the Average Sales per Customer using the Filter Context provided by ADDCOLUMNS()
                CALCULATE (
                  AVERAGEX (
                    'Online Sales',
                    ( 'Online Sales'[UnitPrice] * 'Online Sales'[SalesQuantity] ) - 'Online Sales'[DiscountAmount]
                    )
                  )
                )
RETURN
  -- Calculate a sum of all the rows calculated in the previous step
  SUM ( AverageSalesPerCust[AverageSales] )

EVALUATE
  -- Get the list of all Months and call the Measures defined above for each month
  ADDCOLUMNS (
      SUMMARIZECOLUMNS (
            'Date'[Year Month Short Name] 
            ),
            "AverageSalePerCustomer", [AverageSalePerCustomer]
      )
ORDER BY 'Date'[Year Month Short Name]

After executing this Query in DAX Studio, I get the following error message:

Figure 2 – Error Message with the first try (Figure by the Author)

The issue is that SUM() cannot work with table variables as we need it.

Let's try it in a slightly different way.

Let's replace SUM() with SUMX() after the RETURN statement:

SUMX(AverageSalesPerCust
    ,[AverageSales])

The error message is the following:

Figure 3 – Error message with the second approach (Figure by the Author)

So, SUMX() cannot access columns in table variables either.

I tried other methods to get the needed result:

  • Use CALCULATETABLE() to generate the table which can be used by SUM() or SUMX() But this function can also not access the column in the table variable.

  • Use of FILTER() to generate the table and use it in SUM() While FILTER() had no problem, SUM() couldn't access the table variable.

But wait … FILTER() doesn't generate any error.

Can I use FILTER() with SUMX()?

Let's see if this approach works.

Working Solution – FILTER() and SUMX()

The first working approach is the following Query:

DEFINE
  MEASURE 'All Measures'[AverageSalePerCustomer] =
    VAR AverageSalesPerCust =
      ADDCOLUMNS (
        VALUES ( Customer[CustomerKey] ),
        "AverageSales",
        CALCULATE (
            AVERAGEX (
              'Online Sales',
              ( 'Online Sales'[UnitPrice] * 'Online Sales'[SalesQuantity] ) - 'Online Sales'[DiscountAmount]
              )
            )
          )

    VAR AvgSalesOver0 =
      -- Wrap the intermediary table variable in FILTER() to make it usable by aggregation function
      FILTER (
          AverageSalesPerCust
          ,[AverageSales] > 0
          )

RETURN
  SUMX (
      AvgSalesOver0,
      [AverageSales]
      )

EVALUATE
  ADDCOLUMNS (
      SUMMARIZECOLUMNS (
            'Date'[Year Month Short Name]
            ),
            "AverageSalePerCustomer", [AverageSalePerCustomer]
        )
  -- Order by the calculated column
  ORDER BY [Year Month Short Name] DESC

The first table variable AverageSalesPerCust is still the same.

The second step is to use the FILTER() function to define the new table variable AvgSalesOver0.

UPDATE Januar 2024: As stated in the Comments by AlexisOlson, FILTER() shouldn't be needed. In another case, I used the same technique without FILTER(), which worked. At the moment, I don't know why I need FILTER() here.

In this case, I use the FILTER() function to convert the table variable AverageSalesPerCust into a table variable usable by aggregation functions.

As FILTER() needs at least two parameters, I have to add the "Filter" [AverageSales] > 0 to make sure that the FILTER function works.

Surprisingly, SUMX has no problems accessing a table variable constructed with the FILTER() function, and I can use this table variable to calculate the aggregation.

The (truncated) result of the query looks like this:

Figure 4 – Result of working solution (Figure by the Author)

But the execution needs some time.

When I look at the performance metrics, I notice some issues:

Figure 5 – Performance metrics for the first solution (Figure by the Author)

The total execution time is over three seconds, while over two seconds are spent in Formula Engine (FE) as it needs to process through three million rows.

This needs to be more efficient, and we must try to find a more efficient solution.

Optimization attempt with CALCULATETABLE()

I combined CALCULATETABLE() with FILTER() to get the following solution:

DEFINE
  MEASURE 'All Measures'[AverageSalePerCustomer] =
        VAR AverageSalesPerCust =
          CALCULATETABLE (
            ADDCOLUMNS (
                VALUES ( Customer[CustomerKey] ),
                "AverageSales",
                CALCULATE (
                  AVERAGEX (
                    'Online Sales',
                      ( 'Online Sales'[UnitPrice] * 'Online Sales'[SalesQuantity] ) - 'Online Sales'[DiscountAmount]
                    )
                  )
                )
                -- Add a Dummy filter
                ,Customer[CustomerKey] <> 0
              )

VAR AvgSalesOver0 =
    FILTER (
        AverageSalesPerCust,
        [AverageSales] > 0
        )

RETURN
  SUMX (
      AvgSalesOver0
      ,[AverageSales]
      )

EVALUATE
  ADDCOLUMNS (
    SUMMARIZECOLUMNS (
            'Date'[Year Month Short Name]
            ),
            "AverageSalePerCustomer", [AverageSalePerCustomer]
          )
  -- Order by the calculated column is possible
  ORDER BY [Year Month Short Name] DESC

The difference is that I enclosed ADDCOLUMNS() with a CALCULATETABLE() function.

I added a Dummy filter, hoping the execution plan would change.

But this version keeps everything the same.

Let's try something else.

Optimization attempt with a Dummy Filter

The next idea was to replace the Predicate in the FILTER function with a Dummy filter:

VAR AvgSalesOver0 =
    FILTER (
          AverageSalesPerCust
          -- Dummy-Filter 
          ,1 = 1
          )

The assumption was that a predicate without reference to a column (1=1) could lead to more efficient execution.

Unfortunately, this didn't change the outcome.

Is no optimization possible or necessary?

I tried other table functions to construct the first table variable, with the average sales per customer, but everything stayed the same.

After some thought, I realized this issue could not be solved within DAX.

The Storage Engine (SE) is not able to work how we would need it, or I wasn't able to find the correct solution.

Therefore, the DAX engine will resort to the capabilities of the Formula Engine, thus loading the three million rows into memory and aggregating the data there.

Sometimes there is no better solution than the one we've found.

And three seconds is the maximum upper limit of what users are willing to wait for a result.

We can consider this solution as good enough.

But wait a moment. I execute the Measure with the entire dataset containing data from over ten years. Is this a realistic scenario?

A more realistic scenario is that the user will analyze only one or two years of data.

With a filter to restrict the query to only one year, the execution time plummets to less than two tents of a second.

Figure 6 – Performance Metrics for selecting only one year (Figure by the Author)

As this will be a common scenario in a report, the Solution is ready to use in reports.

Lessons learned: Remember to use real-world scenarios and use cases when testing the performance of a complex measure.

Photo by Lucas Santos on Unsplash

Template for the Solution

Based on these results, the template for the Solution looks like this:

DEFINE
  MEASURE 'All Measures'[AverageSalePerCustomer] =
    VAR  =
          

    VAR  =
        FILTER(
            
            ,1=1 - Dummy Filter
            )
RETURN
   >

Alternatively, we can use this form:

DEFINE
MEASURE 'All Measures'[AverageSalePerCustomer] =
    VAR  =
      FILTER(
          
          , 1=1 - Dummy Filter
          )
RETURN
   >

The second form is shorter, and everything is contained in one variable.

I encourage you to add comments to explain why you're adding a FILTER() to the Measure with a FILTER 1=1.

This would make no sense to someone who doesn't know this technique.

References

If you don't know how to collect and interpret performance metrics in DAX Studio or are unsure about the interpretation of the Metrics shown there, read this article, where I dig into this feature:

How to get performance data from Power BI with DAX Studio

I use the Contoso sample dataset, like in my previous articles. You can download the ContosoRetailDW Dataset for free from Microsoft here.

The Contoso Data can be freely used under the MIT License, as described here.

I enlarged the dataset to make the DAX engine work harder. The Online Sales table contains 71 million rows (instead of 12.6 million rows), and the Retail Sales table contains 18.5 million rows (instead of 3.4 million rows).

If you appreciate my work, feel free to support me through

Salvatore Cagliari

Or scan this QR Code:

Any support is greatly appreciated.

Thank you.

Tags: Data Analysis Dax Power Bi Power Bi Tutorials Reporting

Comment