Calculated columns vs Measures


Just joining the blog? Explore prior lessons here.

When playing with the Power tools, there is often more than one way to tackle a task. Read further to explore the pros and cons of calculated columns and Measures. 

For this exercise, we'll use the dataset below to find Total Sales for each product using calculated columns and then a Measure. 
       
                                    ⇧


Calculated Column

A calculated column expands a dataset by adding an additional column. It's similar to native Excel where you would write a formula in a new column.

New users will feel more comfortable using calculated columns instead of Measures. However, there are some downsides to this approach which we'll explore further down. For now though, let's practice our Power Query skills by adding a calculated column.

Select the Download icon above the arrow ⇧ on the dataset above.  With your cursor in the Table, load to Power Query by going to the Data tab and selecting From Table/Range.  

Inside the Power Query Editor, on the Add Column tab, select Custom Column. 

Create a "Total sales" column like the example below.

After selecting OK, the new Calculated Column has been added to our query.









Change the type of "Total sales" to Decimal Number and then Close & Load To, Table, New worksheet, Add this data to the Data Model, Load.

Advantages of using Power Query instead of native Excel to build the Calculated Column include:
  • Removes the potential for a formula to get altered accidentally;
  • Shrinks the file size, since Power Query uses xVelocity to compress data; and,
  • Allows processing of massive amounts of data since Power Query is limited only by the capacity of your machine, not Excel's ~1 million rows limit: 20 million rows, no problem.
However, it could be even better. Let's try it in Power Pivot.


Build a Power Pivot Pivot Table

Before we write a Measure, let's add a Pivot Table to our workbook, so we can see the results. 

Choose to "Use this workbook's Data Model".  This creates a Pivot Table using the data loaded in to Power Query.

In the PivotTable Fields, there may be 3 Sales_data sets to choose from when selecting fields for the Pivot.  

Hovering over the name yields more information to distinguish between the data sources.
  • The first is the original Data Table which was loaded to Power Query.
  • The second is the query that was created in Power Query.
  • The third is the output of the query.
On the PivotTable Field List, select from the query "Sales_data 1" and pull Product in to the Rows area and "Total sales" in to the Values area.


Measure

Now, let's calculate Total Sales again, but this time using a Measure instead of a calculated column. The results will be the same, but Measures have additional advantages over native Excel and calculated columns.

The Measure will need to multiply Qty and Unit Price for each row of the data and then sum those products to get a grand total.  To achieve this result, we will use a SUMX formula, which is an iterator formula, which iterates through each row of the Table and then sums the total.

The SUMX format is 
    = SUMX(Source Query, function to perform on each row before summing)
In our case, this roughly translates to
    =SUMX(Sales_data 1, Qty*Unit Price)

Let's write the Measure. 

On the Power Pivot menu, select Measures, New Measure.

  •  Name the Measure, "Total sales Measure". 
  • Start the formula with an equal sign.
  • SUMX first needs a table to reference, so type an open parenthesis, then 'Sales_data 1' followed by a comma.
  • Then, start the expression by typing Qty.  Select 'Sales_data 1'[Qty], which is the quantity column from the query. 
  • Add a multiplication sign.
  • Type "price" and then select 'Sales_data 1'[Unit Price].
  • Close parenthesis.
Check the formula and assign the value a category.  Here's the solution.
=SUMX('Sales_data 1','Sales_data 1'[Qty]*'Sales_data 1'[Unit Price])
Return to your Pivot Table, and add in the new Measure.

Advantages of using Power Pivot Measures instead of native Excel or Power Query to build the Total Sales calculation include:
  • All the advantages listed above for calculated columns versus native Excel plus the following.
  • Measures use less memory than calculated columns.
  • The Power Pivot Data Model compresses data 7 to 10 times smaller than the same data in native Excel.  
    • Native Excel uses the in-memory analytics engine to store data in memory (hence why large files with complex formulas or multiple Pivots receive resource error messages).
  • Measures can be reused in multiple Pivot Tables in the same workbook.
  • Measures can be referred to in other Measures. For example, if we had to add sales tax of 10%, a Measure could be written to take =Total Sales Measure * 10%.
MeasureNow, let's calculate Total Sales again, but this time using a Measure instead of a calculated column. The results will be the same, but Measures have additional advantages over native Excel and calculated columns.

The Measure will need to multiply Qty and Unit Price for each row of the data and then sum those products to get a grand total.  To achieve this result, we will use a SUMX formula, which is an iterator formula, which iterates through each row of the Table and then sums the total.

The SUMX format is 
    = SUMX(Source Query, function to perform on each row before summing)
In our case, this roughly translates to
    =SUMX(Sales_data 1, Qty*Unit Price)

Let's write the Measure. 

On the Power Pivot menu, select Measures, New Measure.

  • Name the Measure, "Total sales Measure".
  • Start the formula with an equal sign.
  • SUMX first needs a table to reference, so type an open parenthesis, then 'Sales_data 1' followed by a comma.
  • Then, start the expression by typing Qty. Select 'Sales_data 1'[Qty], which is the quantity column from the query.
  • Add a multiplication sign.
  • Type "price" and then select 'Sales_data 1'[Unit Price].
  • Close parenthesis.
Check the formula and assign the value a category.  Here's the solution.
=SUMX('Sales_data 1','Sales_data 1'[Qty]*'Sales_data 1'[Unit Price])
Return to your Pivot Table, and add in the new Measure.

Advantages of using Power Pivot Measures instead of native Excel or Power Query to build the Total Sales calculation include:
  • All the advantages listed above for calculated columns versus native Excel plus the following.
  • Measures use less memory than calculated columns.
  • The Power Pivot Data Model compresses data 7 to 10 times smaller than the same data in native Excel.  
    • Native Excel uses the in-memory analytics engine to store data in memory (hence why large files with complex formulas or multiple Pivots receive resource error messages).
  • Measures can be reused in multiple Pivot Tables in the same workbook.
  • Measures can be referred to in other Measures. For example, if we had to add sales tax of 10%, a Measure could be written to take =Total Sales Measure * 10%.

In this lesson, you learned:
  • Calculated columns and Measures can return the same results;
  • The difference between creating a calculated column and a Measure;
  • The advantages of creating a Power Query calculated column instead of using native Excel; and,
  • The advantages of creating Measures instead of using native Excel or Power Query.

SOLUTION 


Prior post: The Power of the Unpivot
Next post: M-language

No comments:

Post a Comment