Error checking

Just joining? Explore prior lessons here.

If you've played around with Power Query a bit, you may have experienced query errors, where loading a query shows a message such as "4 errors". This post shows an easy way to proactively check for errors before loading.

The Data

For this exercise, we will use a simple data set of Actual Sales which contains two columns: Date & Sales. Download the file below, or follow along with a similar workbook of your own.

                                   ↑

The Problem

If you look at the dataset, the yellow highlight is a date error, since Excel isn't ready yet for year 20122. 
When the data is loaded in to a Pivot Table, the Pivot isn't going to treat it as a date as shown below.
Similarly when the same data is loaded in to Power Query, and the column is tagged as Data Type "Date", then Power Query is going to treat it as an error. If you don't catch it at the onset, when the query is loaded, you'll note an error message similar to below.


The Solution

Identifying the errors proactively, before loading a query, is ideal. To do so, we need to modify the View. Inside the Power Query Editor, on the View tab, there are two options which will provide the error information we seek: "Column quality" and "Column profile".
Column Quality provides three quick facts at the top of each column: they are the percent of data which is Valid, an Error, and Empty. By focusing on the Error metric, it is easy to identify that this query has errors in columns "Date of Sale" and "Start of Month". 
Then you can focus on those columns and identify where in the column the error is occurring. This leads to one of the following decisions which will need to be made:
  1. Accept the error, knowing the data won't present appropriately in a Pivot, the Data Model, or a Measure.
  2. Remove the errors using Power Query by right clicking on the column header and selecting "Remove Errors", but this impacts data quality and completeness.
  3. Fix the error at the source. Now that you know where the error is, it will be easier to go back to the source and fix the problem. When you are dealing with very large data sets, finding the problem is half the battle.
  4. Fix the error using Power Query. Using transformation tools, such as Replace Values, the error can be corrected before the Changed Type step. 

Column Profile

In addition to the Column Quality option, another View which was enabled above was the Column Profile. Where Column Quality stays on for each column, Column Profile only displays information when an individual column header is selected.

When selecting the Date of Sale column, on the left side are statistics about the selected column including row count and number of errors. However, since there is an error in this column, Column Profile doesn't provide any additional information.









Let's contrast this with the Column Profile of a column without errors, such as the Sales column.

When a column has no errors, the Column Profile view is able to provide more information as shown above. The left side has more statistical data such as the number of empty rows, number of distinct or unique values, or the average of all values. On the right side, there's a distribution graph showing the frequency of each value. 

One of these two views is sure to be helpful in your Power Query journey.

In this lesson, you learned how to:
  • Find errors using Column Quality and Column Profile; and,
  • Address errors using four alternative approaches.




No comments:

Post a Comment