Hire a PowerBI guru

I've recently hired a PowerBI analyst to join my team in my full-time industry job. In 3 short months, he's built 3 amazing PowerBI dashboards that have given us tremendous insight in to our accounting data such as the composition of our Journal Entries, our fluctuation (flux) explanations, and our Month End Close Status. It's been a tremendous win.

He started his career as a data engineer, and then taught himself PowerBI from materials like this available online. He's already been awarded our company's high honor quarterly CFO award: the first time someone on my team has won in the last three years. I've been trying to teach accountants how to use these Power tools, but hiring someone who has already made the effort, has been a huge win. I can't wait until the market starts to have more people like him available as they are a key part of my accounting strategy for winning. 

So my key takeaway, hire a PowerBI guru if you can. Make the case; they will quickly pay for themselves with their insights (and make you look like a rockstar)!

And if you are the guru already, send me a message, and let's talk. Maybe I have a project or work that you can help me with. 

How Do You Learn Best?

Since building this blog, I've wondered what's the best way to get this information out to the masses of accountants and finance professionals who could really benefit from the automation that PowerBI, Power Query and Power Pivot provide.

In the beginning I tinkered with making some videos and doing live presentations, but I got a lot of feedback that people needed step-by-step instructions and use cases so they could play with the tools and make them their own. However, I wonder what's the best way for you to learn?

Leave me your vote in the comments.
A. Videos posted on this blog, YouTube, or even TikTok (yikes)
B. Live instruction
C. This blog 

Prior post: Transforming a PDF
Next post: Coming soon

Transforming a PDF

For this next exercise, we will build our PowerBI skills further, automating cleaning up the data within the PowerBI application. The post is a continuation of the work that was started in the post, Convert PDF Using PowerBI. In the original exercise, we took a PDF file and converted it to a visualization using PowerBI; now, we will perform transformation work to the PDF file before reloading it to the data visualization. 


The Scenario

In this series of exercises, the goal is to pull a list of all names from a PDF. In the first exercise, we just loaded the data as is; however, if you compare the source PDF file to the final visualization, many of the names are missing, with the PowerBI visualization matrix only displaying the names that start from the PDF section Randy Travis.

To begin the transformation process, let's start where we left off in the prior PDF lesson: with the Solution file. Open the Solution PowerBI file (or your prior exercise workbook), and then navigate to the PowerQuery Editor within PowerBI by selecting the query Table025 (Page 11). Right click and select "Edit Query".


Process the data

You've now entered the Power Query Editor: the same one used for Power Query in Excel. 

Looking at the data, there's a few things we need to address to get to our goal: a list of all names.
  • Column2 has a rouge name (Nicole Lopez).
  • Column1 includes characters (i.e, 2019-10) preceding the name.
  • Columns 1, 2 & 6 each contain names, but we them all in a single list..
Let's tackle these clean-up items one-by-one. 

Addressing the rouge Name in Column2

To fix the name hanging out by itself in Column2, we will combine Column1 and Column2, by adding a Conditional Column. The Conditional Column will be called "1st column" and will look at Column2 and if Column2 has a value of null, then populate [1st column] with the value from Column1. If not null, then it will populate the value from Column2. This is very similar to the manual Excel process of creating a new column and then using an IF statement to populate each row. 
ln the PowerBI Power Query Editor ribbon, select Add Column, then Conditional Column,
When the "Add Conditional Column" menu opens, populate the fields as shown in the image below. Remember Power Query is case sensitive so "null" must be lowercase. 


After selecting "OK", there will be an extra column named [1st column]. By default, Power Query always adds new columns to the far right of your dataset. 

Now, we've got the names from Column1 and Column2 in one column. However, we also have extra data (i.e., 2019-10) in the column that we need to remove. Time for the second cleanup exercise.


Remove digits preceding names

To isolate the name, we need to remove the digits that precede them (i.e., 2019-10). We will use something similar to Excel's Text to Columns: Split Column by Delimiter. 

Click on the header for the new column, [1st column], highlighting the entire column. In the ribbon, on the Home menu, select the Split Column option, then By Delimiter.








In the Split Column by Delimiter menu, select Space from the drop down menu, and then Left-most delimiter. This will split [1st column] in to two columns based on the first space it finds, reading left to right. {I love the extra flexibility Split Column offers, such as Left-most, compared to the similar Excel version!}
In the new column [1st column.2], the first few rows look as expected and appear to have achieved the objective. However, as you scroll further down, you can see that "Nicole" is separated from her last name "Lopez". This happened because "Nicole Lopez" didn't have a space in front of it, and so the Left-most Space used to split the column was the space between "Nicole" and "Lopez". There are several approaches we could take to address this issue, but for this exercise we will focus on modifying an existing formula.

Modify Formula for Conditional Column

When we created [1st column], we used the Conditional Column menu to create an if statement that returned Column2's value when Column2 had a value. But, if instead of returning just Column2's value, we returned Column1 + Column2, then the data in Column2 will look the same as Column1. Since the columns look the same after the revision, the Split Column by Delimiter step will work the same, resulting in the correct result for the name of "Nicole Lopez".





If you haven't turned on your formula bar, go to the View menu on the ribbon and check the box "Formula bar". This is a one time set-up.

Now in the Applied Steps area of the Power Query Editor, select the step named "Added Conditional Column". 

Selecting an Applied Step changes the Preview to show the data how it looked after that step was performed, but before the subsequent steps. It also updates the formula bar to show the M language formula used to create that step. 



Good news is that you don't have to learn the M language in order to make some easy changes using it. In this case, we are going to modify the portion referring to the results of the new Conditional Column. In the formula above, it says if [Column2] is not null then [Column2]. We'll modify the very end of the formula to say "then [Column1]&[Column2]". You can make that change by directly typing in the formula bar above.

An ampersand is used in Power Query to indicate "and", or to join two words together. After pasting in the "[Column1]&" in the formula bar, it should read as follows. Now if we look at the [1st column], all columns have a "2019-##" and a name. 










In the [1st column] there is one last issue left to clean-up: there is a space missing on Row 11. Similar to above, we are going to manually type in the instructions to the formula bar. To add text to a column in Power Query, surround texts by quotation marks. Adding a space then is, " "; quotation marks with a space between them. The final modified formula looks like this, and the person listed on Row 11, aligns with the other rows. Even better, you've created the rules once, and if this ever occurs again in the future, the directions are written, and the process will be lightning fast.





Remove other columns
Power Query works faster when you remove any unnecessary columns. Since for this exercise all we need is a final list of names, we can tell Power Query to remove all other columns, leaving only the [1st column.2]. Do so by selecting [1st column.2], right clicking and selecting the option "Remove Other Columns". 
Combine columns within the same query

We now have a column that shows all the cleaned-up names from Column1, but we need to add to it the names in Column6. The good news is that the Column6 names are already cleaned up, so no work is required there, 

For this trick, I'm going to show you how to append a Table within the same query, combining the new [1st column] with Column6; in plain English, I'm going to show you how to get the new Column1 and Column6 in to a single name column.

Let's start by renaming [1st column.2] to "Column6". This aligns the name of the two columns that contain names: the column we cleaned up and the original column6. The append functions combines columns together based on their headers, so ensure it's spelled exactly the same way including capitalization. Rename a column by double clicking the header. 



Next, we will use the append function. In the ribbon, under Home, you'll find Append on the far right. Select the first option, Append Queries. Append makes a current list longer, by lining up data from multiple columns with the same headers.








In the pop-up Append menu, in the Table to append dropdown, select the current query, Table025 (Page 11). This option results in the query appending to itself. Notice in the preview area how the list of names restarts on row 17.
What we need to do next is to change the Append formula manually to look at a version of the query where the original Colunm6 is still included, and then Append that version of the Table to the version that has Column1 renamed as Column6. Essentially, next we need to combine both lists of names from the original columns 1 and 6. Instead of writing the formula from scratch, we used the Power Query menu options to do the hard work, and we will tweak the formula for this scenario. I totally geek out on tricks like these, and how much they expand your capabilities in Power Query, with minimal investment. 

If you select on the Applied Step for Appended Query, it will show the formula for that step in the formula bar. Reading the formula from left to right it says "Combine Tables that are located at the Applied Step Renamed Columns with the Table at the same Applied Step Renamed Columns". 




To modify the formula, we will replace one of the Renamed Columns references with a reference to an Applied Step that contained Column6. We got rid of Column6 when we performed the step Removed Other Columns; so a Step where Column6 was still in the Table was the Changed Type1 Step. 
So while in the Appended Query Applied Step, we are going to replace one of the references to Renamed Columns with the Changed Type1 step. Remember that Power Query is case sensitive, so be sure to spell exactly as shown. The # proceeding the reference indicates that you are referring to a Step in the query, and if a Step is more than one word, quotation marks should surround it. Here's how the final modified formula will look.





Alternatively, you also could have swapped the order and placed Changed Type1 first and Renamed Columns last.

Now that you've manually updated the formula, you'll notice two changes: 1) it's pulled back in all the rows that you removed previously, and 2) Column6 now includes names from both columns of the original table. 

Repeat the step to Remove Other Columns, keeping only Column6, and you have a single clean list of names, converted from a PDF to a Power BI query.




Next post: Coming soon


Direct query

Today, November 7, 2022, I saw that Microsoft has finally launched Direct Query, which allows users to connect to an existing PowerBI dataset. I've been waiting for this feature since I started using PowerBI and PowerQuery in 2018.

It does appear to be in Beta, and so I did have to enable the ability to Preview this feature, but I got prompted as soon as I selected the option to connect to PowerBI Datasets, and the directions were easy to follow.

Since it's brand new, I'm obviously still playing around with it, but was glad to see that my favorite datasets were easy to connect to once the option was selected. I'm really hoping this functionality improves some refresh times. Instead of reconnecting to a source and performing the necessary transformations, I'm optimistic that I'll be able connect directly to the transformed dataset going forward.


Instructions

Ready to get started? Here's a link to Microsoft's instructions for Connecting PowerBI reports to existing PowerBI datasets.


Here's the link to Microsoft's instructions for Connecting Excel to existing PowerBI datasets. 




Have you tried Direct Query? Please leave comments on your experience below. 


Convert PDF using PowerBI

Just joining the blog? Explore prior lessons here

A PDF file isn't built for complex data analysis, but can hold lots of data we want to analyze. In this exercise, I will show how easy it is to convert a PDF to a data visualization using PowerBI. 


The Scenario

You can follow along with this exercise by downloading the source PDF file saved here: PDF.

Let's assume you work in compliance for a large accounting firm in New Mexico. Each quarter, one of your tasks is to go through the NM Accountancy Board Newsletter and pull a list of all people with Disciplinary Reports. 

Flipping through the newsletter, you will find the data for this exercise on Page 11. 


Choose Data

Haven't downloaded the PowerBI Desktop app yet? Instructions available in this prior post under the header, Install.

Open the application PowerBI Desktop, and let's get started. 

Start in the Home menu, then select Get data, scrolling down to the bottom to select More... The More option expands to show more than 100 connections that are already built-in to PowerBI. 

At the time of this writing, the PDF option was listed at the top of the All section. If you don't see it quickly, use the Search functionality in the same menu.

Once PDF is selected, then choose Connect. 
In the Open menu pop-up, browse to the location where the downloaded PDF has been saved. 

Next, the Navigator menu will open, showing a few different options for selecting the data from page 11. Below I've selected the three Page 11 options to highlight them, but we only need to select one to load. To decide which option to select, click on each Page 11 to view the preview.






The option Table025 (Page 11) shows the data that we are after, without the extra text from the top of page 11. So select the option Table025 (Page 11), then Load.












Create Visualization



Now that the data has been loaded, it's time to transform it in to a Visualization. 
 
In Power BI, the Matrix Visualization is a standard menu option, and it resembles a Pivot Table. It's an easy first Visualization type for accountants learning PowerBI, since it aligns with the pivots many of us are so familiar with.


Similar to using Excel's Pivot Table Fields List, in Power BI there are a list of Fields to choose from based on the Data that was loaded above. In this example, since we didn't clean the data before using it, the Column names are generic: Column 5 and Column 6. You can then drag and drop fields in to the Visualizations area, immediately below the area where you select the Matrix option.

After adding the field Column 6 to the Rows area, the Matrix looks like the following, which lists some of the names from one of the columns of the Table we loaded. 











While I hope that most of my readers have moved completely away from PDFs, on those occasions where PDFs can't be avoided, I hope this exercise helps transform a manual process to something more automated.  

In the next PDF lesson, we'll continue working with this PDF, but explore how to get both columns with names in to the Matrix, using transformation tools in the PowerQuery Editor.

In this lesson, you learned:
  • How to load data to PowerBI;
  • How to convert a PDF to a Visualization in PowerBI; and
  • What a Matrix Visualization is and how to build it.





Combine all files in a Folder

Just joining the blog? Explore prior lessons here.
 
Imagine a folder with 100 journal entry files, and in less than in a minute you can combine them all together in to one report. Power Query makes it very easy to take all the files in a folder and combine the files together to create a single query. 
The biggest challenge is how to modify the query when your report changes. Let's walk through all steps, so this will also serve as a reference guide for when an inevitable change occurs in the future.

File types

For this illustration, we are going to use Trial Balances saved as Excel files all in a single folder. Reports which you frequently export from an ERP system and then manipulate for analysis are excellent files to use when trying this out initially.

While you can combine any files, it works best when:
  • The layout of the files are similar such as consistent column headers and number of columns.
  • The files are all of the same file type (i.e, .xlsx, .csv, .txt); and
  • The file layout rarely changes.
If you want to follow along, download these Trial Balances and save in a folder on your network.
                   ðŸ’¾↑                              ↑💾

Get Data

Let's build a query using the folder where the above downloaded files are saved. First step is to load the data. On the Excel ribbon, go to Data, Get Data, From File, From Folder. Browse to the folder where the files you downloaded above are saved.


Once selected, the navigator screen will display a list of all the files in the folder. Select Combine, then Combine & Transform Data.
Next, Power Query wants to know which file in the folder should be used as the template when combining the files. By default, the Sample File will be the First File in the folder. You can also select the option to exclude any files which have errors.



After selecting OK, Power Query has now taken all the files in the folder and combined them together.












The Helper Queries

When a folder is the source for a query, Power Query creates several Helper Queries.
    Parameter1: tells Power Query which file needs to be used as the example. 
    Sample File: grabs the file to be used as the model, or example.
    Transform File: provides instructions on how to convert the Sample to rows & columns.
    Transform Sample File: converts the Sample File in to a query with rows & columns.
    Now let's look at each of these queries to understand more about how they each perform and where challenges may be hidden.
The Parameter Query tells Power Query which file needs to be used as the example. This will always be the "Sample File"; I can't think of a single issue I've ever encountered with the Parameter Query. If something breaks, this is not likely your culprit.









The Sample File Query grabs the file which was chosen to be the example. 











The challenge which I've encountered in the past is when the first file in the folder is no longer representative of what the data should look like, such as when the source updates the file layout (frequently I see this issue with third party files). In this case, filter the Source to select the correct file to use as the template for combining the data.


The Transform File query is actually a custom function (note the fx next to the query name). When the function is ran (called invoked in Power Query), all the steps are performed as defined in the query.










Applied Steps aren't visible in a function query; instead, to view what the query is doing you'll need to select the Advanced Editor from the Home menu to view the M language behind the query. In brief, the code below is saying, transform the Sample File (Parameter1) from the file format in to the Excel Workbook format of a query, which includes columns an rows, using the data from the worksheet Sheet1. When the function is run it creates the query Transform Sample File.














The Transform File query is where you are most likely to encounter an issue. Note above that when the function runs it is looking for a worksheet named "Sheet1". If the Sample file has a different sheet name, the query will fail. To correct, simply open the Advanced Editor and replace "Sheet1" with the appropriate worksheet name from the Sample File.

A similar type of issue can occur when the files in the folder are .csv. Below is what the M language looks like when the files are csv. Notice the "Columns=5" portion of the code; this means that only the first 5 columns of the csv file will be pulled in to Power Query when the function is invoked. If your file format changes and now has 6 columns, you can simply open the Advanced Editor and change "Columns=5" to "Columns=6". This is the most frequent issue I've had with my queries from folders. Bookmarking this tip will save you time in the future.

💥BEST PRACTICE TIP
Quickly fix From File from Folder queries by making quick alterations to the queries in the Advanced Editor view. In the Transform File function query, modify the M language for Excel Files by updating the worksheet name, or for csv files update the number of columns.💥

The final Helper Query is the Transform Sample File query. This query is generated when the function query, Transform File, is invoked. It brings together all the helper queries to convert the sample in to an Excel workbook type of query.





To sum up what we've discussed:, the Helper Queries are:
    Parameter1: tells Power Query which file needs to be used as the example. 
    Sample File: grabs the file to be used as the model, or example.
    Transform File: provides instructions on how to convert the Sample to rows & columns.
    Transform Sample File: converts the Sample File in to a query with rows & columns.


The Converted File

After all the Helper Queries are created, then the final query is created. This query is the end result we were trying to achieve. In this query, you'll perform the normal Power Query transformations such as data clean-up and modifying columns and headers, achieving an end result similar to below, which combines trial balances from multiple dates.






First note that all the Applied Steps in the screenshot below were automatically created by Power Query. The first step is the source, which is the folder file. Then, Power Query filters for any hidden files. Third, it invokes the function query, Transform File, transforming all the files to Excel workbook query types adding a column called Transform File. It then formats the file like the Sample File by renaming the Header Name to Source.File and removing extra columns. Finally, it expands the transformed data and changes the types.



Power Query's ability to Combine all files in a folder is revolutionary. It's allowed me to personally create queries and analyze data in ways I never imagined before learning about it. While extremely easy to implement, when it fails, the existence of 4 Helper Queries plus the final query, can make it hard to identify where things go wrong. 

In this lesson, you learned:
  • How to Get Data using From File From Folder;
  • What each Helper Query does; and
  • How to troubleshoot common issues.


Prior post: Query Dependencies
Next post: Direct Query