Combine all files in a Folder

Just joining the blog? Explore prior lessons here.
 
Imagine a folder with 100 journal entry files, and in less than in a minute you can combine them all together in to one report. Power Query makes it very easy to take all the files in a folder and combine the files together to create a single query. 
The biggest challenge is how to modify the query when your report changes. Let's walk through all steps, so this will also serve as a reference guide for when an inevitable change occurs in the future.

File types

For this illustration, we are going to use Trial Balances saved as Excel files all in a single folder. Reports which you frequently export from an ERP system and then manipulate for analysis are excellent files to use when trying this out initially.

While you can combine any files, it works best when:
  • The layout of the files are similar such as consistent column headers and number of columns.
  • The files are all of the same file type (i.e, .xlsx, .csv, .txt); and
  • The file layout rarely changes.
If you want to follow along, download these Trial Balances and save in a folder on your network.
                   ðŸ’¾↑                              ↑💾

Get Data

Let's build a query using the folder where the above downloaded files are saved. First step is to load the data. On the Excel ribbon, go to Data, Get Data, From File, From Folder. Browse to the folder where the files you downloaded above are saved.


Once selected, the navigator screen will display a list of all the files in the folder. Select Combine, then Combine & Transform Data.
Next, Power Query wants to know which file in the folder should be used as the template when combining the files. By default, the Sample File will be the First File in the folder. You can also select the option to exclude any files which have errors.



After selecting OK, Power Query has now taken all the files in the folder and combined them together.












The Helper Queries

When a folder is the source for a query, Power Query creates several Helper Queries.
    Parameter1: tells Power Query which file needs to be used as the example. 
    Sample File: grabs the file to be used as the model, or example.
    Transform File: provides instructions on how to convert the Sample to rows & columns.
    Transform Sample File: converts the Sample File in to a query with rows & columns.
    Now let's look at each of these queries to understand more about how they each perform and where challenges may be hidden.
The Parameter Query tells Power Query which file needs to be used as the example. This will always be the "Sample File"; I can't think of a single issue I've ever encountered with the Parameter Query. If something breaks, this is not likely your culprit.









The Sample File Query grabs the file which was chosen to be the example. 











The challenge which I've encountered in the past is when the first file in the folder is no longer representative of what the data should look like, such as when the source updates the file layout (frequently I see this issue with third party files). In this case, filter the Source to select the correct file to use as the template for combining the data.


The Transform File query is actually a custom function (note the fx next to the query name). When the function is ran (called invoked in Power Query), all the steps are performed as defined in the query.










Applied Steps aren't visible in a function query; instead, to view what the query is doing you'll need to select the Advanced Editor from the Home menu to view the M language behind the query. In brief, the code below is saying, transform the Sample File (Parameter1) from the file format in to the Excel Workbook format of a query, which includes columns an rows, using the data from the worksheet Sheet1. When the function is run it creates the query Transform Sample File.














The Transform File query is where you are most likely to encounter an issue. Note above that when the function runs it is looking for a worksheet named "Sheet1". If the Sample file has a different sheet name, the query will fail. To correct, simply open the Advanced Editor and replace "Sheet1" with the appropriate worksheet name from the Sample File.

A similar type of issue can occur when the files in the folder are .csv. Below is what the M language looks like when the files are csv. Notice the "Columns=5" portion of the code; this means that only the first 5 columns of the csv file will be pulled in to Power Query when the function is invoked. If your file format changes and now has 6 columns, you can simply open the Advanced Editor and change "Columns=5" to "Columns=6". This is the most frequent issue I've had with my queries from folders. Bookmarking this tip will save you time in the future.

💥BEST PRACTICE TIP
Quickly fix From File from Folder queries by making quick alterations to the queries in the Advanced Editor view. In the Transform File function query, modify the M language for Excel Files by updating the worksheet name, or for csv files update the number of columns.💥

The final Helper Query is the Transform Sample File query. This query is generated when the function query, Transform File, is invoked. It brings together all the helper queries to convert the sample in to an Excel workbook type of query.





To sum up what we've discussed:, the Helper Queries are:
    Parameter1: tells Power Query which file needs to be used as the example. 
    Sample File: grabs the file to be used as the model, or example.
    Transform File: provides instructions on how to convert the Sample to rows & columns.
    Transform Sample File: converts the Sample File in to a query with rows & columns.


The Converted File

After all the Helper Queries are created, then the final query is created. This query is the end result we were trying to achieve. In this query, you'll perform the normal Power Query transformations such as data clean-up and modifying columns and headers, achieving an end result similar to below, which combines trial balances from multiple dates.






First note that all the Applied Steps in the screenshot below were automatically created by Power Query. The first step is the source, which is the folder file. Then, Power Query filters for any hidden files. Third, it invokes the function query, Transform File, transforming all the files to Excel workbook query types adding a column called Transform File. It then formats the file like the Sample File by renaming the Header Name to Source.File and removing extra columns. Finally, it expands the transformed data and changes the types.



Power Query's ability to Combine all files in a folder is revolutionary. It's allowed me to personally create queries and analyze data in ways I never imagined before learning about it. While extremely easy to implement, when it fails, the existence of 4 Helper Queries plus the final query, can make it hard to identify where things go wrong. 

In this lesson, you learned:
  • How to Get Data using From File From Folder;
  • What each Helper Query does; and
  • How to troubleshoot common issues.


Prior post: Query Dependencies
Next post: Direct Query

1 comment: