Convert PDF using PowerBI

Just joining the blog? Explore prior lessons here

A PDF file isn't built for complex data analysis, but can hold lots of data we want to analyze. In this exercise, I will show how easy it is to convert a PDF to a data visualization using PowerBI. 


The Scenario

You can follow along with this exercise by downloading the source PDF file saved here: PDF.

Let's assume you work in compliance for a large accounting firm in New Mexico. Each quarter, one of your tasks is to go through the NM Accountancy Board Newsletter and pull a list of all people with Disciplinary Reports. 

Flipping through the newsletter, you will find the data for this exercise on Page 11. 


Choose Data

Haven't downloaded the PowerBI Desktop app yet? Instructions available in this prior post under the header, Install.

Open the application PowerBI Desktop, and let's get started. 

Start in the Home menu, then select Get data, scrolling down to the bottom to select More... The More option expands to show more than 100 connections that are already built-in to PowerBI. 

At the time of this writing, the PDF option was listed at the top of the All section. If you don't see it quickly, use the Search functionality in the same menu.

Once PDF is selected, then choose Connect. 
In the Open menu pop-up, browse to the location where the downloaded PDF has been saved. 

Next, the Navigator menu will open, showing a few different options for selecting the data from page 11. Below I've selected the three Page 11 options to highlight them, but we only need to select one to load. To decide which option to select, click on each Page 11 to view the preview.






The option Table025 (Page 11) shows the data that we are after, without the extra text from the top of page 11. So select the option Table025 (Page 11), then Load.












Create Visualization



Now that the data has been loaded, it's time to transform it in to a Visualization. 
 
In Power BI, the Matrix Visualization is a standard menu option, and it resembles a Pivot Table. It's an easy first Visualization type for accountants learning PowerBI, since it aligns with the pivots many of us are so familiar with.


Similar to using Excel's Pivot Table Fields List, in Power BI there are a list of Fields to choose from based on the Data that was loaded above. In this example, since we didn't clean the data before using it, the Column names are generic: Column 5 and Column 6. You can then drag and drop fields in to the Visualizations area, immediately below the area where you select the Matrix option.

After adding the field Column 6 to the Rows area, the Matrix looks like the following, which lists some of the names from one of the columns of the Table we loaded. 











While I hope that most of my readers have moved completely away from PDFs, on those occasions where PDFs can't be avoided, I hope this exercise helps transform a manual process to something more automated.  

In the next PDF lesson, we'll continue working with this PDF, but explore how to get both columns with names in to the Matrix, using transformation tools in the PowerQuery Editor.

In this lesson, you learned:
  • How to load data to PowerBI;
  • How to convert a PDF to a Visualization in PowerBI; and
  • What a Matrix Visualization is and how to build it.





No comments:

Post a Comment