Just joining the blog? Explore prior lessons here.
A PDF file isn't built for complex data analysis, but can hold lots of data we want to analyze. In this exercise, I will show how easy it is to convert a PDF to a data visualization using PowerBI.
The Scenario
You can follow along with this exercise by downloading the source PDF file saved here: PDF.
Let's assume you work in compliance for a large accounting firm in New Mexico. Each quarter, one of your tasks is to go through the NM Accountancy Board Newsletter and pull a list of all people with Disciplinary Reports.
Flipping through the newsletter, you will find the data for this exercise on Page 11.
Choose Data
Haven't downloaded the PowerBI Desktop app yet? Instructions available in this prior post under the header, Install.
Open the application PowerBI Desktop, and let's get started.
Start in the Home menu, then select Get data, scrolling down to the bottom to select More... The More option expands to show more than 100 connections that are already built-in to PowerBI.
In the Open menu pop-up, browse to the location where the downloaded PDF has been saved.
The option Table025 (Page 11) shows the data that we are after, without the extra text from the top of page 11. So select the option Table025 (Page 11), then Load.
Create Visualization
Similar to using Excel's Pivot Table Fields List, in Power BI there are a list of Fields to choose from based on the Data that was loaded above. In this example, since we didn't clean the data before using it, the Column names are generic: Column 5 and Column 6. You can then drag and drop fields in to the Visualizations area, immediately below the area where you select the Matrix option.
After adding the field Column 6 to the Rows area, the Matrix looks like the following, which lists some of the names from one of the columns of the Table we loaded.
While I hope that most of my readers have moved completely away from PDFs, on those occasions where PDFs can't be avoided, I hope this exercise helps transform a manual process to something more automated.
In the next PDF lesson, we'll continue working with this PDF, but explore how to get both columns with names in to the Matrix, using transformation tools in the PowerQuery Editor.
In this lesson, you learned:
- How to load data to PowerBI;
- How to convert a PDF to a Visualization in PowerBI; and
- What a Matrix Visualization is and how to build it.
Prior post: Combine All Files in a Folder
Next post: Transforming a PDF
No comments:
Post a Comment