Remove duplicates


New to this blog? Start the Intro to Power Query series here.

A blog reader was using Power Query to remove duplicates and noticed duplicates were still in the dataset after the step was applied.  What happened?


Power Query is case sensitive"Data" and "data" are not considered duplicates.


Before removing duplicates, consider applying the following 3 steps to cleanse your data:
  1. standardize the case (lower, upper, or capitalize each word)
  2. apply TRIM (removes leading and trailing spaces)
  3. apply CLEAN (removes non-printable characters)
All three of these steps can be found on the Transform tab, under Format. 


When applied to columns that are not of the data type "text", the column data type will be changed to text.  Since data type matters for some functions, be punctilious when selecting columns to transform.

In this lesson, you learned how to:
  • Remove duplicates accurately in Power Query.

Prior post: Choose Your Own Data Adventure
Next post: Joins Conceptually


💬  Have issues?  Let me know, so we can troubleshoot together.

No comments:

Post a Comment