Python vs Excel for Data Cleaning

Mynda Treacy

March 18, 2025

Data cleaning is one of the most time-consuming tasks for analysts. While Python is a powerful tool, it’s not always the most efficient choice for cleaning data. In this post, we’ll show you how Power Query in Excel can clean the same dataset 12x faster—with zero coding that’s 100% reusable.

The Challenge: Cleaning a Complex Survey Dataset

Shashank Kalanithi’s popular 1hr 17min YouTube tutorial that has been viewed over 3.5 million times demonstrates how to clean SurveyMonkey data using Python and Pandas. And while his method is correct, it’s a lot of time-consuming coding and debugging.

Using Excel’s Power Query, we can achieve the same transformation in just 5 minutes with zero coding, and it’s reusable, so next time it’ll be even faster!

how to clean data using Python?

Shashank Kalanithi Data Cleaning with Python Video.


Watch the Power Query vs Python for Data Cleaning Video

Subscribe YouTube

 

Get the Practice Data and Power Query Solution

Enter your email address below to download the sample workbook.

By submitting your email address you agree that we can email you our Excel newsletter.

Understanding the Raw Data

Before we begin, let’s look at the challenges in the dataset:

dataset challenge for cleaning the data

  • Headers spread over two rows: The column headers are split over rows 1 and 2 and need to be combined.
  • Multiple columns for each question: Each survey question is in a separate column (columns O:CV).
  • Missing question numbers: Only the first response for each question contains a number (visible in column S in the image above).
  • Empty columns: Some columns contain no useful data and need to be removed (not shown in screenshot).
  • Final desired layout (shown below): We need a tabular format, where each question and sub-question is a row, with additional columns for response counts (the last two columns are not visible in the screenshot below).

challenges in cleaning a survey dataset

Step 1: Cleaning Headers with Power Query

Instead of using copy-paste and formulas like in Shashank’s method, Power Query makes this process much faster.

1. Load Data into Power Query

  • Select cell A1 and press CTRL + A to select all data.
  • Go to the Data tab and choose From Table/Range.

how to clean headers using Power Query?

  • At the Insert Table dialog box, uncheck “My table has headers” since headers span two rows.

2. Transpose and Fix Headers

  • In the Power Query window, go to Transform > Transpose.
  • Select Column1 and go to Transform tab >  Fill Down to populate missing question numbers.
  • Hold SHIFT and select Column1 & Column2 with a colon separator using Add Column > Merge Columns.
  • Right-click the new column > Move to the beginning
  • Delete Column 1 & Column 2 as these old heading columns are no longer required
  • Transpose the data back
  • Promote the first row to headers:

how to get the first rows as headers in Excel?

Step 2: Removing Unnecessary Columns

  1. Select unwanted columns (e.g., Start Date through to Custom Data 1).
  2. Press DELETE to remove them.
  3. Rename demographic columns for clarity (e.g., Division Primary, Division Secondary, Generation, Tenure & Employment Type).

Your query should now look like this:

how to remove unwanted columns in Excel?

Step 3: Unpivoting the Data

The raw survey data is in a wide format. To analyse it effectively, we need to convert it to a long or tabular format.

1. Replace Nulls with Blanks

  • Select question columns.
  • Use Transform > Replace Values to replace null with an empty string.

If we don’t take this step, any rows with null values will be removed in the next step.

2. Unpivot Questions

  • Select the demographic columns (Hold SHIFT and left click Respondent ID to Employment Type).
  • Right-click and choose Unpivot Other Columns.
  • Add Column for the Question. Select Attribute column > Add Column > Extract > Text Before Delimiter. The delimiter is the colon.
  • Rename Columns: Attribute = Question + Subquestion, Value = Answer and Text Before Delimiter = Question.
  • [Optional] Rename query: Unpivoted Data

Now, each question and sub-question is in a single column, making analysis much easier. We could stop there and skip to step 6 and analyse the data using PivotTables and formulas, but Shashank’s client wanted two calculated columns added.

Step 4: [optional] Calculating Respondent Counts

Count Number of Respondents per Question

1. Reference the Unpivoted Data query.

how to calculate values in Power Query?

2. Filter out blank responses from the Answer column:

how to filter blank rows in Power Query?

3. Select Respondent ID & Question (hold CTRL to select non-contiguous columns), then right-click > Remove Other Columns.

4. Select Question column, then Home tab > Group By > Count Distinct Rows.

how to group data in Power Query?

5. Rename query: Number of Respondents.

Count Number of Respondents per Answer

  1. Reference the original query again.
  2. Filter out blank responses from the Answer column.
  3. Select Respondent ID, Question + Sub-question, and Answer (hold CTRL to select non-contiguous columns), then right-click > Remove Other Columns.
  4. Select Question + Sub-question, and Answer.
  5. Home tab > Group By (advanced) > Count Distinct Rows:

how to group data in Power Query?

  1. Rename query: Same Answer Count.

Step 5: [optional] Merging Queries

1. Merge the Unpivoted Data query with ‘Same Answer Count’.

  • Select the Unpivoted Data query > Home tab > Merge Queries > As New
  • Match Question + Sub-question and Answer.
  • Use a Left Outer Join.

how to vlookup in Power Query?

    • Expand the merged column to bring in counts:

how to use Merge Queries in Power Query?

  • Rename the column ‘Number of Respondents’.
  • Rename the query ‘Final’.

2. Merge this query with ‘Number of Respondents’.

  • Home tab > Merge Queries
  • Match Question and Answer columns (hold CTRL to select multiple columns to match).

how to merge two tables in Power Query?

    • Use a Left Outer Join.
    • Expand the column to bring in respondent counts:

how to use a join in Power Query?

    • Rename the merged column ‘Same Answer Count’.

Step 6: Loading the Clean Data

1. Close & Load To > Connection Only (for intermediate queries).

how to load cleaned data from Power Query to Excel?

2. Edit Final Query > Load To >

how to edit a query in Power Query?

3. Table in New Worksheet:

how to insert Power Query data to Excel?

4. Verify the row count matches expectations.

Step 7: Automate with One Click

Power Query records every transformation step. If you receive new SurveyMonkey data, just make a copy of the file and paste in the new data over the top of the original data. Go to the Data tab > Refresh All, and Excel will apply the same cleaning process instantly—no coding required!

Why Choose Power Query Over Python?

  • Faster: Reduce an hour of work to just 5 minutes.
  • No Debugging: Avoid scripting errors and syntax issues.
  • 100% Repeatable: Apply the same steps to new data with a single click.

Ready to master Power Query? Check out my comprehensive course here and automate your data workflows today!

Leave a Comment

Current ye@r *