Forum

removing duplicates
 
Notifications
Clear all

removing duplicates

6 Posts
3 Users
0 Reactions
109 Views
(@gkwietko)
Posts: 5
Active Member
Topic starter
 

Hi everyone, I wonder if anyone could help me...I have a set of data in which I have some duplicates which I need to remove. In one column I have stock number that duplicated and in another I have inv and credit note 33,333,33 & -33,333,33 which I also need to remove leaving only the recent invoice for 33,500. What would be the best way to approach this in power query? Many thanks for any suggestions.

Invoice/Credit Note No. Date Stock Number Price Sold
TM1425516 1240423 03321830   33,333.33
TM1426080 1240423 03321830 -33,333.33
TM1426101 1240430 03321830   33,500.00

Thank you 

Gosia 

 
Posted : 26/06/2024 10:27 am
(@mynda)
Posts: 4761
Member Admin
 

Hi Gosia,

You could add a column that converts the values in the Price Sold column to contain all positive values, let's call this column 'Value'. Then you could simply remove duplicates based on the Stock Number and Value columns. Using the example data above, you'd be left with the last item for 33,500.00.

However, there is a risk that you sell items for the same price and stock number that could get interpreted as a duplicate, when they're not. This isn't a limitation of Power Query, just a result of the nature of your data.

Mynda

 
Posted : 26/06/2024 8:30 pm
(@gkwietko)
Posts: 5
Active Member
Topic starter
 

Hi Mynda, in terms of removing duplicates. I did as suggested below, added column with value sold, converted it to absolute number. Then highlighted two columes ctr key+ stock number & column with value sold then removed duplicates but it doesn't quite work, I ended up with 2 lines, this is my data 

stock number  new value 
03321830 33333.00
03321830 33333.00
03321830 33500.00

and after removing duplications : I still have line with 33333 which I actually wanted to get rid of, I'm wondering what I could be doing wrong?

stock number  new value 
03321830 33333.00
03321830 33500.00

 

Thank you 

Gosia 

 
Posted : 18/07/2024 10:14 am
Riny van Eekelen
(@riny)
Posts: 1195
Member Moderator
 

Remove Duplicates in PQ actually filters for distinct values. So, it returns every existing value once. If you want to keep only unique values (i.e. values that exist exactly one time only), you need to group by the two columns, count, and keep  only rows where the count equals 1.

You'll find an example of the code in the attached file.

 
Posted : 18/07/2024 10:45 am
(@gkwietko)
Posts: 5
Active Member
Topic starter
 

That is helpful, thank you. Removing these with method you suggested would be ok for anything with count 2(where new invoice value is different to the previous 2 documents), however I have another challenge  where stock number has 3 rows with same value(after changing to absolute number) but I only want to keep 1.  I need to be able to keep one row. Does anyone have any tips? Thank  you how

Stock Number Price Sold absolute number 
03265654 23766 23766
03265654 -23766 23766
03265654 23766 23766
 
Posted : 19/07/2024 7:00 am
Riny van Eekelen
(@riny)
Posts: 1195
Member Moderator
 

Based on stock number and price (or absolute value) alone, it will be difficult. I believe that is what Mynda pointed out in her response. What logic would you apply if you were to perform this task manually. I guess you would look at the customer, a transaction date, and a document reference to the customer order perhaps. How would you otherwise be able to know that an invoice, a credit note and a corrected invoice 'belong' together?

Perhaps you can share some more complete data that resembles your real data and indicate which rows would have to be removed.

 
Posted : 19/07/2024 7:42 am
Share: