Hi,
I have Codes exceed 4Million and I want to remove the duplication so I put it in 4 columns due the excel have count limitation to be added in one column.
So can help me to remove the duplication from all columns at one time.
Thanks;
Bill
Hi Bill,
Why don't you use Power Query to do this for you? Presumably the data is in a CSV or Text file in one column. You can use Power Query to get the data from the CSV file:
1. Excel 2010 & 2013 go to the Power Query tab/Excel 2016 onward go to the Data tab
2. Get Data from Text/CSV
3. Browse to the file location and import it.
4. In the Power Query editor window's the Home tab > Remove Rows > Remove Duplicates > Close & Load.
Mynda
Hi Mynda,
Thank you very much for your fast reply.
Could you please help me about this, I'm using Excel 2010 and I can't find Power Query tab, please check attached image and let me know how I can get it.
Thanks;
Bill
Hi Bill,
In Excel 2010 Power Query is a free add-in that you can download here: https://www.microsoft.com/en-us/download/details.aspx?id=39379
When installed you'll have a dedicated Power Query tab on the ribbon.
Mynda
Hi Mynda,
Please accept my gratitude. Your support means a lot to me!.
I got Power Query and I got how to remove the duplication as you see in attached image but I still need your help to know how I can download or save in text file after the duplication data removed.
Thanks;
Bill
Hi Bill,
When your data is ready click on Close & Load in the top left of the query editor. NOTE click on the text Close & Load, not the icon.
From the sub-menu click on Close & Load To. Then choose to load to a table on an existing or new worksheet.
If you want to save the data out as text from there you can do so.
Regards
Phil
Deep thanks Philip.
I did this but when I save data as text, I see after I save it as text, the count not correct.
It is supposed the count is 3,407,267 but the file text contain1,048,576. please check attached images.
Thanks;
Bill
Hi Bill,
How do you know the count is wrong?
Can you please share the Excel workbook and the source data file(s) so I can check.
Thanks
Phil
Hi Bill,
1048576 is the maximum number of rows in excel.
Obviously, we cannot add in a sheet more that this number of records, so we have to find an alternative.
Here i what you can do:
1. Add a line in the query to split the table in 1.000.000 records buckets.
split=Table.Split(#"PreviousStepName",1000000),
2. Convert this list to table:
#"Converted to Table" = Table.FromList(split, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
3. Add an index column, starting from 1, step 1.
#"Added Index" = Table.AddIndexColumn(#"Converted to Table", "Index", 1, 1, Int64.Type),
4. Expand Column1. This action will regenerate the initial data, but will have now a paging column. If the data has over 3 million rows, you will have in that column numbers from 1 to 4, each page will have 1.000.000 records, as we set in step 1.
5. Instead of Load To a table in a worksheet, choose to load to a Pivot Table Report.
6. In this pivot table, add a slicer for the column we added for paging. This will allow you to see data in pages, 1 million at a time.
You can also show report pages, that will create 1 sheet for each page.
Example attached.
You can set the csv file path and group size to split into as many pages you need.
https://1drv.ms/x/s!AjfS33R8yoG9ksIBel-Zcg96s9GZCA?e=IQF9dC
File size is too large, choose the Open In Desktop App option.