
Resource Usage and Performance - .xls/.xlsx versus .csv/.txt

16 Posts
3 Users
0 Reactions
154 Views
(@chadkukorola)
Posts: 10
Active Member
Topic starter
 

Hello,

This is a technical question, as I have seen the differences, but wanted to see if someone knew the "why" behind the reality.

First, there are a couple of things I can't change: I'm on 32-bit Excel, and I have no direct connections to a DB or server. As such, I am inevitably stuck downloading .xls or .csv files from a Data Warehouse, ERP, or other such web-based system. In testing, using a combination of Power Query and loading into Power Pivot, the data model "failed" due to a lack of system resources when loading from .xlsx.

Specifically:

.xlsx - ~800K records and 46 fields - Ran out of system resources and was not able to load the dataset

.txt - 7.5M records and 46 fields - Same data, duplicated for testing. I stopped at 7.5M records, but could have kept going because I wasn't near the 2 GB limit for 32-bit Excel

This means it's certainly worth the time to convert to a text-type file, but why is there such a dramatic difference?

Thanks!

Chad

 
Posted : 26/02/2017 4:28 pm
(@mynda)
Posts: 4761
Member Admin
 

Hi Chad,

When you imported the Excel data did you open the Excel file containing the data and then load to Power Query and Power Pivot all from within the same workbook?

Mynda

 
Posted : 26/02/2017 8:30 pm
(@chadkukorola)
Posts: 10
Active Member
Topic starter
 

Mynda,

In my original test it was raw data from .xlsx into Power Pivot, and then raw data from .txt into Power Pivot (2013). I stated incorrectly above, as I didn't have Power Query available. In the end, when loading files (even into .accdb), the .csv/.txt are always handled more quickly by the software.

So, I just never understood why there's a difference. Now that I have Power Query in my toolset at work, it leads me to believe the flat files are still a better option than .xls (maybe not), but I just don't understand the technical reasons.

Chad

 
Posted : 27/02/2017 11:36 am
(@mynda)
Posts: 4761
Member Admin
 

Hi Chad,

Can you please clarify a couple of points:

1. When you load the data into the Power Pivot model are you loading data from workbook A into the Power Pivot model in Workbook B, or are you loading data in Workbook A into the Power Pivot model in Workbook A?

2. Is the raw data in Excel already formatted in an Excel Table, or not?

Mynda

 
Posted : 27/02/2017 10:45 pm
(@chadkukorola)
Posts: 10
Active Member
Topic starter
 

Mynda,

1) Raw data is in Workbook A, then pulled into a data model in Workbook B.

2) In this case, the data in Workbook A was not in a table.

Workbook A was a download from an ERP (~50K records). For the test, I copied and pasted until the .xlsx was at its 1.04M record limit. Same process for the .txt file. As noted before, the .xlsx loaded approximately 80% into Workbook B before Power Pivot failed. The .txt file with 7.5M records loaded into Workbook B without issue.

Chad

 
Posted : 28/02/2017 7:33 pm
(@mynda)
Posts: 4761
Member Admin
 

Hi Chad,

Thanks for clarifying. I think it will be down to the 32-bit limitation on your version of Excel.

Is the Excel file containing the 1.04M records open when you load into Power Pivot? If so, have you tried loading it with the workbook closed to see if performance is any better?

Mynda

 
Posted : 28/02/2017 8:26 pm
(@chadkukorola)
Posts: 10
Active Member
Topic starter
 

Mynda,

No, the raw data file was closed during loading. My only thought is that pulling data from .xls* carries some .xls* "baggage", formatting or otherwise, which is compounded when scaling. However, I've not been able to find the technical answer.

Also, to avoid confusing folks, I have no performance issues after the data is loaded, simply on loading into the data model. It doesn't look like Power Query would overcome the bad data source issue either, so flat files it is!

Chad 

 
Posted : 28/02/2017 9:07 pm
(@mynda)
Posts: 4761
Member Admin
 

Thanks, Chad. I'll ask Microsoft for an explanation.

Mynda

 
Posted : 01/03/2017 12:11 am
(@mynda)
Posts: 4761
Member Admin
 

Hi Chad,

No official word from Microsoft yet, but a fellow MVP offered this explanation:

"If I were to take a wild guess, I would say that 32-bit Excel is running out of memory.

If you examine the internals of a .xlsx, it’s a complete ball of string: text mixed with code mixed with formulas.

Excel has to parse it all, lay it out in memory as a tabular structure, update it all, then extract the values.

It’s limited in the amount of RAM it can attract to do all this in: my guess is it sometimes can’t get enough?"
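The parsing-cost part of that guess is easy to demonstrate with a stdlib-only toy. The sketch below mimics neither ACE nor the real SpreadsheetML schema; it only contrasts splitting delimited text against building and walking an XML tree for the same data:

```python
import csv
import io
import time
import xml.etree.ElementTree as ET

# The same 100,000-row, 3-column dataset in two shapes: a flat CSV string,
# and an XML string loosely mimicking the sheet XML inside a .xlsx
# (not the real SpreadsheetML schema -- just enough to show the parse cost).
rows = [(i, f"name{i}", i * 2) for i in range(100_000)]

csv_buf = io.StringIO()
csv.writer(csv_buf).writerows(rows)
csv_text = csv_buf.getvalue()

xml_text = "<sheet>" + "".join(
    f"<row><c>{a}</c><c>{b}</c><c>{c}</c></row>" for a, b, c in rows
) + "</sheet>"

t0 = time.perf_counter()
csv_rows = list(csv.reader(io.StringIO(csv_text)))
t_csv = time.perf_counter() - t0

t0 = time.perf_counter()
xml_rows = [[c.text for c in row] for row in ET.fromstring(xml_text)]
t_xml = time.perf_counter() - t0

print(f"csv: {t_csv:.3f}s  xml: {t_xml:.3f}s  ({len(csv_rows):,} rows each)")
```

On most machines the XML path is noticeably slower, and it also holds the whole element tree in memory before a single value comes out, which is the pattern the quote describes.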

This makes sense to me since the .xlsx file format is actually a kind of zip file or container of items that make up the file. Bed time reading on .xlsx file format here:

https://msdn.microsoft.com/en-us/library/aa338205%28v=office.12%29.aspx?f=255&MSPPError=-2147217396
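You can confirm the container structure with Python's zipfile module. A small sketch (no real workbook to hand here, so it builds a minimal stand-in zip with the typical part layout; on a real .xlsx you would pass the file path instead):

```python
import io
import zipfile

def list_parts(xlsx_file):
    """List the internal parts of an .xlsx -- it is just a zip archive."""
    with zipfile.ZipFile(xlsx_file) as zf:
        return zf.namelist()

# Minimal stand-in mimicking a real workbook's layout; on a real file,
# list_parts("report.xlsx") (hypothetical path) shows the same kinds of parts.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("xl/workbook.xml", "<workbook/>")
    zf.writestr("xl/sharedStrings.xml", "<sst/>")
    zf.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

print(list_parts(buf))
```

Every one of those parts is XML that has to be unzipped and parsed before a single cell value is available, whereas a .csv is already the values.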

Another point to note is it's best practice to bring your data into Power Pivot via Power Query. A word of advice from another MVP:

"I will NEVER EVER, EVER import a text file directly into Power Pivot again.  Period.  Been there, done that, been burned.

When you set up your initial connection it bases the data on a preview of the first 200 rows, iirc.  And if it gets it wrong, and you build your BI on it, my recollection is that you get to rebuild the BI when you realize it.  (You can’t add a schema file afterwards to fix it.)"
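Power Pivot's sampling is internal to the engine, but the failure mode in that warning is easy to sketch: any inference that only sees a preview can guess a type the rest of the file contradicts. A hypothetical ID column, stdlib only:

```python
import csv
import io

def infer_type(sample):
    """Naive preview-based inference: int if every sampled value parses as int."""
    try:
        for v in sample:
            int(v)
        return int
    except ValueError:
        return str

# Hypothetical export: an ID column that looks numeric for 200 rows,
# then an alphanumeric ID appears further down the file.
csv_text = "\n".join(str(i) for i in range(200)) + "\n00123A"
values = [row[0] for row in csv.reader(io.StringIO(csv_text))]

guessed = infer_type(values[:200])  # the "preview" sees only numeric rows
conversion_failed = False
try:
    converted = [guessed(v) for v in values]
except ValueError:
    conversion_failed = True        # row 201 breaks the guessed schema

print(guessed, "inferred; full-column conversion failed:", conversion_failed)
```

Going through Power Query instead lets you declare the column types explicitly, so a late-arriving odd value surfaces as a fixable step rather than a baked-in wrong schema.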

Mynda

 
Posted : 01/03/2017 6:34 am
(@chadkukorola)
Posts: 10
Active Member
Topic starter
 

Hi Mynda,

Thanks for the info, makes sense to me. I have experienced some of the heartache from direct data connections to Power Pivot. I'd love to hear from MS if they do get back to you.

Bottom line, I'm loving Power Query!!

Chad

 
Posted : 01/03/2017 1:26 pm
(@mynda)
Posts: 4761
Member Admin
 

Hi Chad,

Microsoft say:

"Power Pivot uses ACE for reading both types of files and provide an abstraction. It might have to do with that provider’s handling of such files. I wonder if you can use the provider to read data into Access for example, Mynda, to see what the performance difference is outside Power Pivot."

So, if you have time, can you try importing the same Excel and CSV files into Access to see if you get similar results to when you load into Power Pivot? This will help identify whether or not the cause is ACE.

Mynda

P.S. I'm so pleased you're loving Power Query 🙂

 
Posted : 01/03/2017 8:41 pm
(@catalinb)
Posts: 1937
Member Admin
 

Hi Chad,

If you're using a 64-bit operating system and Office is the 32-bit version, and you get the Out of Memory error or "Excel cannot complete this task with available resources.", you can increase the 2 GB memory limit by using a patch:

http://ntcore.com/4gb_patch.php

There is a Microsoft Update release that does a similar thing: update-for-excel-2013-kb3115162

More details can be found in the first link.
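For background, the ntcore patch works by setting the LARGEADDRESSAWARE bit in the executable's PE header, which lets a 32-bit process address up to 4 GB on a 64-bit OS instead of the default 2 GB. A sketch that reads that bit, exercised here on a synthetic header since there's no Windows executable to hand:

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020

def is_large_address_aware(header_bytes):
    """Return True if a PE executable's COFF header has the LAA bit set.

    This is the bit the 4GB patch flips. Pass the first few hundred bytes
    of the file, e.g. open(path, "rb").read(512) on EXCEL.EXE.
    """
    if header_bytes[:2] != b"MZ":
        raise ValueError("not a PE executable")
    pe_offset = struct.unpack_from("<I", header_bytes, 0x3C)[0]  # e_lfanew
    if header_bytes[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF header after the 4-byte signature: Machine(2), NumberOfSections(2),
    # TimeDateStamp(4), PointerToSymbolTable(4), NumberOfSymbols(4),
    # SizeOfOptionalHeader(2), Characteristics(2) -> offset +22.
    characteristics = struct.unpack_from("<H", header_bytes, pe_offset + 22)[0]
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)

# Synthetic header standing in for a patched executable.
fake = bytearray(0x60)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)          # e_lfanew -> PE header at 0x40
fake[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<H", fake, 0x40 + 22, 0x0120)   # EXECUTABLE_IMAGE | LAA
print(is_large_address_aware(bytes(fake)))        # True for the patched case
```

The KB3115162 update Catalin mentions achieves a similar effect through Microsoft's own channel rather than patching the binary by hand.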

 
Posted : 02/03/2017 3:39 am
(@mynda)
Posts: 4761
Member Admin
 

Hi Chad,

Please test Access before using the patch Catalin suggests so that we can compare like for like.

Thanks,

Mynda

 
Posted : 02/03/2017 6:14 am
(@chadkukorola)
Posts: 10
Active Member
Topic starter
 

Mynda,

I attached a Word document with the results of my testing. Hopefully I did the upload correctly. Of note, it appears the Power Query engine handles .xlsx well compared to VertiPaq. The test data set is obviously an extreme case, but the fact that this is the format of my data sources (a constraint I can't change) is very real.

Let me know what you think!

Chad

 
Posted : 02/03/2017 12:05 pm
(@chadkukorola)
Posts: 10
Active Member
Topic starter
 

Mynda,

I should note:

I tested an .xlsx file with 46 fields and 1.04M records, but with very low cardinality. Power Pivot imported that file with little issue. Impressively, compression took that file from 177MB (the raw data file) to about 700KB once it was loaded into Power Pivot and the Excel file was saved.
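That shrinkage is consistent with the engine's columnar storage: VertiPaq dictionary-encodes each column, so a column with only a handful of distinct values costs almost nothing. A rough stdlib analogy with zlib (not VertiPaq's actual encoding, just the cardinality effect):

```python
import random
import zlib

random.seed(42)
N = 100_000

# Low cardinality: 100k values drawn from just 5 distinct labels,
# like a Region or Status column in an ERP export.
low_card = "".join(random.choice(["North", "South", "East", "West", "HQ"])
                   for _ in range(N)).encode()

# High cardinality: 100k effectively unique values, like a GUID column.
high_card = "".join(f"{random.getrandbits(64):016x}" for _ in range(N)).encode()

low_size = len(zlib.compress(low_card))
high_size = len(zlib.compress(high_card))
print(f"low-cardinality: {low_size:,} bytes, high-cardinality: {high_size:,} bytes")
```

The low-cardinality column compresses far smaller, which is why your 1.04M-row, 46-field file with few distinct values could land at ~700KB while a high-cardinality file of the same shape would not.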

Chad

 
Posted : 02/03/2017 12:33 pm