I have Excel 365 and when I create a Microsoft Query by going to Get Data..From Other Sources..From Microsoft Query and connect to a SQL database using an ODBC connection and type a "select * from table" with no filters (table has about 218,000 records), it takes about 20 seconds for all the records to load into Excel.
When I create that same query using Power Query by going to Get Data..From Other Sources..From ODBC and input in that same SQL statement it takes about 3.5 to 4 minutes for the data to load back into Excel. Under Query Options, I have enabled "Fast Data Load", selected "Specify custom default load settings" and have both "Load to" settings unchecked. I have set Privacy levels to Always Ignore and have unchecked "Allow data preview to download in the background". None of this made any difference to the speed of the Power Query loading this data into the workbook. I also wrapped the source with Table.Buffer in the advanced editor and that made no difference either.
Why is the Microsoft Query able to load the data so much quicker. I do notice that Excel freezes when the Microsoft Query is running versus it not freezing when the Power Query is running even though I've tried all the options to force Excel to use more resources to load the data quicker in Power Query.
Any help would be appreciated.
Thanks,
Mike
Hi Mike,
Welcome to our forum!
Power Query is designed to fold queries back to the SQL server, however when you use your own SQL statement this folding cannot take place and Power Query is slower to load the data.
I recommend you don't paste your own SQL statement. Instead use the Power Query interface to select the tables and fields you want in your data set. Power Query will effectively write the SQL statement for you and fold it back to the source for processing.
More on Query Folding here.
Mynda
Hi Mynda,
Instead of putting in my own SQL statement, I tried your approach and did Get Data..From Other Sources..From ODBC and selected the table and loaded it to the worksheet. It still took the usual 3.5 minutes, which is way slower than bringing in the data via Microsoft Query (20 seconds).
As I understand it, if you input in a SQL statement, query folding does not take place for any additional transformations you put in Power Query after the SQL statement. The result set from the SQL statement is returned from the source and then all additional transformations have to be done on the client side. So if all I want to do is "Select * from table" and just load that data into a worksheet, it shouldn't make a difference whether you type in the SQL statement or just select the table. Regardless, selecting the table had the same slow speed.
So the question still remains, why does Microsoft Query load this table's data so much faster than Power Query (20 seconds vs 3.5 minutes)?
When Power Query is loading in the data and shows the increasing record count, it's approximately 1000 records every second and I'm able to click around the worksheet as it loads. It still feels like it's not using anywhere the same resources as Microsoft Query, which freezes up the entire screen for the 20 seconds it loads all the data. I was hoping that enabling "Fast Data Load" in Query Options would make Power Query increase the resources it's using like Microsoft Query does but it made no difference at all to the speed of the load.
Thanks,
Mike
Hi Mike,
I'll ask Microsoft, but can you please confirm if how many rows and columns in your data set?
And have you tried Get Data > From Database > From SQL Server Database?
Mynda
Hi Mynda,
The database is a SAP SQL Anywhere database.
The object is actually a SQL view with 114 columns and 218,508 records.
In Microsoft Query and Power Query I connect to it via a DSN that is on my machine for the ODBC connection.
Here are the times for the two approaches when I "select * from view" and load the data to a worksheet:
Microsoft Query: 45 seconds
Power Query (via Get Data..From Other Sources..From ODBC): 3min30sec (fast data load enabled, Privacy levels set to Always Ignore)
Here are the times for the two approaches when I select 25 columns from the view (instead of all 144 columns) but still the full 218,508 record set.
Microsoft Query: 20 seconds
Power Query (via Get Data..From Other Sources..From ODBC): 23 seconds (fast data load enabled, Privacy levels set to Always Ignore)
After doing these different runs, now my question is why is there such a big time difference between Microsoft Query and Power Query when I pull all the columns from this view?
For the smaller set of columns, both MQ and PQ perform about the same. On the full set of 114 columns data pull, it looks like PQ is doing something very different than MQ (i.e. writing to disc, trickling in records and not using all system resources, etc?)
Thanks,
Mike
Hi Mike,
I sent your case to my fellow MVPs and Microsoft. So far I've had this reply form fellow MVP, Miguel Escobar:
"That’s an interesting case, and something that I’m actually familiar with.
Is there any way that you could get the logs from the ODBC to see what commands the ODBC is receiving? some ODBCs require a Custom Connector to work correctly with Power Query.
Nevertheless, regardless of your data source, Power Query is known for getting a terrible experience when dealing with large amounts of columns. The first preview that you get of a table with 144 columns is usually around 50 rows or less (if you’re lucky).
I wonder if you’d get the same experience if you write your own native query against that ODBC. Perhaps it might go even faster."
Mynda
More feedback:
- Have you also disabled background refresh? https://blog.crossjoin.co.uk/2016/12/05/power-query-power-bi-and-the-allow-data-preview-to-download-in-the-background-option/
- Can you try setting the Supported Row Reduction Clauses property? https://blog.crossjoin.co.uk/2019/01/04/odbc-power-bi-power-query-refresh-performance/
And this:
This is too-general a question, so it can only be given a too-general answer : the time difference is because we’re talking about two different code bases making two different sets of calls to the underlying driver. We don’t believe there’s anything fundamentally wrong with the PQ implementation – as an example, you don’t see this kind of difference when using the SQL Server driver – but the sequence of calls we’re making is probably hitting different paths in the driver.
One notable difference we can point out is that PQ is loading various kinds of schema information from the database that Microsoft Query is not getting. This tends to be fixed overhead per database and independent of individual table sizes. But even there, we’ve sometimes run into drivers doing things like scanning an entire table in order to report its schema.
In order to investigate this in more detail, you’d probably want to find a notably-smaller table to try out and would capture ODBC traces from both environments to see how the API calls differ.
That said, why aren’t they using the built-in connector (Sybase) for this instead of an ODBC driver?
Hi Mynda -
I've disabled background refresh and set Supported Row Reduction Clauses to Top but these seem to help with working in Power Query Editor with the preview data but didn't make a difference to the load times of actually loading all the data to the worksheet.
I only have a local instance of SQL Server Express installed to play around with SQL Server testing with PQ. I brought down that view with all 114 columns and all rows to my local SQL Server. Then I loaded it to Excel using MQ and PQ with all columns and rows and MQ did it in 24 seconds and PQ did it in 29 seconds which is pretty much the same. I guess this is where you're saying the time difference doesn't exist between MQ and PQ when SQL Server is a data source?
I'm not sure how to run an ODBC trace. I did not see a logging option on my DSN used to connect to the SAP/Sybase DB if that's what you mean by a trace.
I've seen the built-in connector for Sybase before in Excel's Get Data options but for some reason don't see it in mine now. The specific driver being used is the "SQL Anywhere 17" driver for this SAP SQL Anywhere database. Maybe this is a different type of SAP/Sybase driver where it doesn't give me the built-in connector (Sybase) option on Get Data?
Thanks,
Mike
Hello,
This article has a link to where you can download needed driver plus also some minor information.
Hi Mike,
Try updating your install of Excel to see if the Sybase connector is available. I wouldn't waste time on other things until you can test the custom connector.
Mynda
Hi Mike,
A simple search after "ODBC tracing" will reveal this tutorial: https://blogs.msdn.microsoft.com/selvar/2007/11/10/odbc-tracing/
You can also use the search tool in our computer, for "ODBC Data Sources" (64) or (32) bit, depending on your office installation.
That will open the ODBC Data Source Administrator application (it is already installed on your computer, you don't need to install anything)
That application has a Tracing tab, where you can choose to Start Tracing, configure the location of the text file that will be produced.
Hi Catalin,
Thank you for the link on how to do an ODBC trace.
Hi Mynda and Anders,
I believe I already have the correct SAP SQL driver installed. Below is the driver I currently have installed that's used in my odbc connection to connect to the SQL Anywhere database. I pulled these details from the ODBC Data Source Administrator (Drivers Tab):
Name: SQL Anywhere 17
Version: 17.00.09.4838
Company: SAP SE or an SAP affliate company
File: DBODBC17.DLL
I have Excel 365 in the office. Aren't the 365 products always updated with the latest version's functionality?
Thanks,
Mike