Due to data privacy, we opted to keep all data onsite while using Power BI. To do so, we setup a SQL server and run Reporting Services to deliver content.
I'm trying to understand, in this situation, what is the best way to work with extremely large datasets. If I use SQL tables directly, it looks like all of the data needs to be loaded to the local machine viewing the Power BI analysis. I've noticed that this can take an extremely long time, and some machines will even run out of memory before the analysis fully loads.
To fix this, I created an SSAS data model to house all of the information. Power BI gives an option to "Connect Live" to the SSAS data model. This fixes the long load times and memory issues, but restricts the ability to bring in additional data sources. No other data sources can be connected to. Nor can a custom data table be created. This limits what can be done within the Power BI analysis.
I may be making a simple problem too complex. What is the best way to serve analyses that sit on top of large amounts of data without putting the burden on the local machine viewing the analysis? Again, I am restrained by the requirement that our data must stay on our network.
Thanks!
William
Hi William,
Usually creating a 'view' of the data you want to get with Power Query and then connecting to that view is the most efficient way to get data from the SQL server. Avoid entering your SQL query into the Power Query 'advanced' dialog box as this prevents query folding.
There could be other things that are contributing to the query running slowly and sometimes there are easy fixes. This post might give you some ideas.
Mynda
Mynda,
Thank you. That is helpful.
In general, does all of the data need to be loaded and processed each time the analysis is opened by a user? Are you familiar with using the report services to host Power BI analyses? Does that happen there?
I know other applications have what they call "in-memory" data loading to speed up viewing of the analysis, does Power BI have something similar?
Hi William,
I'm not familiar with the Power BI report server, however the way Power BI works is you use Power Query to specify the data you want brought into the Power Pivot model in Power BI. Once you 'Close & Apply' the query it loads all of the data specified by the query to the Power Pivot model in Power BI. When you open the Power BI Desktop file, or Power BI report in the Power BI service, the data is not re-loaded again, since it's already in the file.
When you build visuals based on the data in the model it does not get the data again, it only references the data already in the model.
If you use direct query, then no data is loaded to the model and each time you build a visual or the visual updates, the source is queried.
Hope that clarifies things.
Mynda