I'm not sure if this question belongs in PQ but it's my best guess.
I've worked with live data connections and SQL queries before, and thus I know how easy it is when you can just refresh your data right in Excel, add columns with formulas, etc. Suffice it to say, I don't have this right now.
My current assignment is this.....I receive 5 data extracts every month. The extracts are in a couple of different formats but I have been converting them all to xlsx files. If they are too big (they go up to about 600,000 rows) I chop them into a few files. I save everything in a folder that I'll refer to as my "source data."
Next, I have a bunch of Excel files set up with calculations, charts, etc. In each of these I have to paste some data on one sheet and the formulas that are in the other sheets will get recalculated. To do this, I go to the source data files that I referred to above and cut and paste the parts that I need into my other Excel files.
This process is pretty ordinary in monthly accounting but for analysis it's cumbersome because if you want to analyze (or visualize) a certain item for the year, it means you have to go to 12 files and copy + paste. In addition, I have to re-do everything every month as if I were a human macro
If not for the size, I would simply append new months to each file and have 5 really big tables to work with. The problem is calculations are very slow, not to mention I'm limited to the million rows.
With all that said, should I be using Power Query and Power Pivot as well as Excel? I'm looking for conceptually how the process would look.
A million thanks!
Hi Martha,
That sounds like a good task for Power Query and Power Pivot. Keep in mind that you don't have to bring all the data into an excel sheet with PQ or PP. Data can be stored only in the data model, that can keep up to 300 million rows, according to Microsoft. You can load to sheet only a report for a certain item, using slicers, parameter tables, you're not even close to the limits of PQ and PP 🙂
Half a million rows will be a problem if you apply formulas to the tables produced, you should do all the calculations within PQ and PP. Don't even load the tables to sheets, just as connections, you can use those connections to insert a pivot table.
Thanks, that makes perfect sense to keep the data in the data model and not Excel.
Help me to understand something that just isn't clear to me....since it is possible to have a data connection with either PQ and PP, and since you can do things to the data in both PQ and in PP, how then do you decide when to use PQ and when to just load the data to PP?
It would be clear if the data needed a lot of work such as unpivoting; then I would use PQ first. But in my present task the data I have to load needs very little manipulation - I only have to add a couple of columns that take something from another column (in Excel it's text and mid formulas). My understanding is it's easier to learn DAX than M so I thought I might be able to load the data directly to the data model from external Excel tables. Have I missed the boat completely and I should ALWAYS put the data in PQ first?
I'm feeling like
Hi Martha,
The general rule is if you need to add calculated columns then you should use Power Query to do this as it's more efficient, whereas if you need to add measures then this is only possible in Power Pivot.
You might be surprised to find that you don't have to learn M to perform most tasks you want to do in Power Query, including parsing text.
Mynda