Get Data from the Web with Power Query

Mynda Treacy

March 28, 2019

The ability to get data from the web with Power Query is super handy and Power Query’s user-friendly GUI makes it quick and easy.

More importantly Power Query gives you an opportunity to clean the data before loading it into Excel. Then you can refresh it with one click to get updates.

The current Power Query limitation* is that only tables formatted in HTML <Table> tags can be easily imported by Power Query. Some web pages build tables using JavaScript, which is not covered in this tutorial.

Let’s use this English Premier League data from the Sky Sports website as the example:

English Premier League data

Note: At the time of writing, the URL above provided the data used in this example, but the layout and content of this page is out of my control and may change at some point in the future.

How to Get Data from the Web with Power Query

Step 1: Copy the URL for the web page containing the table. I’m using https://www.skysports.com/premier-league-table

Step 2: Excel 2016 onward – Data tab > From Web

data from the web for Excel 2016 onwards

Excel 2013 and earlier – Power Query tab > From Web:

data from the web for Excel 2013 and earlier versions

Note: If you don’t see the Power Query tab in Excel 2010 or 2013 you can download it here.

Step 3: Paste your URL into the ‘From Web’ dialog box then click OK:

from web dialog box

Step 4: At the ‘Navigator’ dialog (image below), the left-hand pane provides a list of tables available in the web page.

The first item, ‘Document’, contains the page HTML code, so it’s of no use to us, but any remaining tables typically contain data you can get with Power Query. Remember, it will only show you tables built using HTML table tags.

Clicking on the table called 'Premier League 2018/19' gives you a preview of the data in the right-hand pane:

data preview on right hand pane

And selecting the ‘Web View’ places a green box around the table on the web page itself:

select web view

Tips:

  1. You can view the Navigator dialog box in full screen by clicking the full screen icon in the top right of the dialog box.
  2. If you want to import multiple tables then check the ‘Select Multiple items’ box in the left-hand pane.

Step 5: Once you’ve selected the table you want to import, click on the ‘Transform Data’ button. This opens the Power Query editor window which gives you an opportunity to clean the data before loading it into Excel or the Power Pivot Data Model.

click on transform data

Once you’re done cleaning the data you’re ready to load it to Excel or the Power Pivot Data Model.

Step 6: Home tab > Close and Load To:

close & load to button

Which opens the Import Data dialog box:

import data dialog box

Tip: If you load it to the Data Model, be sure to select ‘Only create connection’ so that you don’t duplicate the data in your file i.e. once in the worksheet and again in the Power Pivot data model.

Step 7: Refresh the data. To get updates from the web page simply go to the Data tab of the ribbon and click ‘Refresh All’:

refresh all

Or if you have multiple queries you can open the queries and connections pane:

queries and connections

In Excel 2013 and earlier go to the Power Query tab > Show Pane :

show pane

Find the query in the list > right-click > Refresh:

find the query in the list

*Power Query Web Limitation

As I mentioned earlier, Power Query is great at getting data from the web where it’s formatted as an HTML table, but not tables generated using JavaScript. You can easily tell if the table is HTML or JavaScript by inspecting the web page source code and looking for the HTML table tags.

To do this, right-click some empty space on the web page > View Page Source (or similar depending on the browser you use):

view page source

CTRL+F to open the Find dialog box. Enter ‘<table’. This will highlight any instances of a HTML table tag.

open the find dialog box

If you find HTML table tags then it confirms there is a table that Power Query can get from that page, however it doesn’t guarantee it’s the table you actually want, as there may be other tables on the page.

More Power...Query

Power Query can get data from a myriad of places and has loads of tools for cleaning and transforming data. Here are some more examples:

Consolidate Excel Sheets

Get Files from a Folder

And one of the most common tasks, unpivoting: Unpivot with Power Query

If you’d like to learn Power Query, and you should 😉, please consider my Power Query course.

6 thoughts on “Get Data from the Web with Power Query”

  1. Dear sir,
    I am unable to find advanced option in from web popup, please help me in enabling it.

    I am using ms office 2016

    Thank you in advance waiting for your reply.

    Reply
    • Hi Jayasai, You can try updating your installation of Excel to see if you get this feature, but if it’s not there after updating, then it’s not available in your version of Excel, sorry Mynda

      Reply
  2. I have recently upgrades to Office 365 Pro Plus. I have a practice from the past to import a HTML table from a web site into Excel that includes HTML hyperlinks. In the previous version of excel, there was an ‘options’ popup in the process and I could choose ‘full html format’ and the links were retained in excel. Now that option is gone. Are you aware of a way to load from a web data source an retain the html hyperlinks?

    Thank you

    Reply
    • Hi Peter, I don’t remember being able to retain the link status when importing data from the web with Power Query, and I’m not aware of any way to do it now, sorry. Have you tried adding a column to the Excel Table that Power Query outputs with the HYPERLINK function that references the text URL and converts it to a clickable link? Mynda

      Reply
  3. I have a power query that gets a web-based database and loads it to Excel.
    The web page consists of the last 10,000 rows of the database, with the latest row on top.
    If, when I refresh the query, 50 rows have been added, then the oldest 50 rows are no longer on the database. Therefore, when my data is refreshed I am left with the latest 10,000 rows but I have lost the oldest 50 rows.
    Is there any way I can refresh the query but not delete the older rows in my original table, so my table grows with each refresh?

    Reply
    • Hi Terry,

      Power Query is designed to get the data from the source, including any changes. If your source changes then you’d need to intervene with VBA to take a copy of the data before the refresh so you can retain those old rows. There’s no setting in Power Query to do this, sorry.

      Mynda

      Reply

Leave a Comment

Current ye@r *