Get ALL Entries from a Webpage

(@matthias)
Posts: 49
Trusted Member
Topic starter
 

Hello,

I would like to scrape a webpage with Power Query. When I open the page in a browser, it shows 50 entries, and additional entries appear only when I scroll to the bottom. These are not additional pages: the additional entries are shown on the same page, under the same URL.

In power query I also see only 50 entries, but here I can't scroll down to get additional entries.

Adding a wait like this (e.g. 50 seconds) does not help to get all entries. Still only 50 entries are shown, or rather included in the HTML code:
Web.BrowserContents("https:/...", [WaitFor = [Timeout = #duration(0,0,0,50)]])

Any chance to force a webpage to deliver all entries?

Regards,
Matthias

 
Posted : 25/08/2022 12:25 pm
Riny van Eekelen
(@riny)
Posts: 1196
Member Moderator
 

Hi Matthias,

It would be helpful if you could share the link to the website, provided it doesn't contain confidential information.

Riny

 
Posted : 25/08/2022 1:08 pm
(@matthias)
Posts: 49
Trusted Member
Topic starter
 

Hi Riny,
It is a webpage that contains information I should not share.
So far I have only found webpages that work with page numbering, e.g. Blog • Page 2 of 61 • My Online Training Hub
But it is not rare for webpages to deliver only a subset at first and add entries after scrolling down, so it should be possible to find an example. Perhaps you know one or can find one too.

Thanks,
Matthias

 
Posted : 25/08/2022 2:18 pm
Riny van Eekelen
(@riny)
Posts: 1196
Member Moderator
 

Perhaps you'll find a solution here:

https://www.myonlinetraininghub.com/scrape-data-multiple-web-pages-power-query

 
Posted : 25/08/2022 3:01 pm
Philip Treacy
(@philipt)
Posts: 1631
Member Admin
 

Hi Matthias,

PQ can't interact with a website in this way.  You could use Power Automate though.

You can start a browser, then send the END key to the browser window to make it scroll to the end of the page and load more data.

You then of course need to create the PA steps to scrape the data you need.  You can't use PA and PQ together to scrape data.

Regards

Phil

 
Posted : 25/08/2022 8:16 pm
(@matthias)
Posts: 49
Trusted Member
Topic starter
 

Thanks Riny, that can only be used if the URL changes for each page, or every 10 entries.

Phil, "You can't use PA and PQ together to scrape data" means there is no way with PQ. I had hoped for some kind of URL modification like ?sort=desc, or a kind of filter added to the URL, which could make the query work incrementally.

Thanks,
Matthias

 
Posted : 26/08/2022 1:18 pm
Philip Treacy
(@philipt)
Posts: 1631
Member Admin
 

Hi Matthias,

The behaviour you describe indicates that the loading of new data is triggered by scrolling to the bottom of the page. As such, it must be triggered by JavaScript. Unfortunately, there is no way for PQ to make web pages scroll like this and/or trigger JS. But Power Automate can.

If you could load more data by adding a parameter to the URL then you could use PQ, but it doesn't sound like this is possible from what you have said, and without being able to access the webpage, I can't check.
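
To illustrate the URL-parameter case, here is a minimal sketch. It only applies if the page turns out to fetch its extra entries from an address that accepts something like an offset, which is not confirmed for this site. The URL, the parameter names offset and limit, and the assumption that the response is a JSON array of records are all hypothetical placeholders.

```powerquery
let
    // ASSUMPTION: placeholder endpoint; the real address is not known here.
    BaseUrl = "https://example.com/entries",
    // The page in question shows 50 entries per batch.
    PageSize = 50,
    // Safety cap so the loop ends even if the server ignores the offset.
    MaxRows = 5000,

    // Fetch one batch. ASSUMPTION: hypothetical "offset" and "limit"
    // parameters, and a response that is a JSON array of records.
    GetPage = (offset as number) as table =>
        let
            Response = Web.Contents(
                BaseUrl,
                [Query = [offset = Text.From(offset), limit = Text.From(PageSize)]]
            ),
            Parsed = Json.Document(Response),
            AsTable = Table.FromRecords(Parsed)
        in
            AsTable,

    // Keep requesting batches until one comes back empty or the cap is hit.
    Pages = List.Generate(
        () => [Offset = 0, Data = GetPage(0)],
        each Table.RowCount([Data]) > 0 and [Offset] < MaxRows,
        each [Offset = [Offset] + PageSize, Data = GetPage([Offset] + PageSize)],
        each [Data]
    ),

    // Stack all batches into one table.
    Combined = Table.Combine(Pages)
in
    Combined
```

Whether such a parameter exists can usually be checked in the browser's developer tools: watch the network requests while scrolling to the bottom of the page. If no such request shows up, this approach does not apply.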

Regards

Phil

 
Posted : 26/08/2022 6:05 pm
(@matthias)
Posts: 49
Trusted Member
Topic starter
 

Hi Phil,

Yes, it is typical scroll-to-the-bottom-to-load-more-data behaviour, and the code mentions type="text/javascript".
I can use ?order=XYZ to sort the data (ascending) according to different criteria. There might be other parameters, but that is all I found.
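
One way the ?order= parameter might help is sketched below: load the page once per sort criterion and merge the results, since each sort order may bring different entries into the first 50. This is only a partial workaround under stated assumptions, not a way to get every entry. The URL, the order values and the CSS selectors are hypothetical placeholders.

```powerquery
let
    // ASSUMPTION: placeholder URL; the real address is not given in this thread.
    BaseUrl = "https://example.com/entries",
    // ASSUMPTION: hypothetical values for the ?order= parameter.
    OrderKeys = {"name", "date", "price"},

    // Load the page for one sort order and extract its entries.
    // ASSUMPTION: hypothetical CSS selectors for an entry and its fields.
    GetEntries = (orderKey as text) as table =>
        let
            Page = Web.BrowserContents(
                BaseUrl & "?order=" & Uri.EscapeDataString(orderKey)
            ),
            Rows = Html.Table(
                Page,
                {{"Title", ".entry-title"}, {"Detail", ".entry-detail"}},
                [RowSelector = ".entry"]
            )
        in
            Rows,

    // One table per sort order, each holding the entries initially delivered.
    PerOrder = List.Transform(OrderKeys, GetEntries),

    // Stack the tables and drop rows returned by more than one sort order.
    Combined = Table.Distinct(Table.Combine(PerOrder))
in
    Combined
```

The result grows with the number of sort criteria available, but entries that never land in the first 50 under any order are still missed.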

Thanks again,
Matthias

 
Posted : 27/08/2022 9:53 am