Challenge To Scrape A Page With VBA

(@amirkhosravi)
 

Hi,
I am trying to scrape data from the page below using VBA in Excel:

tsetmc.com/Loader.aspx?ParTree=15131F

I have tried several techniques:

MSXML2.XMLHTTP60: does not work.
MSXML2.ServerXMLHTTP60: does not work.
SHDocVw.InternetExplorer: too slow, and it rarely works.

In fact, when I open the link in Firefox or Chrome, the page is displayed correctly, but when I request it through "MSXML2.XMLHTTP60" or "MSXML2.ServerXMLHTTP60", the returned response is completely different from what the browser shows.

Other pages on this site behave the same way, for example:

tsetmc.com/Loader.aspx?ParTree=151311&i=20626178773287666

I guess the site is built dynamically and uses JavaScript to load its content while the page is loading. Also, when the request comes from Excel VBA, the server seems to recognize that it was not sent by a browser.

Please help me find a way to scrape the table at the URL above.

Sub CreateMainList()

    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Dim MainURL As String
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim MainDiv As MSHTML.IHTMLElement
    Dim MainDivChildren As MSHTML.IHTMLElementCollection
    Dim Res As String
    Dim price As Integer

    'MainURL = ThisWorkbook.Worksheets("Home").Range("C2").Value
    ' Only the page title (".:TSETMC:. :: Advanced Market Watch") survived in the post
    ' where the URL should be. The link given above is used here instead;
    ' the http:// scheme is an assumption.
    MainURL = "http://tsetmc.com/Loader.aspx?ParTree=15131F"

    XMLReq.Open "GET", MainURL, False
    'XMLReq.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    'XMLReq.setRequestHeader "Content-Type", "text/html; charset=utf-8"
    XMLReq.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
    XMLReq.setRequestHeader "Connection", "keep-alive"
    XMLReq.setRequestHeader "Accept-Encoding", "gzip, deflate"
    XMLReq.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
    XMLReq.setRequestHeader "DNT", "1"
    'XMLReq.setRequestHeader "Upgrade-Insecure-Requests", "1"
    ' "Set-Cookie" is a response header. A request sends cookies in the
    ' "Cookie" header, as name=value only (no path or HttpOnly attributes).
    XMLReq.setRequestHeader "Cookie", "ASP.NET_SessionId=cd03mksrog04g2ocuaeqxweb"
    'XMLReq.setRequestHeader "Cache-Control", "max-age=0"

    'XMLReq.setRequestHeader "Cookie", MyCookie
    XMLReq.send

    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If

    ' Get the webpage response data into a variable.
    'response = StrConv(request.responseBody, vbUnicode)

    ' Load the raw response into an HTML document so it can be queried.
    HTMLDoc.body.innerHTML = XMLReq.responseText
    Debug.Print XMLReq.responseText

    Set XMLReq = Nothing

    Set MainDiv = HTMLDoc.getElementById("main")

    ' Content that the page builds with JavaScript is not in the raw response.
    If MainDiv Is Nothing Then
        Debug.Print "No element with id ""main"" in the response."
    End If

End Sub

 
Posted : 06/07/2020 7:36 am
Philip Treacy
(@philipt)
 

Hi Amir,

If you need to interact with the page, then you should consider using Selenium to drive your browser:

https://www.myonlinetraininghub.com/web-scraping-filling-forms
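
As a rough illustration of the Selenium approach, here is a minimal sketch only. It assumes the SeleniumBasic library is installed and referenced in the VBA editor as "Selenium Type Library", and that a ChromeDriver matching your Chrome version is available. The Sub name, the http:// scheme, and the 5-second wait are assumptions for illustration; the element id "main" is simply the one used in the question's code and has not been checked against the page.

```vba
Sub ScrapeWithSelenium()

    ' Assumes a reference to "Selenium Type Library" (SeleniumBasic).
    Dim driver As New Selenium.ChromeDriver
    Dim mainDiv As Selenium.WebElement

    ' Assumption: http:// scheme; the question gives the URL without one.
    driver.Get "http://tsetmc.com/Loader.aspx?ParTree=15131F"

    ' Assumption: 5 seconds is long enough for the page's JavaScript to load the table.
    driver.Wait 5000

    ' "main" is the element id from the question's code (not verified).
    Set mainDiv = driver.FindElementById("main")
    Debug.Print mainDiv.Text

    driver.Quit

End Sub
```

Because a real browser runs the page's JavaScript, the text printed here should be what you see on screen, not the bare response that XMLHTTP returns. A fixed wait is the simplest option but not the most reliable one.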

That said, that site looks like it is doing real-time updates via JavaScript so scraping isn't the ideal approach to getting data off it.

You'd be better off using an API if the site provides one.

Regards

Phil

 
Posted : 06/07/2020 9:13 pm