Hi,
I am trying to scrape data from the link below using VBA in Excel:
tsetmc.com/Loader.aspx?ParTree=15131F
I have tried several techniques:
MSXML2.XMLHTTP60: does not work.
MSXML2.ServerXMLHTTP60: does not work.
SHDocVw.InternetExplorer: besides being too slow, it rarely works.
In fact, when I open the link in Firefox or Chrome, the page is displayed correctly, but when I request it through "MSXML2.XMLHTTP60" or "MSXML2.ServerXMLHTTP60", the response is completely different from what it should be.
Other links on this site behave similarly, for example:
tsetmc.com/Loader.aspx?ParTree=151311&i=20626178773287666
I guess the site is built dynamically and uses JavaScript to load its content while the page loads. Also, when the request comes from Excel VBA, the server seems to recognize that it was not sent by a browser.
Please help me find a solution for scraping the table at the URL above.
Sub CreateMainList()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Dim MainURL As String
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim MainDiv As MSHTML.IHTMLElement
    Dim MainDivChildren As MSHTML.IHTMLElementCollection
    Dim Res As String
    Dim price As Integer

    'MainURL = ThisWorkbook.Worksheets("Home").Range("C2").Value
    ' The link given at the top of this post; the http:// scheme is assumed.
    MainURL = "http://tsetmc.com/Loader.aspx?ParTree=15131F"

    XMLReq.Open "GET", MainURL, False
    'XMLReq.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    'XMLReq.setRequestHeader "Content-Type", "text/html; charset=utf-8"
    XMLReq.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
    XMLReq.setRequestHeader "Connection", "keep-alive"
    XMLReq.setRequestHeader "Accept-Encoding", "gzip, deflate"
    XMLReq.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
    XMLReq.setRequestHeader "DNT", "1"
    'XMLReq.setRequestHeader "Upgrade-Insecure-Requests", "1"
    ' Set-Cookie is a response header; a request sends the cookie back in "Cookie".
    XMLReq.setRequestHeader "Cookie", "ASP.NET_SessionId=cd03mksrog04g2ocuaeqxweb"
    'XMLReq.setRequestHeader "Cache-Control", "max-age=0"
    'XMLReq.setRequestHeader "Cookie", MyCookie
    XMLReq.send

    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If

    ' Get the webpage response data into a variable.
    'response = StrConv(request.responseBody, vbUnicode)
    HTMLDoc.body.innerHTML = XMLReq.responseText
    Debug.Print XMLReq.responseText
    Set XMLReq = Nothing

    ' Only finds the element if it is present in the raw HTML (scripts are not run here).
    Set MainDiv = HTMLDoc.getElementById("main")
End Sub
Hi Amir,
If you need to interact with the page, you should consider using Selenium to drive your browser:
https://www.myonlinetraininghub.com/web-scraping-filling-forms
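As a rough idea of what that could look like from VBA, here is a minimal sketch. It assumes the SeleniumBasic wrapper is installed with a matching chromedriver and that the "Selenium Type Library" reference is ticked in the VBA editor. The http:// scheme, the fixed 5-second wait and the "main" element id (borrowed from your own code) are assumptions as well, so treat it as a starting point rather than a tested solution.

```vba
' Sketch only: assumes SeleniumBasic plus a reference to "Selenium Type Library".
Sub ScrapeWithSelenium()
    ' Assumption: http:// scheme; the post gives the address without one.
    Const PAGE_URL As String = "http://tsetmc.com/Loader.aspx?ParTree=15131F"

    Dim driver As New Selenium.ChromeDriver
    Dim mainDiv As Selenium.WebElement

    ' Open the page in a real Chrome instance so the page's JavaScript runs.
    driver.Get PAGE_URL

    ' Crude pause to let the scripts fill the page; 5000 ms is an arbitrary guess.
    driver.Wait 5000

    ' "main" is the id used in the original code; it may not match the live page.
    Set mainDiv = driver.FindElementById("main")
    Debug.Print mainDiv.Text

    ' Close the browser and end the driver session.
    driver.Quit
End Sub
```

Because a real browser renders the page, the text printed here should reflect what you see in Chrome, at the cost of being slower than a plain HTTP request.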
That said, the site looks like it does real-time updates via JavaScript, so scraping isn't the ideal way to get data from it.
You'd be better off using an API if the site provides one.
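To check whether the table really is built by JavaScript, you could look at the raw response before any script has run. The sketch below is only a diagnostic: it assumes the http:// scheme, uses an illustrative browser-like User-Agent string, and the helper name CountOccurrences is made up for this example. If the response contains few or no table rows but several script tags, the data most likely arrives through a separate request, which you can look for in the Network tab of your browser's developer tools.

```vba
Option Explicit

' Counts non-overlapping occurrences of needle in haystack, ignoring case.
Function CountOccurrences(ByVal haystack As String, ByVal needle As String) As Long
    If Len(needle) = 0 Then Exit Function
    CountOccurrences = (Len(haystack) - _
        Len(Replace(haystack, needle, "", , , vbTextCompare))) \ Len(needle)
End Function

Sub InspectRawResponse()
    ' Assumption: http:// scheme; the post gives the address without one.
    Const PAGE_URL As String = "http://tsetmc.com/Loader.aspx?ParTree=15131F"

    Dim req As Object
    Dim html As String

    ' Late binding, so no library reference is needed.
    Set req = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    req.Open "GET", PAGE_URL, False

    ' Browser-like User-Agent; the exact string is illustrative only.
    req.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    req.send

    html = req.responseText

    ' Print what the server actually returned to a non-browser client.
    Debug.Print "Status:", req.Status, req.statusText
    Debug.Print req.getAllResponseHeaders
    Debug.Print "Length:", Len(html)
    Debug.Print "<tr count:", CountOccurrences(html, "<tr")
    Debug.Print "<script count:", CountOccurrences(html, "<script")
End Sub
```

Run InspectRawResponse and read the Immediate window (Ctrl+G); comparing these counts with "View page source" in the browser shows how much of the page a plain request can see.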
Regards
Phil