I've written about web scraping before using VBA, but never about interacting with the website to do things like filling in forms.
We had a forum question asking how to do exactly that so I tried using the same approach as I had previously with the HTML Object Library, but when it came to grabbing elements from the web page, the results were inconsistent. Sometimes I'd get what I wanted, sometimes not.
Then I remembered Selenium which is software that automates browsers.
You can write code that instructs Selenium to do things like open a web page, fill in a form, or click a button, and it's really easy to use.
You can use it to automate browsers like IE, Firefox, Chrome, Safari and Opera and it works with many programming languages such as (but not limited to) C#, JavaScript, Perl, PHP, Python, Java and R.
To use it with Excel you need to use the SeleniumBasic library which allows you to drive Selenium with VBA.
SeleniumBasic supports a smaller range of browsers than the full Selenium implementation, I chose to use Chrome. Please note that up to date versions of Firefox are not supported, and you will need to downgrade to a much older version of FF if you really want to use it.
Setting Up SeleniumBasic and Chrome
- Download and install the latest release of SeleniumBasic
- Download and install ChromeDriver
- Create a reference in your VBA project to the SeleniumBasic library
Installing SeleniumBasic
You can download SeleniumBasic from GitHub.
Run the .exe file and follow the instructions.
Installing ChromeDriver
You need to install the version of ChromeDriver that matches your installed version of Chrome.
To check what version of Chrome you have installed, start it, and then in the address bar of a tab type chrome://settings/help and hit the Enter key.
I have version 80.0.3987.87 installed so I need version 80 of ChromeDriver.
Get the version of ChromeDriver that matches your version of Chrome from here.
You just need to match the major version number i.e 80, 79 etc
Clicking one of the links shown in the above image takes you to a high-tech looking web page.
Download and unzip the version that works with your operating system. I'm running Windows 10 Pro 64-bit so the win32 file is the one I want.
Unzipping will give you just one file chromedriver.exe, you need to copy this to the folder where SeleniumBasic was installed. In doing this you will be copying over the version of chromedriver.exe that came with SeleniumBasic.
In my case the installation location was C:\Users\pgt\AppData\Local\SeleniumBasic. It will be different for you and will depend on what version of Windows (or other OS) you are using.
Creating a Reference to the SeleniumBasic Library
With your new Excel workbook open go into the VBA editor (ALT+F11) and select the workbook.
From the menu, go to Tools->References, find Selenium Type Library and check the box beside it. Then click the OK button.
You can now write VBA that accesses Selenium.
Practical Use - Filling in Forms and Looking Up Multiple Records
Let's say we need to look up VAT (Value Added Tax) records for companies from Great Britain. We can do this from this website http://ec.europa.eu/taxation_customs/vies/
Choose a Member State from the dropdown, so 'GB-United Kingdom' in this instance.
Then enter the VAT Number, and click on Verify.
The result gives us various information, but we are only interested in the company name, and need to extract that from the web page and put it into Excel.
We have a list of VAT numbers on our sheet and want to automate the process of looking up the company name associated with each one.
Interacting With A Web Page
In order to do something like fill in a form or extract data from a web page, we need to understand the structure of the web page.
We can use the Developer Tools in the browser to inspect the underlying HTML that constructs the page.
In Chrome you can either type CTRL+SHIFT+I or right click on the page and click on Inspect. Other browsers are similar, right click on the web page in those browsers to bring up the menu that will give you access to the Dev Tools/Inspector.
This will open a pane at the bottom of the browser like this
As we are interested in entering the Member State and VAT Number we need to know how to find those things in the HTML.
If you put your mouse pointer over the Member State dropdown and right click, then click Inspect, the Inspector window will highlight the HTML that creates this dropdown.
Notice that the highlighted element has an id countryCombobox. We'll need that later.
If we right click on the top most VAT number box, and Inspect that you'll find it has an id called number. We'll remember that for later too.
The last thing we need to look at on this page is the Verify button. Right click it and Inspect and you'll see it has an id in HTML called submit. Store that id for later use too.
If we now look at the results page, we are interested in the company name. Right click it, Inspect and you'll find where it is located in the HTML.
Notice here that the element storing the company name doesn't have an id, so we'll have to use another way to identify it in our VBA so we can get the name into Excel.
We can do this using a method called FindElementByXPath.
Read more about using XPath in Selenium.
What this allows you to do is to refer to a HTML element based on its position within the HTML. You can use various tag names or attributes to indicate what piece of information you want.
In this case, we are after some text in a HTML table. Inside the table is a tag called <tbody>, which contains several <tr> tags.
The company name we want is located in the 2nd <td> tag in the 6th <tr> tag.
We can describe the location of the company name like this //table/tbody/tr[6]/td[2]. Note that there is only 1 table in the results page HTML. If there were more I'd have to specify which table I wanted by using, for example, table[1] etc.
Now we know how to access the parts of the HTML pages we need, we can write our VBA.
VBA To Drive Selenium
Start by declaring and creating the Selenium driver.
I'm using a WHILE loop to work my way down Column A and read off the VAT numbers one by one. I'm using a variable called count to keep track of what row I'm on.
When the cell has 0 length (there's no VAT number in it), the WHILE loop ends.
Next the code tells Selenium to Get (load) the website.
The next 3 lines use a method called FindElementById to interact with the bits of the web page we found earlier using the Inspector.
Using the SendKeys method you can send keystrokes or text to the selected element.
So I send the country code GB, then the VAT number read from Column A of the sheet.
Finally the code Clicks on the submit element (the Verify button).
The results page will now load and we can get the company name with this line, and store it in a cell in Column B. So if count has reached 3, we store the company name in B3.
Repeat the process until all the VAT numbers have been checked.
Download the Example Workbook
Get a copy of the code used in this post.
Enter your email address below to download the workbook.
Summary
Once you get everything installed, using Selenium is pretty easy. I've used it to fill in just a couple of things in a form, but it would not be hard to expand on this to complete more complicated forms, even over several pages.
Hi, My name is Lucia. I’m doing your Power Query Course. I have a question about getting a table from WEB with log in. EX: I want to get bank and credit card transactions direct from the web for a budget and reconciliation. Does it works with power query web function?
Hi Lucia,
You can get data from webpages that require authentication, however not all authentication methods are suitable. It works if your website simply requires a username and password. In which case you can select ‘Basic’ from the dialog box shown here in step 3.
I hope that points you in the right direction.
Mynda
I have used your VBA code in this website “https://services.gst.gov.in/services/searchtp” which has captcha. I am unable to run the code. I am attaching the code below for your reference. moreover i am getting run time error ‘-2146232576 (80131700)’. Please help me in solving this. Thanks in advance.!!!
Option Explicit
‘ Written by Philip Treacy
‘ https://www.myonlinetraininghub.com/web-scraping-filling-forms
Sub Scrape()
Dim Driver As New Selenium.ChromeDriver
Dim count As Long
Sheets(“VAT Lookup”).Activate
Set Driver = CreateObject(“Selenium.ChromeDriver”)
count = 1
‘ Code assumes all VAT numbers in Column A are valid
‘ No error checking is included here in case they are not
‘
While (Len(Range(“A” & count)) > 0)
Driver.Get “https://services.gst.gov.in/services/searchtp”
Driver.FindElementById(“for_gstin”).SendKeys Range(“A” & count)
Driver.FindElementById(“lotsearch”).Click
Driver.Wait 1000
‘ Uncomment the next line if you need to introduce a delay in the browser
‘ to allow it to load the results page
Range(“B” & count) = Driver.FindElementByXPath(“//table/div[2]/div[1]/div/div[2]/p[2]”).Text
count = count + 1
Wend
Driver.Quit
End Sub”
Ok, can you clarify at which line of code it fails?
Or it’s the captcha that you believe is causing the issues?
If you’re looking to beat the captcha, that’s not going to be easy, it’s better to adjust the code to display the page for you to solve the captcha, before doing what you need.
Thanks for your reply.
The error is in this part of the code
“Set driver = CreateObject(“Selenium.ChromeDriver”)”
Im afraid that captach is the one that is causing the troubling.
Can you guide me with the code to by pass the captcha?
That error means you did not installed selenium properly.
Please make sure you follow all the steps provided in this article: identify your Chrome version, download the correct ChromeDriver, save it in the SeleniumBasic installation folder.
I never tried to pass captcha from vba, there is a discussion here, hope it helps.
You can try our forum, to upload a sample file, our members will try to help you.
Hi Philip
Unfortunately (due to Brexit) the VIES website no longer works for UK VAT numbers. Instead this has been replaced with https://www.tax.service.gov.uk/check-vat-number/enter-vat-details. I’m struggling to identify the relevant bits in the html code to use within the macro. This is as far as I have got with updating the macro so far:
Sub Scrape()
Application.ScreenUpdating = False
Dim Driver As New Selenium.ChromeDriver
Dim count As Long
Set Driver = CreateObject(“Selenium.ChromeDriver”)
count = 1
While (Len(Range(“A” & count)) > 0)
Driver.Get “https://www.tax.service.gov.uk/check-vat-number/enter-vat-details”
Driver.FindElementById(“target”).SendKeys Range(“A” & count)
Driver.FindElementById(“govuk-button”).Click
Driver.Fi
Range(“B” & count) = Driver.FindElementByXPath(” //table/tbody/tr[6]/td[2]”).Text
count = count + 1
Wend
Driver.Quit
Application.ScreenUpdating = True
End Sub
It works as far as going to the relevant website and entering the first VAT number but I can’t work out what code to change ‘Driver.FindElementById(“govuk-button”).Click’ to so that it presses the search button.
Also, I am then unsure what to change ‘Range(“B” & count) = Driver.FindElementByXPath(” //table/tbody/tr[6]/td[2]”).Text’ to, so that it returns the company name to the spreadsheet.
Please would you be able to advise on the updated code to make this work on the new website?
Thanks in advance for your help
Thanks
Dave
Hi David,
The answer is in the web page, you will have to study it.
Right click the button from website, and choose Inspect. This will reveal the button HTML code, you will find there the button id you need.
Same for XPath, you have browser developer tools to reveal the xpath for a specific object.
Hi Catalin
The html code for the button is ”
Search
”
However, I am unsure which bit of this I would refer to in the macro?
Thanks
Dave
Hi Dave,
Can you open a new topic on our forum? You can upload the sample file and code you currently have.
The html code you added in the comment is not visible, will mess up this page html.
Hi Catalin
Not to worry – I’ve figured it out. Didn’t realise you could right click and copy the xpath once you’ve chosen inspect – once I realised that I figured out that the below macro would work 🙂
Sub Scrape()
Application.ScreenUpdating = False
Dim Driver As New Selenium.ChromeDriver
Dim count As Long
Set Driver = CreateObject(“Selenium.ChromeDriver”)
count = 1
While (Len(Range(“A” & count)) > 0)
Driver.Get “https://www.tax.service.gov.uk/check-vat-number/enter-vat-details”
Driver.FindElementById(“target”).SendKeys Range(“A” & count)
Driver.FindElementByXPath(“/html/body/div/main/div/div/form/button”).Click
Range(“B” & count) = Driver.FindElementByXPath(“/html/body/div/main/div/div/p[2]”).Text
On Error Resume Next
Range(“B” & count) = Driver.FindElementByXPath(“/html/body/div/main/div/div/div[2]/p[1]”).Text
count = count + 1
Wend
Driver.Quit
Application.ScreenUpdating = True
End Sub
Glad to hear you managed to make it work 🙂
Hi i have written the code , but i’m getting the Error [ automation error ‘-2146232576’ ]
There is a new Error now “runtime error 1004 method range of object _global failed”
Could you please HELP me out here.
Hi Amit,
Looks like you’ve typed in my code from scratch? There are some pieces missing.
You haven’t declared the variable count and the line Driver.Quit is missing. Please refer to my downloadable workbook example.
To help any more I’d need to see exactly what code you have now. I’m guessing you’ve changed something since your first comment so without seeing the modified code, impossible for me to tell you what’s wrong.
Please open a topic on our forum and attach your workbook and I’ll have a look.
Regards
Phil
Hi Phil,
I have downloaded the file and ran the code, but what happens is it has gone in a infinite loop.
We need to alter count (presumably decrement towards 0) in this loop so as to satisfy the exit condition.
Also since the code is running half way, i want to ask you :
***If we have the Selenium library added to this project references we shouldn’t need to use “CreateObject” and since we’ve declared the variable as “new Selenium.ChromeDriver” the instance is already created so the line below is redundant anyway. ***
Thanks
Amit`
It runs just perfect in one system
But,
In my other system i get this “Runtime Error -2146232576 –
Automation Error
Though i have performed all the tasks of installing Selenium , also installed the Chrome Driver too, then the references too in the VBA editor.
But other system just cant go past Automation Error.
Could you please Have a look into this and please suggest me a way to contact you, because i have a different website to crack.
Hi Amit,
As I’ve already said, I can’t debug your code without seeing it.
Please start a topic on our forum and attach your workbook.
Regards
Phil
Great tutorial!
When I run the script in VBA, Chrome opens with the message “Chrome is being controlled by automated software” and crashes.
Any idea on what’s going on?
Thanks.
Hi,
The message about Chrome being controlled is normal. As for the crash, could be a few things. Have you got the ChromeDriver version to match your installed version of Chrome? I’ve got Chrome 80.x.xxxx.xx installed and I’m using ChromeDriver ver 80 and it works ok for me
https://sites.google.com/a/chromium.org/chromedriver/downloads
Failing that you can try these
https://sites.google.com/a/chromium.org/chromedriver/help/chrome-doesn-t-start
Regards
Phil
Estou a tentar fazer login em um site, porém não aceita colagem no campo senha, somente digitado diretamente, dessa forma ele sempre entende que estou copiando e colando valores, alguma dica de como contornar esse problema?
Você já tentou o código de raspagem da web? Você também pode usar um gerenciador de senhas como o Keepass. Com isso, você pode arrastar uma senha para um formulário da Web, em vez de copiar / colar.
Hi Phil, with Selenium IDE you can record steps you are doing on a webpage. These steps you can adapt and then save as “…. .side” files. Is there a possibility to trigger the excecution of these .side files by VBA without installing Selenium Basic?
Thanks,
Matthias
Hi Matthias,
Yes I don’t see why it wouldn’t work, but I haven’t tested it.
As long as you have all the necessary bits installed
https://www.seleniumhq.org/selenium-ide/docs/en/introduction/command-line-runner/
then you can use a VBA Shell call to run the Selenium CLR
Cheers
Phil
Wonder where you got the idea of the VAT website from? 🙂 Thanks again for your help and this blog is very comprehensive and will be very useful for future use.
No worries 🙂