Forum

Social websites ext...
 
Notifications
Clear all

Social websites extraction

8 Posts
2 Users
0 Reactions
101 Views
(@ehab1511)
Posts: 31
Trusted Member
Topic starter
 

Please i need to extract all Social websites from companies websites

Here all details:
I have a lot of companies websites and I want to collect their official Social websites by checking company website then get the Social websites.
I expect the macro will do below steps.
1- Open Company website.
2- Check all hyperlinks founded in website "home page", I mean no need to check all websites pages it is just main page like.
3- If the hyperlink contain Linkedin, macro get this link
5- If the hyperlink contain Facebook, macro get this link
6- If the hyperlink contain Twitter, macro get this link
7-If the hyperlink contain Youtube, macro get this link

I hope this applicable as I I really need it.

Many thanks for help me.

 
Posted : 27/09/2018 6:37 am
Philip Treacy
(@philipt)
Posts: 1629
Member Admin
 

Hi,

See attached workbook for a good start on this.  The code in the workbook will load a page and then parse the HTML for any links. 

It will print to the ActiveSheet links for Twitter, YouTube, LinkedIn, Facebook and Google+.  You can add more social networks in the code very easily.

If you load https://www.myonlinetraininghub.com the output you get is

https://www.youtube.com/user/MyOnlineTrainingHub

https://www.facebook.com/plugins/follow.php?href ="https%3A%2F%2Fwww.facebook.com%2Fmyonlinetraininghub&layout=standard&show_faces=false&colorscheme=light&font&width=450&height=35&appId=142334919174389"

https://plus.google.com/+MyonlinetraininghubA1

 

This is not foolproof as the VBA is merely looking for the occurrence of strings like 'YouTube', 'Facebook' etc within a link so any link to these sites will be listed in the ActiveSheet.

For this to work you will need to set a reference in your VBA editor : Tools > References > Microsoft HTML Object Library

Cheers

Phil

 
Posted : 28/09/2018 1:54 am
(@ehab1511)
Posts: 31
Trusted Member
Topic starter
 

Many Thanks for help me.

Please still I need your help as I am a beginner in VBA & MACROS.

I have a lot of companies websites and I expect the Input and Output for Macro as below.

Input: Company website in column A
Output: Linkedin URL column B, Facebook URL Column C, Twitter URL Column D, Youtube URL Column E.

Please check attached file.

I'm sure you are busy, when you have a chance please reply to my request as I really need it.

 
Posted : 28/09/2018 5:39 am
Philip Treacy
(@philipt)
Posts: 1629
Member Admin
 

Hi,

Please check this post

https://www.myonlinetraininghub.com/web-scraping-with-vba

Regards

Phil

 
Posted : 18/10/2018 2:29 am
(@ehab1511)
Posts: 31
Trusted Member
Topic starter
 

Hi Philip,

I want to thank you for the great efforts.

Yes it is ok, and I will use it to collect all social websites for Companies.

I just have some questions, please can help me.

1- Suppose If the website contain two Facebook or twitter or.... , I think the macro will get one only . I'm correct?

2- I have to collect the accounts email like [email protected] from http ://www.company.co.uk/ , can you update the macro by two columns as below.

First column: Account.  Please check F column as added example.  " As sometimes the companies put their email in home page"

Second Column: Contact Page URL.  Please check F column as I added example. "As I will use contact URL to extract the emails by Running the macro again in contact URL not website home page."

Thanks;

Ehab Ali

 
Posted : 19/10/2018 1:56 pm
Philip Treacy
(@philipt)
Posts: 1629
Member Admin
 

Hi Ehab,

This is where it gets difficult. To answer your first qs, as the macro is only looking for the string "facebook" in links, it will find any mention of "facebook" and as the code is written will only list one on the worksheet.

You really need to know exactly where the link is on the web page and what form it takes.  The link could be in the main body of the page, a sidebar, or in the footer.  If you know this by examining the page in the Inspector you could write code to extract it. But given that you've gone to that trouble, you may as well just copy and paste it into Excel.

So writing a generic macro to work for all sites just isn't possible because sites will be structured differently.

As for the 2nd qs, in order to find a specific email address I need to know that specific email address, and where to look for it.

The simplest way to look for an email address is to search for an @ but beyond that how do I know that the @ I have found is part of an email address and not part of something else?

Even if I wrote code to try to do further checks like make sure that after the @ there are some characters then a dot then possibly another dot and some characters I still don't know if the email address I've found is the one I want.  And bear in mind that not all email address/domains follow the structure 'something.xx.xx'

And if I find more than one @ on the page, how do I know which one is the email address I want?

Likewise for the contact page URL.  In your example workbook the 2 you have listed don't have any similarity in their structure.  One links to a page called IXYSDivisions.aspx, the other links to contact-us.  There's no way to know both of these are contact pages without actually looking at them and if you have to do that then you are back to copying/pasting the URL as being your best solution.

Sorry, you can't write code that will do what you want.

Regards

Phil

 
Posted : 19/10/2018 9:42 pm
(@ehab1511)
Posts: 31
Trusted Member
Topic starter
 

Hi Philip,

Really,  I want to thank you very much for help. I appreciate your support.

I agree with you completely about all you said.

Yes all sites have different structure but I have 2 million of companies and I have to get their data like social websites, emails, contact URL,.... etc.

And I'm sure I can't do this Manually as it takes long time. may years Confused.

So I have to find any way to help me to get data for these companies as I can.

I'm in Beginner macros but I have to find solution to help me on this.

I updated the code which you wrote it and I got some data but not Quality 100%, so I will check the result and apply some checks.

Attached file is updated macro, please check.

For emails, Marco will search about  "MAILTO:" and result of macro like this " mailto:[email protected] "

I need your help to write two sentences in code to do below actions.

1- Remove "mailto:" from macro output to be "[email protected]".  I mean replace "mailto:" by no thing " ".

2- I need to collect all emails in page, so do have any way to make the macro result like [email protected][email protected][email protected].   I mean collect all emails in one cell by this separator |

 

For URLs, the macro result as "about:en/contact.html" so I need your help to write one sentence to replace "about:" by the website.

Ex: website:  http://www.a-bright.com.tw/   macro result: "about:en/contact.html"   so I want to be   http://www.a-bright.com.tw/en/contact.html

Again, Many thanks for your efforts.

Thanks ;

Ehab Ali

 
Posted : 20/10/2018 6:38 pm
Philip Treacy
(@philipt)
Posts: 1629
Member Admin
 

You're still going to have to go through the results by hand to make sure they are correct.

Try this attachment.

Phil

 
Posted : 26/10/2018 8:04 am
Share: