Different ways to scrape emails?



So I've tried OnPageScrapers.Email and Spider to scrape emails.

Spider - Spider Mode (you can specify the crawl depth)
Spider - List Mode (I noticed the default depth is 0, so it doesn't follow links to the website's other pages)

OnPageScrapers.Email - I think the depth is 0 as well.

So Spider Mode is more accurate, but the problem is that it only works for a single URL.

Is there any way to set the depth for bulk checking? Thanks!
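"Depth for bulk checking" amounts to running a depth-limited, same-site crawl for each URL in a list. The thread doesn't describe how the Spider works internally, so the following is only a minimal Python sketch of that idea, not the add-in's implementation. `crawl_emails`, `crawl_many`, the `fetch` callback and both regexes are hypothetical. `fetch` is injected (any function that takes a URL and returns HTML, e.g. a wrapper around an HTTP client) so the example runs offline against an in-memory site.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical, deliberately simple patterns (not the ones the add-in uses).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HREF_RE = re.compile(r"""href=["']([^"']+)["']""", re.IGNORECASE)

Fetch = Callable[[str], str]  # takes a URL, returns the page HTML


def crawl_emails(start_url: str, fetch: Fetch, max_depth: int = 1) -> List[str]:
    """Breadth-first crawl of one site, following same-host links up to max_depth hops."""
    host = urlparse(start_url).netloc
    seen: Set[str] = {start_url}
    queue = deque([(start_url, 0)])
    emails: Dict[str, None] = {}  # dict keeps first-seen order and drops duplicates
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        for email in EMAIL_RE.findall(html):
            emails.setdefault(email, None)
        if depth >= max_depth:
            continue  # depth 0 means: only the start page is scanned
        for href in HREF_RE.findall(html):
            link = urldefrag(urljoin(url, href))[0]
            # mailto: links and other hosts have a different netloc, so they are skipped
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return list(emails)


def crawl_many(start_urls: Iterable[str], fetch: Fetch, max_depth: int = 1) -> Dict[str, List[str]]:
    """'Bulk' mode: run the same depth-limited crawl for every start URL in a list."""
    return {url: crawl_emails(url, fetch, max_depth) for url in start_urls}


if __name__ == "__main__":
    # Fake in-memory site so the demo needs no network access.
    pages = {
        "https://a.example/": '<a href="/contact">Contact</a>',
        "https://a.example/contact": "Mail: info@a.example",
        "https://b.example/": 'sales@b.example <a href="https://other.example/">x</a>',
    }
    print(crawl_many(["https://a.example/", "https://b.example/"], pages.__getitem__))
    # prints {'https://a.example/': ['info@a.example'], 'https://b.example/': ['sales@b.example']}
```

Setting `max_depth=0` reproduces the "start page only" behaviour the poster saw in List Mode; raising it follows links further, at the cost of more requests per site.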



You can use formula mode and reference the column that contains the URLs.

You can also combine Dump with Transpose when creating the formula to get all the email addresses on the page.
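The reply above uses Dump and Transpose so that every address on a page lands in its own cell. As a rough Python analogue (not the add-in's code), a regex `findall` plus order-preserving de-duplication gives the same "all emails on the page, one per row" result. The regex and the name `emails_as_rows` are illustrative assumptions.

```python
import re
from typing import List

# Hypothetical simple pattern; it will not catch obfuscated forms like "info [at] example.com".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_as_rows(html: str) -> List[str]:
    """Every distinct email on one page, in order of appearance (one list item per row)."""
    return list(dict.fromkeys(EMAIL_RE.findall(html)))


if __name__ == "__main__":
    page = '<p>Sales: sales@example.com</p><a href="mailto:info@example.com">info@example.com</a>'
    for row in emails_as_rows(page):
        print(row)
    # prints:
    # sales@example.com
    # info@example.com
```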



How can I crawl all of a website's pages with the Spider, find any available email(s), and put them in a single cell separated by ";"?
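Collapsing the emails from every crawled page into one `;`-separated cell boils down to flatten, de-duplicate, join. How the Spider's new field does this isn't described in the thread, so this is only a minimal Python sketch of that aggregation step. `join_for_cell` is a hypothetical name, and lower-casing before de-duplication is a simplifying assumption (it treats `Info@` and `info@` as the same address).

```python
from typing import Iterable, List


def join_for_cell(per_page_emails: Iterable[List[str]], sep: str = ";") -> str:
    """Flatten the emails found on each crawled page into one de-duplicated cell value."""
    unique = dict.fromkeys(
        email.strip().lower()  # lower-casing is a simplifying choice for de-duplication
        for page in per_page_emails
        for email in page
        if email.strip()
    )
    return sep.join(unique)


if __name__ == "__main__":
    pages = [["info@example.com", "sales@example.com"], ["Info@example.com"], []]
    print(join_for_cell(pages))  # prints info@example.com;sales@example.com
```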


That's a good suggestion. I'll try to add it in the next couple of days.


Added an additional field to the Spider. Let me know how it works.

You can get the new version by updating OnPageScrapers in the Manager, under the Scraping category.


Thanks! It's working.


The website https://spritol.com has emails on pages other than the homepage. How can a single query in one cell scrape all the pages and put the email(s) in that cell? I tried emails.csv, but it doesn't work.


That is because the emails are not part of the original HTML; they are loaded afterwards by JavaScript. One approach is to use the PhantomJS Connector, which can handle JavaScript-loaded content. You would have to scrape all pages from spritol.com, then run the PhantomJS regex formulas on each row. This gets somewhat tricky, but you can check out the email regex in the connector file and combine it with the StringJoin function to capture all emails per page.
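The per-page approach described above (regex on the rendered page, then join the matches) can be sketched in Python. This is not the PhantomJS Connector: `emails_per_page` and the regex are hypothetical, and the sketch assumes you already have the JavaScript-rendered HTML for each page from some headless browser. The demo shows why running the regex on raw HTML returns nothing when a script assembles the address.

```python
import re
from typing import Dict, Iterable, Tuple

# Hypothetical pattern; the connector file ships its own email regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_per_page(rows: Iterable[Tuple[str, str]], sep: str = ";") -> Dict[str, str]:
    """For each (url, rendered_html) row, join that page's distinct emails into one string.

    rendered_html is assumed to be the DOM after JavaScript ran; raw HTML from a
    plain HTTP request misses addresses that a script inserts later.
    """
    return {url: sep.join(dict.fromkeys(EMAIL_RE.findall(html))) for url, html in rows}


if __name__ == "__main__":
    raw = '<span id="c"></span><script>c.textContent = "info" + "@" + "example.com";</script>'
    rendered = '<span id="c">info@example.com</span>'
    print(emails_per_page([("raw", raw), ("rendered", rendered)]))
    # prints {'raw': '', 'rendered': 'info@example.com'}
```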