Scraping with XPathOnUrl


#1

Hi all,

I've been scraping data from startup websites for a long time, and for a few months now I'm proud to be a pro user :smile:
One of the websites changed their source code. Normally I change the XPath and everything works again. But there is one website I can't crawl. --> Link

I need the href values for every startup subsite. In this case the first href should be: https://www.kickstarter.com/projects/explorer-plus/solar-backpack?ref=category_location
and the next one: https://www.kickstarter.com/projects/783648302/medisano-minimum-effort-and-maximum-output?ref=category_location

I can scrape all the attributes around this one, but I can't get this specific href. I tried different paths and didn't find one that works…

Can someone give me some advice?
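For context, the usual way to address an href with XPath is to select the attribute node itself. A minimal sketch in Python with lxml (the markup below is invented for illustration, not the real page source):

```python
from lxml import html

# Hypothetical markup resembling one project entry (illustration only)
page = html.fromstring("""
<div class="project">
  <a href="https://www.kickstarter.com/projects/explorer-plus/solar-backpack?ref=category_location">
    Solar Backpack
  </a>
</div>
""")

# Selecting the @href attribute directly returns the link as a string
hrefs = page.xpath("//div[@class='project']/a/@href")
print(hrefs)
```

If this pattern returns nothing against the live page, the links most likely aren't present in the raw HTML at all, which is what the next reply addresses.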


#2

Hi Sebastian,

Most likely, that website uses JavaScript to generate the content you're trying to scrape. I couldn't get the links either, but managed to do it with a Connector called PhantomJs Cloud. It uses a headless browser to execute the JavaScript before scraping the site.

You can read more about the PhantomJs Cloud Connector here:
http://seotoolsforexcel.com/phantomjs-cloud/
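To illustrate the point: the raw HTML that a plain fetch (or XPathOnUrl) sees may not contain the links at all, because they are only built when the page's JavaScript runs in a browser. A small Python/lxml sketch of that situation, with hypothetical markup:

```python
from lxml import html

# Hypothetical page: the project links are injected by JavaScript
# at load time, so the static source only has an empty container
raw = """
<html><body>
<div id="projects"></div>
<script>
  // client-side code would build the <a href=...> elements here;
  // a plain HTTP fetch never executes this
</script>
</body></html>
"""
page = html.fromstring(raw)

# The static source contains no anchors, so the XPath finds nothing;
# a headless browser (e.g. PhantomJs Cloud) must render the page first
print(page.xpath("//div[@id='projects']/a/@href"))  # -> []
```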


#3

Hi Victor,

Thumbs up! It works :slight_smile:
Thank you for your help…


#4

Hi all,

Victor's approach worked absolutely perfectly, but they changed their source code again a few days ago.
I've really tried to get the information I need, but I don't know why it doesn't work anymore; the source code is mainly the same. Therefore I have to ask...

What I want to do: Scrape the hrefs for every project on Kickstarter (https://www.kickstarter.com/discover/advanced?category_id=16&woe_id=23424750&sort=popularity&seed=2468529&page=1).
I use the “PhantomJS Cloud” Connector and the XPath //*[@class='relative self-start']/a. When I set the attribute to “href”, I don't get any results.

Please advise what I should set as the XPath to get a list of the hrefs. Thank you
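One common reason an href selection comes back empty is that the element's class attribute carries more tokens than the exact match expects. A hedged sketch in Python with lxml (the extra "mb3" class token is an assumption to show the failure mode, not the actual Kickstarter source):

```python
from lxml import html

# Invented markup: the wrapper carries an extra class token ("mb3"),
# which is an assumption about why an exact @class match could fail
page = html.fromstring("""
<html><body>
<div class="relative self-start mb3">
  <a href="https://example.com/project-1">Project 1</a>
</div>
</body></html>
""")

# An exact @class match only succeeds if the attribute is exactly that string
exact = page.xpath("//*[@class='relative self-start']/a/@href")
print(exact)  # -> []

# contains() on one stable token tolerates extra classes
robust = page.xpath("//*[contains(@class, 'self-start')]/a/@href")
print(robust)
```

Selecting `/@href` at the end of the expression, rather than setting a separate attribute option, is another variant worth trying if the tool supports attribute nodes in the XPath itself.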


#5

Hi all,

I fixed it on my own.
Everything is fine :wink:


#6

Awesome! I've been busy the last week and tried to fix this error briefly but was unsuccessful. What was your fix?