XPathOnURL Help!

I am a novice at this stuff, but I have been using DataMiner on Google to scrape web pages and want to automate that process a little more. I am having trouble getting XPathOnURL to work correctly. I also tried the Spider tool and its XPath functionality, but to no avail. As an example of what I am trying to do, I would like to get the plan names and prices, along with the community name, size, number of bedrooms, etc., from pages like this one:

https://www.ashtonwoods.com/austin/cantarra

Any help or a step-by-step guide would be tremendously helpful. Again, I have only a very basic knowledge of XPath, but I assume I am not using the correct syntax when trying to get the formula to work.

Hi,

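That site generates its content with JavaScript, and SeoTools doesn't support that (yet!). The data you want is not in the HTML the server sends; it only appears after the page's scripts run, so no XPath can find it. No wonder you were not finding the correct syntax :)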
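A minimal Python sketch of why the cells come back blank, using the standard library's xml.etree.ElementTree (which implements only a small subset of XPath). The markup and class names are hypothetical, not the real Ashton Woods HTML: STATIC_HTML stands in for what a plain request returns, with the plans still waiting inside a script, and RENDERED_HTML for the same page after a browser such as PhantomJS has run that script.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: what a plain HTTP fetch returns. The plan cards are
# not in the HTML yet; a script fills the empty container in the browser.
STATIC_HTML = """<html><body>
  <div id="plans"></div>
  <script>renderPlans();</script>
</body></html>"""

# Hypothetical markup: the same page after a headless browser ran the script.
RENDERED_HTML = """<html><body>
  <div id="plans">
    <div class="plan"><h3 class="plan-name">Plan A</h3><span class="price">$300,000</span></div>
    <div class="plan"><h3 class="plan-name">Plan B</h3><span class="price">$350,000</span></div>
  </div>
  <script>renderPlans();</script>
</body></html>"""

# ElementTree path syntax: any descendant h3 whose class is "plan-name".
QUERY = ".//h3[@class='plan-name']"


def plan_names(html):
    """Return the text of every node matched by QUERY."""
    root = ET.fromstring(html)
    return [el.text for el in root.findall(QUERY)]


print(plan_names(STATIC_HTML))    # []  (the "blank cell")
print(plan_names(RENDERED_HTML))  # ['Plan A', 'Plan B']
```

The same query returns nothing against the unrendered page no matter how it is written, which is why rendering the page first matters more than the syntax.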

You can use the PhantomJS connector to grab the rendered content instead.

This guide is very good for XPath:
https://www.w3schools.com/xml/xpath_syntax.asp

Thank you for the reply! I was driving myself crazy yesterday getting nothing but blank cells, even though I was certain my code was correct. One further question about PhantomJs Cloud: I was working with it yesterday and got it to pull the information, but I can't figure out how to make it pull the floor plan and the price at the same time. Is its functionality limited to one thing at a time, or is there something I am missing?

That is a good question! There are a few different approaches:

  1. Extract the contents of the parent node, i.e. the element that contains both the plan name and the price, so each plan's fields come back together.

  2. Use an "OR" in your XPath to grab two or more nodes in one query, either with the union operator "|" or with "or" inside a predicate.

  3. Build your own connector and specify which fields to extract from a request. This is not easy because the documentation is lacking, but you can get pretty far by copy-pasting code from the available connectors:
    https://github.com/nielsbosma/SeoTools-for-Excel-Connectors
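The first two approaches can be sketched in Python with xml.etree.ElementTree. The markup and class names are hypothetical. ElementTree has no "|" operator, so the sketch emulates a union such as //h3[@class='plan-name'] | //span[@class='price'] by running each path and merging the matches back into document order, which is what XPath's union returns. The parent-node version instead reads each field relative to its own plan element.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendered markup; note that Plan B has no price.
HTML = """<div id="plans">
  <div class="plan"><h3 class="plan-name">Plan A</h3><span class="price">$300,000</span></div>
  <div class="plan"><h3 class="plan-name">Plan B</h3></div>
  <div class="plan"><h3 class="plan-name">Plan C</h3><span class="price">$410,000</span></div>
</div>"""

root = ET.fromstring(HTML)


# Approach 1: select each parent node, then read its fields relative to it.
def plans_from_parent(root):
    rows = []
    for plan in root.findall(".//div[@class='plan']"):
        name = plan.findtext("h3[@class='plan-name']", default="")
        price = plan.findtext("span[@class='price']", default="")
        rows.append((name, price))
    return rows


# Approach 2: emulate the XPath union "A | B": nodes matched by either path,
# without duplicates, returned in document order.
def union(root, *paths):
    matched = set()
    for path in paths:
        matched.update(root.findall(path))
    return [el for el in root.iter() if el in matched]


print(plans_from_parent(root))
# [('Plan A', '$300,000'), ('Plan B', ''), ('Plan C', '$410,000')]

nodes = union(root, ".//h3[@class='plan-name']", ".//span[@class='price']")
print([el.text for el in nodes])
# ['Plan A', '$300,000', 'Plan B', 'Plan C', '$410,000']
```

In the flat union result Plan B has no price, so names and prices can no longer be paired by position; the parent-node version keeps that gap visible as an empty string.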

Thanks again for your help on this. I understand what you did with the price, city, etc. It sounds like you are saying I could modify a connector to make it work for me. That is intriguing, but it might be a little advanced for me, at least with my current knowledge (or lack thereof). Just knowing that the problem is missing functionality in the tool and not me is a big relief.

You can look at the code for the InternetArchive connector:
https://raw.githubusercontent.com/nielsbosma/SeoTools-for-Excel-Connectors/master/InternetArchive.xml

Their service switched to JavaScript, which is why we implemented PhantomJS in the background.

In your case, it should be possible with minimal changes to the code:

  1. Remove the <Text Id="Url" line.
  2. Change &targetUrl to your site.
  3. Change the XPath queries to the ones I used above, with Converter="String".
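For a feel of what the modified connector does, here is a rough Python analogue. It is not the SeoTools connector format, which this sketch does not try to reproduce: one record path plus one named query per field, each converted to a plain string (assumed here to be roughly what Converter="String" means: the node's text). Fetching the rendered page through PhantomJS is left out, and RECORD_PATH, FIELDS and the class names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the element that wraps one floor plan on the rendered page.
RECORD_PATH = ".//div[@class='plan']"

# Hypothetical field queries, relative to each record: one column per field.
FIELDS = {
    "Plan": "h3[@class='plan-name']",
    "Price": "span[@class='price']",
    "Beds": "span[@class='beds']",
}


def string_value(el):
    """Text of a node and all its descendants, whitespace collapsed ("" if missing)."""
    if el is None:
        return ""
    return " ".join("".join(el.itertext()).split())


def extract_rows(rendered_html):
    """One dict per record, mapping each field name to its string value."""
    root = ET.fromstring(rendered_html)
    return [
        {name: string_value(record.find(path)) for name, path in FIELDS.items()}
        for record in root.findall(RECORD_PATH)
    ]


# Stand-in for the HTML a headless browser would return for the target page.
SAMPLE = """<div id="plans">
  <div class="plan"><h3 class="plan-name">Plan A</h3>
    <span class="price">From <b>$300,000</b></span><span class="beds">3</span></div>
</div>"""

print(extract_rows(SAMPLE))
# [{'Plan': 'Plan A', 'Price': 'From $300,000', 'Beds': '3'}]
```

Adding a column (community name, square footage) is then just one more entry in FIELDS, which mirrors adding one more XPath query to the connector.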