XPathOnURL Weather Data

I am new to SeoTools. Using the link https://www.theweathernetwork.com/ca/hourly-weather-forecast/british-columbia/prince-george I would like to get data from the table marked "Hourly" and place it in individual cells in Excel.

The XPath for one of the elements of the table is:
id('hourly-weather-forecast')/x:div[1]/x:div[4]

From reading other comments, my understanding is that the format should be converted to something like this:
"//*[@id='hourly-weather-forecast']/div[1]/div[4]"

Giving the formula:
=XPathOnUrl("https://www.theweathernetwork.com/ca/hourly-weather-forecast/british-columbia/prince-george","//*[@id='hourly-weather-forecast']/div[1]/div[4]")

However, all I seem to get is a blank result.

Any help on this would be appreciated!

Thanks!

Hi,

Unfortunately, the contents of that table are generated with JavaScript, which means regular XPath won't do the trick. You can use the PhantomJS Cloud Connector to solve this (requires sign-up and a personal API key):

=Dump(Connector("PhantomJsCloud.XPath","https://www.theweathernetwork.com/ca/hourly-weather-forecast/british-columbia/prince-george","//*[@id='hourly-weather-forecast']/div[1]/div[2]",,TRUE,"us"))


Thanks! This was working perfectly for a while. Now when I reload the formula, the error I get is "No nodes found for XPath '//*[@id='hourly-weather-forecast']/div[1]/div[2]'". It's the exact same formula that was working before. I am so confused.

Perhaps the site admins have shuffled things around?

The other question I have is I am filling a large spreadsheet where each cell corresponds to a different 'node' on the website, with the formula differing only slightly for example:

Cell D2: =Dump(Connector("PhantomJsCloud.XPath","https://spotwx.com/products/grib_index.php?model=gem_reg_10km&lat=53.91706&lon=-122.74967&tz=America/Vancouver&display=table","//*[@id='example']/tbody/tr[3]/td[2]",,FALSE,"us"))
Cell D3: =Dump(Connector("PhantomJsCloud.XPath","https://spotwx.com/products/grib_index.php?model=gem_reg_10km&lat=53.91706&lon=-122.74967&tz=America/Vancouver&display=table","//*[@id='example']/tbody/tr[4]/td[2]",,FALSE,"us"))

As you can see, the only difference is that Cell D2 uses tr[3] and Cell D3 uses tr[4].

Is there some way to reference the row # in the formula so that I do not need to change the formula for every single cell?

Thanks mucho!!

Thanks for the reply yesterday, diskborste. I restarted everything and it seems to be working today. Still not sure what the issue was.

Hi,

You can use [*] to dump all the available values instead of a specific node. Alternatively, you can reference a cell that contains the value: ["&A1&"]
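For example (assuming, purely for illustration, that cells A2, A3, ... hold the row numbers 3, 4, ...), the row index can be pulled from column A so the same formula can be filled down without editing the XPath by hand:

Cell D2: =Dump(Connector("PhantomJsCloud.XPath","https://spotwx.com/products/grib_index.php?model=gem_reg_10km&lat=53.91706&lon=-122.74967&tz=America/Vancouver&display=table","//*[@id='example']/tbody/tr["&A2&"]/td[2]",,FALSE,"us"))

Or, with [*], a single formula dumps the whole column of values at once:

=Dump(Connector("PhantomJsCloud.XPath","https://spotwx.com/products/grib_index.php?model=gem_reg_10km&lat=53.91706&lon=-122.74967&tz=America/Vancouver&display=table","//*[@id='example']/tbody/tr[*]/td[2]",,FALSE,"us"))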

Thanks diskborste! That works awesomely!

Still having issues, however: the worksheet seems to work sometimes and not others. I am wondering if there is a maximum number of times you can use SeoTools/PhantomJS to pull data in one day, or at one time, as it usually stops working after I have added the formula to about 100 cells or so.

PhantomJS might detect high activity from your IP and cancel your requests. Perhaps use a proxy?

How would I do that?

Thanks in advance!

You can add proxies under "Global HTTP Settings"

Can you try to scrape some other webpage after errors are returned? It might be the weather site blocking PhantomJS requests.

I get the same issue with the other sites I've tested it on so I don't think it's an issue on the website side.

Where do I find the Global HTTP Settings page?

Okay, thanks...so what exactly am I typing in as an "address"?

The proxy information. You can purchase proxies from different sites, for example proxy-hub.com.
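The exact fields depend on your proxy provider and SeoTools version, but a typical proxy address entry looks something like this (these values are made-up placeholders):

123.45.67.89:8080

or, if the proxy requires authentication, with a username and password as well, e.g. user:pass@123.45.67.89:8080.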