Scraping the cached version from Google

Hi,

I have been trying to scrape the cached version of a list of URLs (only 20) using XPath, but no luck so far.

The URLs in the list follow this pattern: http://webcache.googleusercontent.com/search?q=cache:https://www.dkc.ae/ and the XPath I use to locate the cached version on the page is: /html/body/div[1]/div[1]/span[1]

The full function is: =XPathOnUrl("http://webcache.googleusercontent.com/search?q=cache:https://www.dkc.ae/";"/html/body/div[1]/div[1]/span[1]")

Is there anything I am missing or doing wrong? It just returns blank cells.

Thanks,

Can you tell me where the cache version is available on that page?

If you open this URL in the browser: http://webcache.googleusercontent.com/search?q=cache:https://www.dkc.ae/

I'd like to scrape the following:

I found the XPath for it, which is /html/body/div[1]/div[1]/span[1], but running it returns blank cells.

Aha, didn't see it because the text was hidden behind the banner. This XPath works for me:

=XPathOnUrl("http://webcache.googleusercontent.com/search?q=cache:https://www.dkc.ae/";"//div[1]/span[2]")
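For anyone wondering why the absolute path can come back blank while the shorter one matches, here is a minimal Python sketch. The markup is made up for illustration (the thread never shows the real page source), so treat the structure and the span contents as assumptions. It only shows the general effect: an absolute path that walks down one exact branch can land on an empty element, while a relative path picks up the span that actually holds the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. This is NOT the real cache page;
# it only imitates a banner where the first nested span is empty.
SAMPLE = """
<html>
  <body>
    <div>
      <div>
        <span></span>
      </div>
      <span>label</span>
      <span>https://www.dkc.ae/</span>
    </div>
  </body>
</html>
""".strip()

# ElementTree has no leading "/" or "//", so both paths are written
# relative to the <html> root element.
ABSOLUTE = "./body/div[1]/div[1]/span[1]"  # like /html/body/div[1]/div[1]/span[1]
RELATIVE = ".//div[1]/span[2]"             # like //div[1]/span[2]


def first_text(root, path):
    """Return the stripped text of the first match, or "" if nothing usable is found."""
    node = root.find(path)
    if node is None or node.text is None:
        return ""
    return node.text.strip()


root = ET.fromstring(SAMPLE)
print(repr(first_text(root, ABSOLUTE)))
print(repr(first_text(root, RELATIVE)))
```

In this made-up document the absolute path matches a span with no text, which is what a blank cell would look like, and the relative path returns the second span of the outer div.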

However, this is the same service that the GoogleCacheDate connector uses, so I'm not sure why you prefer to scrape it manually.

Thanks for your help.

I thought =Connector("GoogleCacheDate.GoogleCacheDate";C1) only works for the date. Can I pull the cached URL with that function too?

Thanks again!

Hmm, isn't the URL always set to the latest URL? Is the purpose to identify redirects?

The purpose is to identify pages Google has indexed whose URL doesn't match the original URL pattern. Basically, a mismatch between the indexed and the cached URL, but the XPath lets me do that, so no worries. :)
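For later readers: once the cached URL sits next to the original one, the mismatch check itself is small. Below is a rough Python sketch done outside SeoTools. The function names `normalize` and `find_mismatches`, the normalization rules (ignore case in scheme and host, ignore a trailing slash), and the example.com pairs are all my own assumptions and not something from this thread, so adjust them to whatever counts as "the same URL" for your site.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparable tuple (assumed rules, adjust as needed)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    # Treat "https://host" and "https://host/" as the same page.
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), host, path, parts.query)


def find_mismatches(pairs):
    """Return the (original, cached) pairs whose normalized forms differ."""
    return [(orig, cached) for orig, cached in pairs
            if normalize(orig) != normalize(cached)]


# Hypothetical sample data; only the first URL appears in the thread.
pairs = [
    ("https://www.dkc.ae/", "https://www.dkc.ae"),
    ("https://example.com/about", "https://example.com/en/about"),
]

for orig, cached in find_mismatches(pairs):
    print(f"{orig} is cached as {cached}")
```

Only the second pair is reported here, because the first one differs by nothing more than a trailing slash.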