Xpath On Url will not pull elements

Wapcity · February 18, 2016, 1:27pm

This is the Div Class main group that all sub groups are under.

Then I have the following four room loop types, and I am trying to pull just the roomtypecode of each one into a cell from the website using XPathOnUrl. I need "ZZZA", "GENR", "DBDB", and "CITY" in cells A1, A2, A3, and A4 respectively, but it will only pull the first roomtypecode for any site.

<li class="guestRoomListItem clearfix" data-roomloopcount="1" data-roomtypecode="ZZZA" data-info="{"roomloopcount":"1","roomtypecode":"ZZZA"}">

<li class="guestRoomListItem clearfix" data-roomloopcount="2" data-roomtypecode="GENR" data-info="{"roomloopcount":"2","roomtypecode":"GENR"}">

<li class="guestRoomListItem clearfix" data-roomloopcount="3" data-roomtypecode="DBDB" data-info="{"roomloopcount":"3","roomtypecode":"DBDB"}">

<li class="guestRoomListItem clearfix" data-roomloopcount="4" data-roomtypecode="CITY" data-info="{"roomloopcount":"4","roomtypecode":"CITY"}">

These are the formulas I am using now, one per request, but each one only pulls the first room type code, even though I have the index set to 1, 2, 3, and 4.

=IFERROR(XPathOnUrl($H$26,"//div[@class='guestRoomsResp']/ul/li[1]","data-roomtypecode"),"")
=IFERROR(XPathOnUrl($H$26,"//div[@class='guestRoomsResp']/ul/li[2]","data-roomtypecode"),"")
=IFERROR(XPathOnUrl($H$26,"//div[@class='guestRoomsResp']/ul/li[3]","data-roomtypecode"),"")
=IFERROR(XPathOnUrl($H$26,"//div[@class='guestRoomsResp']/ul/li[4]","data-roomtypecode"),"")

Website is:
http://www.marriott.com/hotels/hotel-rooms/nycmq-new-york-marriott-marquis/
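
For comparison outside Excel, here is a minimal Python sketch of the result I am after. It assumes the list items sit in a ul directly under a div with class guestRoomsResp, as the formulas above imply. SAMPLE is a hand-made stand-in built from the four snippets (the data-info attribute is left out), not the real page, and room_type_codes is just a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-made stand-in for the page, built from the four <li> snippets above.
# The real Marriott markup is larger and may be nested differently.
SAMPLE = """
<body>
  <div class="guestRoomsResp">
    <ul>
      <li class="guestRoomListItem clearfix" data-roomloopcount="1" data-roomtypecode="ZZZA"/>
      <li class="guestRoomListItem clearfix" data-roomloopcount="2" data-roomtypecode="GENR"/>
      <li class="guestRoomListItem clearfix" data-roomloopcount="3" data-roomtypecode="DBDB"/>
      <li class="guestRoomListItem clearfix" data-roomloopcount="4" data-roomtypecode="CITY"/>
    </ul>
  </div>
</body>
"""


def room_type_codes(markup):
    """Return the data-roomtypecode of every matching <li>, in document order."""
    root = ET.fromstring(markup)
    items = root.findall(".//div[@class='guestRoomsResp']/ul/li")
    return [li.get("data-roomtypecode") for li in items]


print(room_type_codes(SAMPLE))  # ['ZZZA', 'GENR', 'DBDB', 'CITY']
```

On this simplified markup the same path that the formulas use finds all four items, so the path is at least consistent with the snippets as posted.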

nielsbosma · February 20, 2016, 2:28pm

Try:

XPathOnUrl($H$26,"//li[contains(@class,'guestRoomListItem')][3]","data-roomtypecode")

(not sure why your other xpath isn't working)

Wapcity · February 22, 2016, 3:28pm

Thanks Niels, you're the man!! It works, thank you!!!

Wapcity · February 23, 2016, 5:01pm

So I was able to use your example on the URL provided above, but the same formula will not work on a different URL with the same HTML structure.

XPathOnUrl($H$26,"//li[contains(@class,'guestRoomListItem')][3]","data-roomtypecode")

This is the other URL I am trying to scrape

http://www.marriott.com/hotels/hotel-rooms/phxap-phoenix-airport-marriott/

Please help!!! Thanks

Wapcity · February 29, 2016, 7:51pm

Good Afternoon all,

So I am officially confused, lol. When I use the XPathOnUrl formula provided by Niels in a previous post (reference 1 below) it works, but why is it not pulling all of the room type codes? For example:

On the same URL as above (reference 2 below) I have the following HTML strings, but my Excel sheet is only pulling data-roomloopcount="1", "10", "11", "12", and "13". Even though I have cells requesting 1-15, it is skipping data-roomloopcount="2" through "9".

< li class="guestRoomListItem clearfix" data-roomloopcount="1" data-roomtypecode="ZZZA" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="2" data-roomtypecode="GENR" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="3" data-roomtypecode="DBDB" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="4" data-roomtypecode="CITY" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="5" data-roomtypecode="DCTY" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="6" data-roomtypecode="ZZZB" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="7" data-roomtypecode="DLCN" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="8" data-roomtypecode="ZZZC" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="9" data-roomtypecode="DSTE" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="10" data-roomtypecode="ZZZD" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="11" data-roomtypecode="KSTE" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="12" data-roomtypecode="EXEC" data-
< li class="guestRoomListItem clearfix" data-roomloopcount="13" data-roomtypecode="CONC" data-

My Excel formula looks like =XPathOnUrl(K22,"//li[contains(@class,'guestRoomListItem')][1]","data-roomtypecode")

FYI, the weird thing is that the XPath request is asking for number 2, but it's pulling number 10:

=XPathOnUrl(K22,"//li[contains(@class,'guestRoomListItem')][2]","data-roomtypecode") gives me ZZZD ("10")

I can share the Excel file on request.

Reference 1: XPathOnUrl($H$26,"//li[contains(@class,'guestRoomListItem')][3]","data-roomtypecode")
Reference 2: http://www.marriott.com/hotels/hotel-rooms/nycmq-new-york-marriott-marquis/
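
One general XPath behavior worth ruling out here: a position predicate such as li[2] counts among the li children of each parent, not across the whole document, whereas the parenthesized form (//li[...])[2] counts matches in document order. The Python sketch below shows the difference on hypothetical markup where the items are split across two ul groups. That grouping is an assumption for illustration only; it may or may not be what happens on the Marriott page, and whether XPathOnUrl accepts the parenthesized form is not confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the room items are split across two <ul> groups.
# This grouping is an assumption for illustration, not taken from the real page.
SAMPLE = """
<div>
  <ul>
    <li data-roomtypecode="ZZZA"/>
  </ul>
  <ul>
    <li data-roomtypecode="GENR"/>
    <li data-roomtypecode="DBDB"/>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE)

# .//li[2] keeps each <li> that is the second <li> child of its own parent.
per_parent = [li.get("data-roomtypecode") for li in root.findall(".//li[2]")]

# Indexing the full match list picks the second <li> in document order,
# which is what the XPath (//li)[2] expresses.
in_document_order = root.findall(".//li")[1].get("data-roomtypecode")

print(per_parent)         # ['DBDB']
print(in_document_order)  # GENR
```

If the real page nests its list items in more than one parent, an index in the formula would not line up with data-roomloopcount, which is the kind of mismatch described above.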

heckler · October 3, 2017, 10:27pm

Did you ever get an answer to this? I've been seeing issues where seotools wasn't pulling the xpath element (even when other tools with the same xpath do).

chilly_bang · October 4, 2017, 1:57pm

I get the code without any trouble using:

=Dump(XPathOnUrl("http://www.marriott.com/hotels/hotel-rooms/nycmq-new-york-marriott-marquis/";"//*[@id=""guest-rooms-list""]/li[2]";"data-roomtypecode";;"text"))

I'm pretty sure the problem lies in the kind of XPath used: the simpler the XPath, the higher the chance of success.

heckler · October 4, 2017, 3:04pm

That works. Is there a reason that an XPath would work in a number of other tools but not in SeoTools? Is it a parsing thing?

I've been trying to scrape footer links on websites for some analysis I'm doing and the problem I have is with this page: http://motherandbabymatters.com/

What I care about is the image link at the bottom of the footer. Here are some XPaths I've used that work with other tools, but not with SeoTools:

//div[@id='mtx_copyright']/a

/html/body/div[@id='wrapper']/div[@id='wrapper_container']/div[@id='footer_container']/div[@id='footer']/div[@id='mtx_copyright']/a

/html/body//div[@id='mtx_copyright']/a

//div[@id='footer_container']/div[@id='footer']/div[@id='mtx_copyright']/a

None of these seem to work with SEOtools.
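
As a sanity check that the expressions themselves are consistent, here is a small Python sketch run against stand-in markup. The div nesting is copied from the absolute path above; the href and img src values are placeholders, not the real footer link. The standard-library parser only takes relative paths, so the leading /html is written as ./ from the html root.

```python
import xml.etree.ElementTree as ET

# Stand-in markup: the nesting follows the absolute XPath listed above.
# The href and src values are placeholders, not taken from the real site.
SAMPLE = """
<html><body>
  <div id="wrapper"><div id="wrapper_container">
    <div id="footer_container"><div id="footer">
      <div id="mtx_copyright">
        <a href="http://example.com/"><img src="placeholder.png"/></a>
      </div>
    </div></div>
  </div></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Short form: find the copyright div anywhere, then its <a> child.
short = root.findall(".//div[@id='mtx_copyright']/a")

# Long form: walk every level explicitly, starting below <html>.
full = root.findall(
    "./body/div[@id='wrapper']/div[@id='wrapper_container']"
    "/div[@id='footer_container']/div[@id='footer']"
    "/div[@id='mtx_copyright']/a"
)

print(short == full)          # True
print(short[0].get("href"))   # http://example.com/
```

Both forms select the same element on clean markup, which suggests that when a tool returns nothing, the HTML it received or the way it parsed that HTML is a more likely culprit than the XPath.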

chilly_bang · October 4, 2017, 3:48pm

With the cited site you don't get many tries to scrape ;) I was able to scrape the footer once, but not on the next try, because the page uses a security heuristic from www.sitelock.com and redirects suspicious users to a captcha-secured screen like http://easycaptures.com/3961300365

heckler · October 4, 2017, 3:58pm

Yeah that's probably because I've been blasting it trying to find an xpath that works.

Still no luck.

//div[@id='mtx_copyright']/a[last()]

@chilly_bang - I'm aware of the SiteLock; you have to wait between requests, which just makes it more frustrating.
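
For anyone testing outside Excel, waiting between requests can be sketched in Python roughly like this. fetch_spaced and its parameters are hypothetical names for this example, the caller supplies the actual download function, and the 10 second default is an arbitrary placeholder since I don't know how long SiteLock needs.

```python
import time


def fetch_spaced(urls, fetch, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing delay_seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Stand-in fetch function so the sketch runs without touching the network.
    pages = fetch_spaced(
        ["http://motherandbabymatters.com/"],
        fetch=lambda url: "fetched " + url,
        delay_seconds=0,
    )
    print(pages)
```

Passing sleep as a parameter keeps the pause easy to swap out when trying different delays.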