XPathOnUrl Problem


#1

Can anyone tell me how can adapt the query below make it case insensitive.

Example - affordable, AFFORDABLE

(currently it only counts all the h2 tags that contain the exact word Affordable)

=XPathOnUrl(A2,"count(//h2[contains(text(),'Affordable')])")

I thought I might be an idea to make it lowercase first but I can't work how to do that.

I'm a bit stuck and there might be a better way to go about it.

Thanks for your time.


#2

I feel like there might be an easier way to get what you need.

Do you have an example URL to check out?


#3

Sure can

https://www.brushmarks.co.uk/affordable-toothbrush/


#4

Should return

2 Affordable
3 affordable
2 AFFORDABLE

Affordable Toothbrush
Affordable Electric Toothbrush

affordable Toothbrush
affordable Electric Toothbrush
Toothbrush that’s affordable

AFFORDABLE Toothbrush
Toothbrush that’s AFFORDABLE


#5

Ok. So you're only trying to retrieve the h2 titles? Because according to the link, there are a total off 3 "Affordable" If you include the h3 header.

I'm not sure if you only need to return the word "affordable" no matter what case it's in, or if you need to extract all the headers that include the word "affordable".


Easiest way in my opinion: CsQueryOnUrl

If you simply want all the h2 headers, CsQueryOnUrl will make this easy. For this, with A2 as your URL, just use this to return all the headers..

=Dump(CsQueryOnUrl(A2,"div[class=entry-content] h2",,,"text"))

To explain, all the H2 headers you are looking for are within the div element that contains the class attribute of entry-content. Therefore, seprating the h2 with a space on the div container according to the CSS SELECTOR syntax, will return all h2 within that specified div.

The Dump befofore the entire CsQueryOnUrl simply takes all those possible results, or an array, and literally "Dumps" them into the rows below. If you want the Dump to span across the columns, or horizontally, you can use the boolean value for the transpose variable in the function like so:

=Dump(CsQueryOnUrl(A2,"div[class=entry-content] h2",,,"text"),TRUE)

Notice the TRUE factor at the end of the function. This will "Transpose" your data in the other direction so to speak.


Now here it is again using: XPathOnUrl

=Dump(XPathOnUrl(A2,"//div[@class='entry-content']/h2",""))

Here's what it looks like in Excel with the results from both functions side-by-side.

preview-xpath-vs-csquery


Let me know if this isn't what you're looking for. But that's basically what I understood as the data you're looking for.

ADDITION TO POST:

Here's a simple guide for CSS SELECTOR elements and how they work on the pages:
w3schools.com - CSS Selectors
Definitely try their tester too, you can click on specific elements and it will highlight them in action as you click. It's on that same link, but here's the direct link to their little web app:
w3schools.com - CSS SELECT it Yourself

Also, here's one for XPath:
w3schools.com - XPath Syntax
XPath is sometimes needed for cases where css syntax will not achieve the desired output. The good thing about XPath is that it has those additional features where you CAN search by contains(text() as you originally had planned.


#6

Many thanks for helping me out.

I am only trying to count the h2 tags that contain the word Affordable and the possible variations eg AFFORDABLE, affordable, affordaBLE etc.

The URL provided was only an example of a page with different variations. I'm looking to scape the h2 tags for a list of websites (A2, A3, A4 etc) not a specific site.

I'm using the following because it gets a count of all the h2's on any site.

=XPathOnUrl(A2,"count(//h2[contains(text(),'Affordable')])")

What the above misses is the count of h2 tags that contain AFFORDABLE, affordable, afforDAble etc.

I don't need the whole text of the h2 tags, I just want to know how many contain a specific word (regardless of case).

Hope this makes sense.


#7

Ah. I underestimated you. I had a hard time figuring this out. But It worked.

It's ugly...

=Dump(XPathOnUrl(A2,"//h2[text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),'affordable')]]"))

This above code returns a Dump of all the h2 text that contains the text "affordable" no matter the case. So from this, we can come up with a count now. We'll simply place the count like you originally had it and get rid of the dump...

=XPathOnUrl(A2,"count(//h2[text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),'affordable')]])")

image


Old posts on this Forum with Broken Image Links
#8

Wow, thank you so much.

I had just been trying the translate thing, but couldn't get the syntax correct.

Many thanks once again. I can go to bed now, it's 1:30 in the morning here.


#9

No problem. I think it took me roughly 3 hours, while working on other projects. It's funny though, when I tried to figure out the syntax myself a few months ago I couldn't do it. And I just gave up.

But anyway, in-case anyone needs to filter the dumped array that we have- that is, to only show the parameter in question, which in this case is "affordable"- we can run a RegexpFind (\baffordable\b) between the Dump & XPath like so:

=Dump(RegexpFind(XPathOnUrl(A2,"//h2[text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),'affordable')]]"),"\baffordable\b"))

Dump

RegexpFind

XPathOnUrl
image

Now we Have our Dump of the "affordable" keyword in its original format! :thinking:

Just to keep this thread goin, there's a lot of interesting functions SeoTools has, especially when combined with Excel.

So now let's run this excel built-in PROPER case function on the data we have (Excel 2007 and up):

We'll run this in a separate range. You can however add-on top of the existing code and ideally end up with one single column, but imo I think it's always good to keep copies of the data you're working on. Excel can be nightmarish if you aren't prepared.

=PROPER()

image

Edit: For some reason, PROPER didn't modify the "S" in "That'S""