Help with scraping


#1

How can I scrape these fields separately?

https://www.confianzaonline.es/empresas/decantia.htm

CIF
Datos de contacto
Sitio/s web


#2

Here are some examples for extracting the three parts:

=XPathOnUrl(A1;"//li[@class='verified']")
=XPathOnUrl(A1;"//ul/li[3]/span[2]")
=XPathOnUrl(A1;"//ul/li[4]/span[2]")

The easiest way is to inspect the elements: in your web browser, right-click the part you are interested in and choose "Inspect", then right-click the highlighted element in the panel that opens and choose "Copy XPath". You can then use this string in your XPathOnUrl formula.
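To see what these XPath expressions actually select, here is a minimal Python sketch that runs them against a small, made-up fragment. The markup and values below are assumptions for illustration only, not the real page, and Python's standard library supports just a subset of XPath, so treat it as a way to check your understanding rather than a replacement for XPathOnUrl.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real page's structure may differ.
SAMPLE = """
<div>
  <ul>
    <li class="verified">Example S.L.</li>
    <li><span>Label</span><span>ignored</span></li>
    <li><span>CIF</span><span>B00000000</span></li>
    <li><span>Contact</span><span>info@example.com</span></li>
  </ul>
</div>
""".strip()

root = ET.fromstring(SAMPLE)


def first_text(xpath):
    """Return the text of the first node matching xpath, or None."""
    node = root.find(xpath)
    return node.text if node is not None else None


# ElementTree paths are relative to the root element, hence the leading "."
verified = first_text(".//li[@class='verified']")  # match by attribute value
cif = first_text(".//ul/li[3]/span[2]")            # third <li>, second <span>
contact = first_text(".//ul/li[4]/span[2]")        # fourth <li>, second <span>

print(verified, cif, contact)
```

Note that positions in XPath start at 1, so `li[3]` is the third `li` under its parent. If the site adds or removes a list item, a positional path will silently point at a different field, which is why attribute-based paths like the first one tend to be more robust.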


#3

Thanks a lot! On similar sites I get this error in Excel: endofstream exception.
Can you tell me what this means?


#4

Not really, can you give me an example of a site where you get that error and the formula you're using?


#5

Here is the screenshot:


#6

It is easier if you post the URL and the part you wish to scrape :slight_smile:


#7

OK. I only wanted to show the error message.

Here we go:
https://www.ekomi.es/testimonios-wacomestore-es.html
Dirección
Website
Representante autorizado


#8

How about these:

Direccion:
=XPathOnUrl(A1;"//p[@class='shopAddressDetails']")

Website:
=XPathOnUrl(A1;"//a[@class='shoplink url']")

Representante
=XPathOnUrl(A1;"//ul/li[5]/div[2]")



#9

Strange. I copied and pasted them and it does not work.


#10

Perhaps your system uses a different argument separator. Can you try replacing the semicolon with a comma?

How about the other xpaths I suggested?
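On the separator point: if you have many formulas to convert, a small helper can swap the argument separator without touching anything inside the quoted XPath string. This is only a sketch; `swap_arg_separator` is a hypothetical name, and it assumes the formula uses plain double quotes for its string arguments.

```python
def swap_arg_separator(formula: str, old: str = ";", new: str = ",") -> str:
    """Replace the argument separator, but only outside double-quoted strings."""
    out = []
    in_string = False
    for ch in formula:
        if ch == '"':
            # Entering or leaving a quoted string argument
            in_string = not in_string
        out.append(new if (ch == old and not in_string) else ch)
    return "".join(out)


converted = swap_arg_separator('=XPathOnUrl(A1;"//ul/li[5]/div[2]")')
print(converted)
```

A semicolon that appears inside the quoted XPath is left alone, so only the separator between `A1` and the string changes.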


#11

Like this?
None of the XPaths work.


#12

The forum messed up @diskborste's formulas (it replaced single quote characters with different characters). Try these:

Direccion:
=XPathOnUrl(A1;"//p[@class='shopAddressDetails']")

Website:
=XPathOnUrl(A1;"//a[@class='shoplink url']")

Representante
=XPathOnUrl(A1;"//ul/li[5]/div[2]")
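If you run into this again with formulas copied from a web page, you can normalise the quotes before pasting. A minimal sketch, assuming the culprits are the usual typographic ("curly") single and double quotes; `normalize_quotes` is a hypothetical helper name, not part of any add-in:

```python
# Map typographic quotes to their plain ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes(formula: str) -> str:
    """Return the formula with curly quotes replaced by straight ones."""
    return formula.translate(QUOTE_MAP)


# A formula as it might look after a forum has "prettified" the quotes
pasted = "=XPathOnUrl(A1;\u201c//p[@class=\u2018shopAddressDetails\u2019]\u201d)"
fixed = normalize_quotes(pasted)
print(fixed)
```

Typing the quotes by hand in Excel has the same effect; the helper is just quicker when there are many formulas.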

#13

Great stuff! Thanks!